Future of Nagios

Mathieu Gagné mgagne at iweb.com
Thu May 7 21:21:31 CEST 2009


Hi,

On 5/6/09 4:47 PM, Andreas Ericsson wrote:
> Well, restarting or just reloading the configuration doesn't really make
> a difference to what kind of monitoring is happening during the reload.
> Even if Nagios were to reload the configuration without requiring a
> restart, no network monitoring would happen during the reloading.
>
>> If we reload Nagios too often, it would simply pass the majority of its
>> time exporting configuration/status to NDOutils and scheduling checks
>> without doing any real work at all. Too seldom and new monitoring would
>> take too much time before being scheduled.
>>
>> Any future plan regarding this aspect?
>>
>
> Well, I've experimented a little bit. It seems to be several orders of
> magnitude faster to do the configuration parsing in two passes. One to
> find out how many objects there are of each type and sort them into a
> two-dimensional table, and then doing a binary search on that table,
> as opposed to creating fixed-size hash tables and pre-inserting objects
> into them. This is especially true for huge configurations, and appears
> to be caused by far more beneficial memory access patterns and the
> ability to only parse most objects a single time since we know that
> all hosts have been parsed by the time services are parsed, for example.
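
For illustration, the two-pass approach quoted above could be sketched
roughly as follows. This is Python rather than the C that Nagios is
written in, and the object list and the names build_tables and find are
hypothetical, not taken from the Nagios source.

```python
from bisect import bisect_left

# Hypothetical, simplified input: (type, name) pairs standing in for
# parsed "define host { ... }" / "define service { ... }" blocks.
RAW_OBJECTS = [
    ("host", "web02"),
    ("service", "web01;HTTP"),
    ("host", "db01"),
    ("host", "web01"),
    ("service", "db01;MySQL"),
]


def build_tables(raw_objects):
    """Pass 1 counts objects per type; pass 2 fills exact-size rows and sorts them."""
    counts = {}
    for obj_type, _name in raw_objects:
        counts[obj_type] = counts.get(obj_type, 0) + 1

    # One preallocated row per object type (the "two-dimensional table").
    tables = {obj_type: [None] * n for obj_type, n in counts.items()}
    filled = {obj_type: 0 for obj_type in counts}
    for obj_type, name in raw_objects:
        tables[obj_type][filled[obj_type]] = name
        filled[obj_type] += 1

    for row in tables.values():
        row.sort()
    return tables


def find(tables, obj_type, name):
    """Binary search in the sorted row; returns the index, or -1 if absent."""
    row = tables.get(obj_type, [])
    i = bisect_left(row, name)
    return i if i < len(row) and row[i] == name else -1
```

Because every host is already in its sorted row before services are
resolved, a service can look up its host with a single binary search
instead of a second parsing pass.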


The main goal for us was to retrieve status information as fast as 
possible in a centralized way, because we have multiple Nagios servers.

NDOutils was the solution we chose to meet our needs, for the 
following reasons:

1) There's no way known to me to retrieve status information directly 
from the daemon. It has to be exported to a file (status.dat).
2) Parsing status.dat takes too much time (I tried with Perl and PHP).
3) Writing a CGI script to export the status in XML using Nagios 
functions isn't faster, since it still relies on status.dat.
4) Mounting a tmpfs folder and moving status.dat into it doesn't help.
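
To make point 2 concrete, parsing status.dat amounts to something like 
the sketch below. It assumes the usual block layout of the file (a type 
name followed by "{", key=value lines, then "}"); the sample data and 
the function name parse_status are made up for illustration, and this 
is Python rather than the Perl/PHP I actually tried.

```python
# Made-up sample in the block layout assumed above.
SAMPLE = """\
hoststatus {
    host_name=web01
    current_state=0
    plugin_output=PING OK - rta 0.3ms
    }

servicestatus {
    host_name=web01
    service_description=HTTP
    current_state=2
    plugin_output=HTTP CRITICAL: time=0.012s
    }
"""


def parse_status(text):
    """Return one dict per block; the block type is stored under '_type'."""
    blocks = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if current is None:
            # Outside a block: only a "<type> {" line opens a new one.
            if line.endswith("{"):
                current = {"_type": line[:-1].strip()}
        elif line == "}":
            blocks.append(current)
            current = None
        else:
            # Split on the first "=" only, since values may contain "=".
            key, _, value = line.partition("=")
            current[key] = value
    return blocks
```

Every consumer has to repeat this line-by-line scan over the whole file 
on each refresh, which is why the cost grows with the number of hosts 
and services regardless of where the file is stored.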


Unfortunately, the main "problem" with NDOutils is that it re-exports the 
configuration and status at every reload. Clearing the "old" information 
and exporting the *exact* same information again is very time-consuming 
and not very efficient.

I found a patch which could improve/fix this behavior:
http://opsview-blog.opsera.com/dotorg/2007/09/nagios-patch-da.html
=> Do not resend retained status to NDO

The only problem is that deleted hosts/services would never be removed 
from MySQL if we applied the patch.
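
One way to avoid both the full re-export and the stale rows would be to 
compare the previous and the new object lists at reload time and send 
only the differences. The following is just a sketch under that 
assumption: diff_objects and the snapshot dictionaries are hypothetical, 
and this is not how NDOutils or the patch above behaves.

```python
def diff_objects(previous, current):
    """Compare two {object_key: attributes} snapshots taken before/after a reload.

    Returns three sorted lists of keys: added, removed and changed.
    """
    prev_keys = set(previous)
    cur_keys = set(current)
    added = sorted(cur_keys - prev_keys)
    removed = sorted(prev_keys - cur_keys)
    changed = sorted(
        key for key in prev_keys & cur_keys if previous[key] != current[key]
    )
    return added, removed, changed


# Hypothetical snapshots keyed by "host" or "host;service".
before = {
    "web01": {"address": "10.0.0.1"},
    "web01;HTTP": {"check_interval": 5},
    "db01": {"address": "10.0.0.2"},
}
after = {
    "web01": {"address": "10.0.0.1"},
    "web01;HTTP": {"check_interval": 1},
    "web02": {"address": "10.0.0.3"},
}

added, removed, changed = diff_objects(before, after)
```

Only the "removed" keys would need a DELETE in the database, and 
unchanged objects would not be re-sent at all.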


To conclude, the real problem isn't with the Nagios restart process 
itself but with:
- NDOutils' inefficiency at managing retention data
- The fact that we can't access status information in a fast and 
efficient way.

So I was hoping for some improvements in this area, maybe by using 
IPC/shared memory or a similar solution to access the status 
information directly from the daemon's memory.
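
To show roughly what I mean by the shared memory idea, here is a small 
sketch using Python's multiprocessing.shared_memory module. Nothing like 
this exists in Nagios as far as I know; the JSON payload, the 4-byte 
length header and the names publish and read_status are all assumptions 
made for the example, and a real implementation would need locking 
between the writer and the readers.

```python
import json
import struct
from multiprocessing import shared_memory

# Assumed layout: 4-byte little-endian payload length, then a JSON payload.
HEADER = struct.Struct("<I")


def publish(shm, status):
    """Writer side (the daemon in this sketch): store a status snapshot."""
    payload = json.dumps(status).encode("utf-8")
    if HEADER.size + len(payload) > shm.size:
        raise ValueError("status snapshot does not fit in the segment")
    shm.buf[HEADER.size:HEADER.size + len(payload)] = payload
    # Write the length last so a reader never sees a length for missing data.
    # There is no real locking here; that is left out of the sketch.
    HEADER.pack_into(shm.buf, 0, len(payload))


def read_status(name):
    """Reader side: attach to the named segment and decode the snapshot."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        (length,) = HEADER.unpack_from(shm.buf, 0)
        raw = bytes(shm.buf[HEADER.size:HEADER.size + length])
        return json.loads(raw.decode("utf-8"))
    finally:
        shm.close()
```

A reader would then get the current status without touching status.dat 
or the database, only by knowing the segment name.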

--
Mathieu




