Future of Nagios

Andreas Ericsson ae at op5.se
Fri May 8 09:43:00 CEST 2009


Mathieu Gagné wrote:
> Hi,
> 
> On 5/6/09 4:47 PM, Andreas Ericsson wrote:
>> Well, restarting or just reloading the configuration doesn't really make
>> a difference to what kind of monitoring is happening during the reload.
>> Even if Nagios were to reload the configuration without requiring a
>> restart, no network monitoring would happen during the reloading.
>>
>>> If we reload Nagios too often, it would simply spend the majority of its
>>> time exporting configuration/status to NDOutils and scheduling checks
>>> without doing any real work at all. Too seldom and new monitoring would
>>> take too much time before being scheduled.
>>>
>>> Any future plan regarding this aspect?
>>>
>> Well, I've experimented a little bit. It seems to be several orders of
>> magnitude faster to do the configuration parsing in two passes: one to
>> find out how many objects there are of each type, and one to sort them
>> into a two-dimensional table and then do binary searches on that table,
>> as opposed to creating fixed-size hash tables and pre-inserting objects
>> into them. This is especially true for huge configurations, and appears
>> to be caused by far more beneficial memory access patterns and the
>> ability to parse most objects only a single time, since we know, for
>> example, that all hosts have been parsed by the time services are parsed.
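
The two-pass idea quoted above can be sketched roughly as follows. This is
only an illustration in Python, not the actual Nagios code: the one-line
"type name" input format and the names parse_two_pass and find are
assumptions made up for the example.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical, simplified input: one "type name" pair per line.
# Real Nagios object definitions are multi-line blocks.
CONFIG = """\
host web01
service web01/http
host db01
service db01/mysql
host app01
"""


def parse_two_pass(text):
    lines = [ln.split(None, 1) for ln in text.splitlines() if ln.strip()]
    # Pass 1: count objects per type so each table can be sized up front.
    counts = Counter(obj_type for obj_type, _ in lines)
    tables = {t: [None] * n for t, n in counts.items()}
    filled = Counter()
    # Pass 2: drop every object into its preallocated slot.
    for obj_type, name in lines:
        tables[obj_type][filled[obj_type]] = name
        filled[obj_type] += 1
    # Sort each per-type table once so lookups can use binary search.
    for table in tables.values():
        table.sort()
    return tables


def find(tables, obj_type, name):
    """Return the index of name in its type's table, or -1 if absent."""
    table = tables.get(obj_type, [])
    i = bisect_left(table, name)
    return i if i < len(table) and table[i] == name else -1


tables = parse_two_pass(CONFIG)
```

With every host already in a sorted table, a service definition can resolve
its host with find(tables, "host", "db01") instead of a hash lookup.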
> 
> 
> The main goal for us was to retrieve status information as fast as 
> possible in a centralized way. (because we have multiple Nagios servers)
> 

Use merlin for this. It can send events from several Nagios servers to
a single one and update a central database. I have no idea how you'd
sort out the configuration importing thing, but I guess once you sort
that, it'll work automagically.

> NDOutils was the solution we chose to meet our needs for the 
> following reasons:
> 
> 1) There's no known way (to me) to retrieve status information directly 
> from the daemon. It has to be exported to a file (status.dat)
> 2) Parsing status.dat takes too much time (I tried with Perl and PHP)
> 3) Writing a CGI script to export the status in XML using Nagios 
> functions isn't faster since it still relies on status.dat
> 4) Mounting a tmpfs folder and moving status.dat in it doesn't help
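
For context, the kind of status.dat parsing described in point 2 looks
roughly like the sketch below. It assumes the usual block layout
("hoststatus {", key=value lines, closing "}"); the sample data and the
name parse_status are invented for the example. Every query has to walk
the whole file again, which is consistent with the complaint that this
approach is too slow for large installations.

```python
# Hypothetical sample in the assumed status.dat block layout.
SAMPLE = """\
# comment line
hoststatus {
    host_name=web01
    current_state=0
    plugin_output=PING OK - rta=0.5ms
    }

servicestatus {
    host_name=web01
    service_description=http
    current_state=2
    }
"""


def parse_status(text):
    """Parse status blocks into a list of dicts, one per block."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            # Start of a block; remember its type under a reserved key.
            current = {"_type": line[:-1].strip()}
        elif line == "}":
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None and "=" in line:
            # Split on the first "=" only; values may contain "=" too.
            key, value = line.split("=", 1)
            current[key] = value
    return blocks


blocks = parse_status(SAMPLE)
critical = [b for b in blocks
            if b["_type"] == "servicestatus" and b["current_state"] == "2"]
```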
> 
> 
> Unfortunately, the main "problem" with NDOutils is that it reexports the 
> configuration and status at every reload. Clearing the "old" information 
> and exporting the *exact* same information is very time consuming and not 
> very efficient.
> 
> I found a patch which could improve/fix this behavior:
> http://opsview-blog.opsera.com/dotorg/2007/09/nagios-patch-da.html
> => Do not resend retained status to NDO
> 
> Only problem is that deleted hosts/services would never be removed from 
> MySQL if we apply the patch.
> 
> 
> To conclude, the real problem isn't with the Nagios restart process 
> itself but with:
> - NDOutils inefficiency at managing retention data
> - The fact we can't access status information in a fast and efficient way.
> 
> So I was hoping for some improvements regarding this aspect. (maybe by 
> using IPC/shared memory or a similar solution to access the status 
> information directly from the daemon memory)
> 

That would be a bad idea, imo. Shared memory would work, but it would
require aligning and indexing all the objects in memory, and then you'd
still have to reimplement the part of a database engine that the database
engine actually does well (i.e., filtering on indexed tables).
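
To illustrate what "filtering on indexed tables" buys you, here is a small
sketch using Python's built-in sqlite3 module. The table and column names
are made up for the example and are not the NDOutils or merlin schema; the
state values follow the usual Nagios convention (0 OK, 1 WARNING,
2 CRITICAL).

```python
import sqlite3

# In-memory database with a hypothetical status table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE service_status (
           host_name           TEXT NOT NULL,
           service_description TEXT NOT NULL,
           current_state       INTEGER NOT NULL,
           PRIMARY KEY (host_name, service_description)
       )"""
)
# The index lets the engine find rows by state without a full scan.
conn.execute("CREATE INDEX idx_state ON service_status (current_state)")

rows = [
    ("web01", "http", 0),
    ("web01", "ssh", 2),
    ("db01", "mysql", 2),
    ("app01", "ping", 1),
]
conn.executemany("INSERT INTO service_status VALUES (?, ?, ?)", rows)


def services_in_state(conn, state):
    """Return (host, service) pairs currently in the given state."""
    cur = conn.execute(
        "SELECT host_name, service_description FROM service_status "
        "WHERE current_state = ? "
        "ORDER BY host_name, service_description",
        (state,),
    )
    return cur.fetchall()


critical = services_in_state(conn, 2)
```

Doing the same against raw shared memory would mean building and
maintaining the equivalent of idx_state by hand.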

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Register now for Nordic Meet on Nagios, June 3-4 in Stockholm
 http://nordicmeetonnagios.op5.org/

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
