Slow Nagios reloads with NDOUtils

Ton Voon ton.voon at altinity.com
Sat Nov 17 09:51:33 CET 2007


On 16 Nov 2007, at 21:23, mark.potter at academy.com wrote:


> My first problem, and I am not sure it is actually a problem, is  
> that when I do a reload of nagios (/etc/init.d/nagios reload) it  
> takes, what seems to me to be, a long time. It is usually around  
> 90-120 seconds for Nagios to start allowing use of the web  
> interface once the reload is initiated. A check of the files  
> reveals no errors (save one warning for a host with no services)  
> and the nagios process shows in a ps awux list. However the web  
> interface shows the "Whoops! Error: Could not read host and service  
> status information!" during the 90-120 second delay I mentioned  
> earlier.
>

Hi Mark,

There seem to be quite a few emails on this list about NDOUtils being  
a bit slow. We saw this about 6 months ago and have been optimising  
the hell out of it, but it boils down to this:

- NDO updates are synchronously applied to the database

This means that Nagios has to wait for the DB to finish each update  
before it continues. I believe Ethan plans to work on NDO after  
Nagios 3 is released.

We've used various tricks to reduce the time a reload takes, which  
we will blog about on http://altinity.org soon; I just haven't found  
the time to write it up yet. The first couple of things that come to  
mind are:

- indexes should be re-arranged so that the time column is first.  
Currently, a lot of indexes have instance_id first. However, a  
composite index can only be searched starting from its leftmost  
column, so when you are doing a delete based on time, the index is  
effectively useless and MySQL has to do a complete table scan to work  
out which rows need to be deleted. This takes a lot of time. This is  
the single biggest improvement you can make.
- reduce how often ndo2db calls the housekeeping routine. By default,  
it runs every 60 seconds; we've increased that to 600 seconds, and it  
could probably run even less often. One idea I've just had is to have  
ndo2db do NO housekeeping at all and do it yourself from a separate  
job (MySQL is multi-user, after all).
- reduce the amount of data sent. We stop the broker module from  
sending systemcommands, log entries and passive commands.
- we've also patched Nagios not to send status data on a reload. By  
default, Nagios sends NDO the status of all hosts/services on a  
reload. This is not required, because the DB already knows what  
their status was before the reload!
- we're currently testing a de-coupling of NDOMOD from ndo2db. The  
idea is that NDOMOD writes files and then a separate daemon loads  
those files into ndo2db. This effectively makes NDO updates  
asynchronous, at the cost of a delay before the updates arrive.
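To illustrate the index point in the list above: a composite B-tree index can only be used to search on a prefix of its columns, so an index on (instance_id, time) does not help a delete that filters on time alone. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (the leftmost-prefix idea is the same, but the two planners differ), and the table and column names are hypothetical rather than NDO's real schema. It asks SQLite's EXPLAIN QUERY PLAN how it would run a housekeeping-style delete under each index order.

```python
import sqlite3

# Hypothetical housekeeping-style delete: remove rows older than a cutoff.
DELETE_SQL = "DELETE FROM events WHERE event_time < ?"


def plan_for(index_columns):
    """Return SQLite's query-plan text for DELETE_SQL with the given index."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for an NDO history table.
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, instance_id INTEGER, "
        "event_time INTEGER, output TEXT)"
    )
    conn.execute(f"CREATE INDEX idx_events ON events ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + DELETE_SQL, (1000,)).fetchall()
    conn.close()
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return " | ".join(row[-1] for row in rows)


# instance_id first: the time filter cannot seek into the index.
plan_instance_first = plan_for("instance_id, event_time")
# Time column first: the planner can search a range of the index.
plan_time_first = plan_for("event_time, instance_id")

print("instance_id first:", plan_instance_first)
print("time first:       ", plan_time_first)
```

With instance_id first, SQLite reports a SCAN (every row is visited); with the time column first it reports a SEARCH over a range of the index. On MySQL, running EXPLAIN on the equivalent SELECT gives a similar comparison.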
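For the housekeeping point in the list above, a minimal sketch of trimming old rows yourself, outside ndo2db, might look like this. It again uses sqlite3 as a stand-in for MySQL, and the table name, column name and retention period are all hypothetical; in practice something like it would run from cron against the real NDO database. The helper also shows the arithmetic behind the interval change: at 60 seconds housekeeping runs 1,440 times a day, at 600 seconds only 144 times.

```python
import sqlite3
import time

SECONDS_PER_DAY = 24 * 60 * 60


def runs_per_day(interval_seconds):
    """How many times a housekeeping job with this interval runs per day."""
    return SECONDS_PER_DAY // interval_seconds


def prune_older_than(conn, max_age_seconds, now=None):
    """Delete rows older than max_age_seconds and return how many were removed.

    Hypothetical schema: table 'events' with an integer 'event_time'
    column holding Unix timestamps.
    """
    if now is None:
        now = int(time.time())
    cutoff = now - max_age_seconds
    with conn:  # commit on success, roll back on error
        cur = conn.execute("DELETE FROM events WHERE event_time < ?", (cutoff,))
    return cur.rowcount


# Demo: one row per hour for the last 48 hours, keep only the last 24 hours.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_time INTEGER)")
conn.execute("CREATE INDEX idx_events_time ON events (event_time)")
now = 1_000_000
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [(now - hour * 3600,) for hour in range(48)],
)
removed = prune_older_than(conn, max_age_seconds=SECONDS_PER_DAY, now=now)
print(runs_per_day(60), runs_per_day(600), removed)
```

Note that the delete filters on the time column alone, which is exactly the case where an index with the time column first pays off.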
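On reducing the data sent: assuming, as in the NDOUtils sample ndomod.cfg, that the broker module's data_processing_options setting is a bitmask (with -1 meaning "process everything"), switching categories off means clearing their bits. The flag names and bit values below are placeholders, not the real ones; take the actual values from the constants documented in your ndomod.cfg and the NDOUtils headers before using the result.

```python
from enum import IntFlag


class NdoData(IntFlag):
    """Placeholder categories: NOT the real NDOUtils names or bit values."""
    PROGRAM_STATUS = 1 << 0
    HOST_STATUS = 1 << 1
    SERVICE_STATUS = 1 << 2
    SYSTEM_COMMANDS = 1 << 3
    LOG_ENTRIES = 1 << 4
    PASSIVE_COMMANDS = 1 << 5


def processing_mask(dropped):
    """Return a mask with every placeholder category set except those in dropped."""
    all_bits = 0
    for flag in NdoData:
        all_bits |= int(flag)
    return all_bits & ~int(dropped)


# The three categories mentioned above as no longer sent.
DROPPED = NdoData.SYSTEM_COMMANDS | NdoData.LOG_ENTRIES | NdoData.PASSIVE_COMMANDS
mask = processing_mask(DROPPED)
print(mask)  # 7 with these placeholder values
```

Building the mask by clearing bits, rather than hand-adding the ones you want, makes it obvious which categories were deliberately turned off.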
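The de-coupling described in the last point of the list above is essentially a spool directory: the producer writes each batch of updates to its own file and a separate loader picks the files up later. Below is a minimal sketch of that pattern under assumptions of our own (one JSON record per file, a hypothetical sequence-number naming scheme, and a plain list standing in for the database); it is not Altinity's actual implementation. Each file is first written under a temporary name and then renamed, so on POSIX the loader never sees a half-written file.

```python
import json
import os
import tempfile


def spool_write(spool_dir, record, seq):
    """Producer side: write one record atomically into the spool directory."""
    # A zero-padded sequence number keeps files in write order when sorted.
    final_name = os.path.join(spool_dir, f"{seq:012d}.json")
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(record, fh)
    os.replace(tmp_path, final_name)  # atomic rename within one filesystem


def spool_drain(spool_dir, load):
    """Loader side: pass each finished file to load() in order, then delete it."""
    names = sorted(n for n in os.listdir(spool_dir) if n.endswith(".json"))
    for name in names:
        path = os.path.join(spool_dir, name)
        with open(path) as fh:
            load(json.load(fh))
        os.remove(path)  # only remove the file once load() has returned
    return len(names)


# Demo: the producer spools three updates, then the loader drains them.
with tempfile.TemporaryDirectory() as spool:
    for i, status in enumerate(["UP", "DOWN", "UP"]):
        spool_write(spool, {"host": "web01", "status": status}, i)
    loaded = []
    count = spool_drain(spool, loaded.append)
    leftover = os.listdir(spool)

print(count, [r["status"] for r in loaded], leftover)
```

The producer only pays for a local file write, which is what makes the updates asynchronous; the price is that the database lags by however often the loader drains the directory.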

We've also made a patch to Nagios 2.9 (which Ethan has applied to  
Nagios 3) that keeps the status file between reloads, so you don't  
get the dreaded "Could not read host and service status information"  
error. It is available at  
http://altinity.blogs.com/dotorg/2007/09/nagios-patch-da.html.

We love NDOUtils: a lot of our features in Opsview depend on it,  
including our favourite, Hostgroup Hierarchy  
(http://opsview.org/hostgrouphierarchy). So we're keen to make  
NDOUtils as fast as possible too.

Ton

http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon



_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users