Full Throttle Nagios

Mike Lindsey mike-nagios at 5dninja.net
Tue May 18 22:49:25 CEST 2010


Marcel wrote:
> When I have more than, say, 10k checks, I start seeing check latency rise, 
> and there just isn't anything that can be done; even distributed 
> monitoring has the nagios.cmd write-lock bottleneck.

So, I've just gone through this, and the single greatest bottleneck I 
had to deal with was notifications.  But I have a lot of people in the 
notification tree, and I pull in a lot of metadata to make ticket 
tracking and issue resolution easier and faster.  Since Nagios needs to 
know the exit status of notification commands, it doesn't fork before 
notifications; it just plods along, waiting for each notification command 
to exit.

I switched all our non-pager notification commands to drop a spool file 
in a directory and let another process read the spool files, generate 
email contents, query ticket databases, pull in documentation or 
extended testing information (full MySQL processlist output for DBAs, 
etc.), and cache it all for subsequent notifications for that event.
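
As a rough sketch of the spool-file idea (not the actual scripts described 
above), assuming Python and JSON spool files: the function names, the field 
names, and the spool directory are all hypothetical.  The notification command 
only writes a small file and exits, so Nagios is not kept waiting; a separate 
process drains the directory at its own pace.

```python
import json
import os
import tempfile
import time


def spool_notification(spool_dir, fields):
    """Write one notification event to spool_dir and return its final path.

    The file is written under a temporary dotted name and then renamed, so a
    reader scanning for *.json never sees a half-written file.
    """
    os.makedirs(spool_dir, exist_ok=True)
    event = dict(fields, spooled_at=time.time())
    fd, tmp_path = tempfile.mkstemp(prefix=".tmp-", dir=spool_dir)
    with os.fdopen(fd, "w") as handle:
        json.dump(event, handle)
    # Drop the leading dot and add the suffix the reader looks for.
    final_name = os.path.basename(tmp_path)[1:] + ".json"
    final_path = os.path.join(spool_dir, final_name)
    os.rename(tmp_path, final_path)
    return final_path


def drain_spool(spool_dir):
    """Return all spooled events, oldest first, deleting each file once read."""
    events = []
    for name in os.listdir(spool_dir):
        if name.startswith(".") or not name.endswith(".json"):
            continue  # skip files that are still being written
        path = os.path.join(spool_dir, name)
        with open(path) as handle:
            events.append(json.load(handle))
        os.remove(path)
    events.sort(key=lambda event: event["spooled_at"])
    return events
```

The notification command would be a tiny wrapper that passes Nagios macros 
such as $HOSTNAME$, $SERVICEDESC$ and $SERVICESTATE$ into spool_notification(); 
the slow work (email, ticket lookups, caching) would happen in whatever loop 
calls drain_spool().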

That showed a HUGE improvement to my master server's performance.

If notifications aren't your bottleneck, you can move all your temporary 
files to ramdisk.

You can also increase your FIFO pipe size, but that only delays the 
issue and doesn't really solve the problem if you're always running hot.  
It also probably involves recompiling your kernel.
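
On Linux kernels from 2.6.35 onward, a pipe's buffer can be resized at run 
time with the F_SETPIPE_SZ fcntl, which may avoid the recompile.  The sketch 
below is Linux-only and demonstrates the call on an anonymous pipe; applying 
it to the Nagios command FIFO would mean opening that FIFO and passing its 
descriptor instead, which is an untested assumption here.  grow_pipe is a 
hypothetical helper name.

```python
import fcntl
import os

# Linux-specific fcntl commands. Newer Python versions export them from the
# fcntl module; the numeric fallbacks are the values from the Linux headers.
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)


def grow_pipe(fd, wanted_bytes):
    """Ask the kernel to enlarge the pipe behind fd; return the resulting size."""
    if fcntl.fcntl(fd, F_GETPIPE_SZ) < wanted_bytes:
        # Unprivileged processes are capped by /proc/sys/fs/pipe-max-size.
        fcntl.fcntl(fd, F_SETPIPE_SZ, wanted_bytes)
    return fcntl.fcntl(fd, F_GETPIPE_SZ)


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    print("pipe size is now", grow_pipe(write_end, 128 * 1024), "bytes")
```

As the paragraph above says, a bigger buffer only buys headroom for bursts; 
it does nothing if the reader is permanently slower than the writers.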

If you're using nsca, you can cache your updates for a second or two, so 
that multiple updates happen in the same socket connection.
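
A minimal sketch of that batching, assuming the results are handed to 
send_nsca, which reads tab-separated lines (host, service description, return 
code, plugin output) on standard input.  ResultBatcher and its method names 
are hypothetical; this is not the patch mentioned in this message.

```python
import time


class ResultBatcher:
    """Collect passive check results and release them in batches."""

    def __init__(self, max_age_seconds=2.0, clock=time.monotonic):
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self.lines = []
        self.first_added = None

    def add(self, host, service, return_code, output):
        """Queue one result in send_nsca's tab-separated input format."""
        if not self.lines:
            self.first_added = self.clock()
        # Tabs and newlines inside plugin output would break the line format.
        clean = output.replace("\t", " ").replace("\n", " ")
        self.lines.append(f"{host}\t{service}\t{return_code}\t{clean}\n")

    def due(self):
        """True once the oldest queued result has waited long enough."""
        if not self.lines:
            return False
        return self.clock() - self.first_added >= self.max_age_seconds

    def flush(self):
        """Return the whole batch as one payload and reset the queue."""
        payload = "".join(self.lines)
        self.lines = []
        self.first_added = None
        return payload
```

When due() turns true, the payload from flush() can be piped into a single 
send_nsca invocation (for example via subprocess.run with input=payload), so 
that every queued result travels over one connection.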

Alternatively (or additionally), you can have nsca write to the checkresults 
directory directly, skipping the steps where nagios reads the command 
pipe and then just writes the result back out to the checkresults directory.
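
To illustrate what writing to the checkresults directory directly could look 
like, here is a sketch in Python.  The file layout (a seven-character name 
starting with "c", key=value lines, and a companion ".ok" marker file) is an 
assumption modeled on what Nagios 3.x itself writes; compare it against the 
files in your own check_result_path before relying on it.  
write_check_result is a hypothetical helper, not the patch mentioned in this 
message.

```python
import os
import random
import string
import time


def _create_result_file(result_dir):
    """Create a uniquely named 'cXXXXXX' file; return (fd, path)."""
    alphabet = string.ascii_letters + string.digits
    while True:
        name = "c" + "".join(random.choice(alphabet) for _ in range(6))
        path = os.path.join(result_dir, name)
        try:
            # O_EXCL makes creation fail if the name is already taken.
            # Mode 0o600 assumes the writer runs as the nagios user.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
            return fd, path
        except FileExistsError:
            continue


def write_check_result(result_dir, host, service, return_code, output):
    """Drop one passive service result into result_dir; return the file path."""
    now = time.time()
    # Assumed escaping: keep the output on a single line.
    escaped = output.replace("\\", "\\\\").replace("\n", "\\n")
    body = (
        f"file_time={int(now)}\n"
        "\n"
        f"host_name={host}\n"
        f"service_description={service}\n"
        "check_type=1\n"  # 1 = passive check
        "scheduled_check=0\n"
        "reschedule_check=0\n"
        "latency=0.0\n"
        f"start_time={now:.6f}\n"
        f"finish_time={now:.6f}\n"
        "early_timeout=0\n"
        "exited_ok=1\n"
        f"return_code={return_code}\n"
        f"output={escaped}\n"
        "\n"
    )
    fd, path = _create_result_file(result_dir)
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    # The marker is created last, so a reader never picks up a partial file.
    with open(path + ".ok", "w"):
        pass
    return path
```

Creating the marker file only after the result file is complete is the 
important design point: the reader can ignore any result file that does not 
yet have its marker.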

I can package up a patch (against 2.7.2) of those last couple of changes (I 
need to submit them anyway).  If you're manlier than I might be, you 
could also consider modifying the nagios core to accept submissions from 
distributed nagios servers directly on a socket, but doing that right 
might require serious threaded C foo, and depending on your OS and 
threading library, you might be locked to a single core.

So, you have options.  They're not all equal, and aren't all easy.  But 
you wouldn't be working with monitoring if you didn't like challenges...  :)

-- 
Mike Lindsey
