Full Throttle Nagios

Marcel mitsuto at gmail.com
Tue May 18 23:41:21 CEST 2010


Again, thank you for all the quick answers. This list/community is
awesome!!!

I'm already using tmpfs, increased named pipe buffer size, did everything
that one is supposed to do in order to increase performance.

I think I'd go with removing sleep calls in the code. I'm at version 3.2.1
and would love to have a look at Max's patch!

Notification is not my bottleneck, and this is not for my own nagios
install, it's for someone else, so I cannot post nagios.cfg here. Sorry.

But again, thanks for all the answers!!!

On Tue, May 18, 2010 at 5:49 PM, Mike Lindsey <mike-nagios at 5dninja.net>wrote:

> Marcel wrote:
> > When I have more than, say, 10k checks, I start seen check latency rises
> > and there just isn't anything that could be done, even distributed
> > monitoring have the nagios.cmd write-lock bottleneck.
>
> So, I've just gone through this, and the single greatest bottleneck I
> had to deal with is notifications.  But, I have a lot of people in the
> notification tree, and pull in a lot of meta-data to make ticket
> tracking and issue resolution easier and faster.  Since Nagios needs to
> know the exit status of notification commands, it doesn't fork before
> notifications.. it just plods along waiting for the notification command
> to exit.
>
> I switched all our non-pager notification commands to drop a spool file
> in a directory, letting another process read the spool files, generate
> email contents, query ticket databases, pull in documentation or
> extended testing information (full mysql processlist output, for dbas..
> etc) and caching it for subsequent notifications for that event.
>
> That showed a HUGE improvement to my master server's performance.
>
> If notifications aren't your bottleneck, you can move all your temporary
> files to ramdisk.
>
> You can also increase your FIFO pipe size, but that only delays the
> issue and doesn't really solve the problem if you're always running hot.
>  It also probably involves recompiling your kernel.
>
> If you're using nsca, you can cache your updates for a second or two, so
> that multiple updates happen in the same socket connection.
>
> Alternately (or additionally) you can have nsca update the checkresults
> directory, directly, skipping the steps where nagios reads the command
> pipe, and then just writes it back out to the checkresults directory.
>
> I can package up a patch (against 2.7.2) of those last couple changes (I
> need to submit them, anyway).  If you're manlier than I might be, you
> could also consider modifying the core nagios to allow submissions from
> distributed nagios servers, directly to a socket, but doing that right
> might require serious threaded c foo, and depending on your OS and
> threading library, you might be locked to a single core.
>
> So, you have options.  They're not all equal, and aren't all easy.  But
> you wouldn't be working with monitoring if you didn't like challenges...
>  :)
>
> --
> Mike Lindsey
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.monitoring-lists.org/archive/users/attachments/20100518/6c92590a/attachment.html>
-------------- next part --------------
------------------------------------------------------------------------------

-------------- next part --------------
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null


More information about the Users mailing list