Problems with extensive passive monitoring

Andreas Ericsson ae at op5.se
Mon Oct 9 15:24:59 CEST 2006


Mike Becher wrote:
> 
> The whole description can be read on:
>   http://www.mountcup.de/tiki/tiki-index.php?page=mibe-nagios-passive-monitoring
> 

[ and has thus been cut from this mail ]

> My solution
> -----------
> Instead of calling an external program (ocsp_command or ochp_command) for 
> each external command message to forward it from CMNS to SMNS, let the 
> nagios process write these messages to a named pipe. The attached patch 
> gives you this functionality for nagios version 2.5.
> 
> Then let a helper program read from this named pipe on the CMNS side and 
> let it forward the messages through a (I call it here) channel to whatever 
> you want, in this case to SMNS. I have written a perl program that does 
> this for you, which is attached too.
> 
> What do you think about the option to use named pipes in addition to 
> ocsp_command and/or ochp_command running as external processes?
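
For readers without the attachment, here is a rough sketch of what such a 
helper does. It is in Python rather than the Perl of the original program, 
which is not reproduced here, so forward_from_fifo(), demo(), the FIFO path 
and the sample messages are all assumptions made for illustration. The 
"channel" is reduced to a plain callback.

```python
import os
import tempfile
import threading


def forward_from_fifo(fifo_path, send):
    """Read newline-terminated messages from a FIFO and pass each to send().

    Returns the number of forwarded messages once every writer has closed
    its end of the FIFO.
    """
    forwarded = 0
    # open() blocks here until some process opens the FIFO for writing.
    with open(fifo_path, "r", encoding="utf-8") as fifo:
        for line in fifo:
            message = line.rstrip("\n")
            if message:
                send(message)  # hypothetical channel, e.g. a socket write
                forwarded += 1
    return forwarded


def demo():
    """Stand in for the patched nagios process with a writer thread."""
    fifo_path = os.path.join(tempfile.mkdtemp(), "forward.fifo")
    os.mkfifo(fifo_path)

    def writer():
        with open(fifo_path, "w", encoding="utf-8") as fifo:
            # Sample lines in the external command style; values are made up.
            fifo.write("[1160400299] PROCESS_SERVICE_CHECK_RESULT;web01;HTTP;0;HTTP OK\n")
            fifo.write("[1160400300] PROCESS_HOST_CHECK_RESULT;web02;1;unreachable\n")

    thread = threading.Thread(target=writer)
    thread.start()
    received = []
    forward_from_fifo(fifo_path, received.append)
    thread.join()
    os.remove(fifo_path)
    return received
```

A real helper would reopen the FIFO after end-of-file instead of returning, 
since the writing side may close and reopen it at any time.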

The problem with using pipes is that they normally have a very limited 
chunk of memory to use (usually only 4KiB), which means that when the 
combined data from all the slaves exceeds this limit inside one cycle of 
Nagios reaping them, you get a buildup of processes that are blocked, 
waiting for the pipe to empty so they can write to it. When the wait 
ends, the pipe instantly fills up again because at any one time there 
will always be more data waiting to be written than there is waiting to 
be read.
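
The actual limit depends on the platform and kernel, so it is worth 
measuring on the machine in question. A minimal sketch, assuming a Unix-like 
system and Python 3.5 or later; pipe_capacity() is a name made up for this 
example. It fills an unread pipe with non-blocking writes and counts how 
many bytes fit before a writer would have to wait.

```python
import os


def pipe_capacity(chunk_size=512):
    """Return how many bytes an empty, unread pipe accepts before a
    write would block."""
    read_fd, write_fd = os.pipe()
    try:
        # Non-blocking mode turns "would block" into BlockingIOError
        # instead of stalling this process.
        os.set_blocking(write_fd, False)
        chunk = b"x" * chunk_size
        total = 0
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                return total  # the pipe is full; nobody is reading
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print("pipe accepts", pipe_capacity(), "bytes before blocking")
```

Whatever number comes out, the argument above is unchanged: once the 
writers together produce more per cycle than the reader drains, they queue 
up behind a full pipe.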

I'm not sure your solution fixes this problem for the master nagios 
server, although it will indeed provide a performance boost as it 
doesn't fork() as much as the old solution. My guess is that if this 
makes the problems go away, the change in system load just allows 
Nagios to keep up with the data-flow. So while being a definite 
improvement, you're likely to be hit by the problem again if your 
network grows, or if you get some network problem that causes Nagios to 
suddenly run checks much more frequently than normal.

> The NDO interface can't be used in this case because there aren't any 
> connectors inside the code for external commands.
> 

Yes there are. Or rather, you don't need them as ndo-modules have direct 
access to Nagios' internal APIs. A much better solution would have been 
to send check-result data from a module to a socket-listening module on 
the master end which then uses the internal APIs to update service/host 
status. This would allow the bottleneck (currently the 4KiB FIFO) to be 
spread over a more or less unlimited number of channels which all can 
be much, much larger than 4KiB.
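
To make the receiving end a bit more concrete, here is a toy sketch in 
Python. A real module would be written in C against Nagios' internals, which 
this does not attempt. The tab-separated wire format, parse_result(), 
receive_results() and the apply_result callback (standing in for the 
internal status update) are all assumptions for illustration only.

```python
import socket


def parse_result(line):
    """Split one result line into its fields.

    Hypothetical wire format: host<TAB>service<TAB>return_code<TAB>output
    """
    host, service, code, output = line.rstrip("\n").split("\t", 3)
    return {"host": host, "service": service, "code": int(code), "output": output}


def receive_results(conn, apply_result):
    """Read newline-delimited results from a connected socket until EOF.

    Each parsed result is handed to apply_result(), which stands in for
    the call that would update status inside the master.
    """
    count = 0
    with conn.makefile("r", encoding="utf-8") as stream:
        for line in stream:
            if line.strip():
                apply_result(parse_result(line))
                count += 1
    return count


def demo():
    """Simulate one slave connection with a local socket pair."""
    sender, receiver = socket.socketpair()
    sender.sendall(b"web01\tHTTP\t0\tHTTP OK\nweb02\tHTTP\t2\tConnection refused\n")
    sender.close()  # EOF tells the receiver this channel is done

    status = {}

    def apply_result(result):
        status[(result["host"], result["service"])] = result["code"]

    count = receive_results(receiver, apply_result)
    receiver.close()
    return count, status
```

Each slave gets its own connection with its own socket buffers, which is 
what spreads the load over many channels instead of one shared FIFO.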

This is unfortunately also much more complex, as it requires mucking 
about with Nagios' internals and you'd have to deal with the somewhat 
tricky issue of multiplexing inside a multi-threaded application. The 
fact that the module would need to operate in at least three different 
modes (sending/relaying/receiving) doesn't make things easier.

Good thing winter's soon upon us, so one can get busy with interesting 
things again. ;-)

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
