Problems with many hanging Nagios processes (Nagios spawning rogue nagios processes eventually crashing Nagios server)

Mahesh Kunjal mkunjal at gmail.com
Tue Dec 19 16:36:07 CET 2006


On 12/19/06, Andreas Ericsson <ae at op5.se> wrote:
> >
> > The problem here was that the command buffer had a limited size of 1024. This is the default setting in include/nagios.h.in, in the line #define COMMAND_BUFFER_SLOTS 1024.
>
> This is the number of buffers that will be available for writing into,
> not the number of total bytes available. Each command buffer slot holds
> MAX_INPUT_BUFFER bytes.

Yes, each message is MAX_INPUT_BUFFER bytes, which defaults to 1024.
What I was getting at was the number of messages (results) the buffer can hold.
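
To put numbers on the slots-versus-bytes distinction, here is a rough sketch. The helper name is made up for illustration and is not in the Nagios source; the two constants just mirror the defaults quoted in this thread.

```c
#include <assert.h>

/* Defaults as quoted in this thread; the real definitions live in
 * include/nagios.h.in. */
#define MAX_INPUT_BUFFER     1024   /* bytes per message */
#define COMMAND_BUFFER_SLOTS 1024   /* number of messages, not bytes */

/* Hypothetical helper (not a Nagios function): worst-case payload size
 * if every slot holds a full-size message. */
unsigned long worst_case_buffer_bytes(unsigned long slots,
                                      unsigned long bytes_per_slot)
{
    assert(bytes_per_slot > 0);
    return slots * bytes_per_slot;
}
```

With the defaults that is 1024 * 1024 = 1048576 bytes at most; with 60000 slots it would be 60000 * 1024 = 61440000 bytes at most, and only if every slot were full at once.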

>
> >
> > This was not enough and the child process started to wait for memory to be freed so that the pipe data retrieved can be put in buffer.
> >
> > While this child process waited for memory to be freed, the command worker thread woke up, saw that there was data in the pipe, and forked another child. This repeated until the server eventually ran out of memory.
> >
>
> A very concise and correct description of what's going on. Thanks.
:)


>
> > Here is what we did to resolve.
> >
> > 1. Edit the include/nagios.h.in
> > change
> > #define COMMAND_BUFFER_SLOTS 1024
> > to
> > #define COMMAND_BUFFER_SLOTS 60000
> >
> > And change
> > #define SERVICE_BUFFER_SLOTS 1024
> > to
> > #define SERVICE_BUFFER_SLOTS 60000
> >
>
> This would indeed solve the problem, although you could have gotten away
> with the same amount of SERVICE_BUFFER_SLOTS as there are services
> configured on the system, and the same amount of COMMAND_BUFFER_SLOTS as
> there are hosts and services. Provided the slaves also send passive
> hostchecks, ofc, otherwise you can set it to the amount of services instead.
The customer was planning on adding more services.
Right now, at peak, we get 14K results per second, and with a reaper
frequency of 2 seconds that could fill 28K slots in the command buffer.
I came up with 60000 just in case Nagios is not digesting the buffer
fast enough.
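
The arithmetic behind that, as a small sketch. The function and the headroom factor are illustrative assumptions of mine, not anything taken from Nagios.

```c
#include <assert.h>

/* Hypothetical helper: slots needed to hold everything that can arrive
 * between two reaper runs, multiplied by a safety factor for the case
 * where the buffer is not drained fast enough. */
unsigned long slots_needed(unsigned long peak_results_per_sec,
                           unsigned long reaper_interval_sec,
                           unsigned long headroom_factor)
{
    assert(headroom_factor >= 1);
    return peak_results_per_sec * reaper_interval_sec * headroom_factor;
}
```

slots_needed(14000, 2, 1) gives the 28000 mentioned above, and a factor of 2 gives 56000, so 60000 leaves roughly twice the observed peak.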

>
> It should also be noted that these settings shouldn't be modified unless
> needed, as it will make Nagios use quite a bit more memory per default
What I remember from looking at the code is that the child process allocates
memory per message read, and only if the number of messages in the buffer is
less than COMMAND_BUFFER_SLOTS.
My understanding is that Nagios won't allocate all COMMAND_BUFFER_SLOTS slots
up front. Slots are allocated only if results arrive at short intervals
and/or are not being processed fast enough.
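
A minimal sketch of the behaviour I am describing, written from memory of the idea and not copied from the Nagios source. The struct, the function names and the tiny SLOTS value are all made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SLOTS 4  /* tiny stand-in for COMMAND_BUFFER_SLOTS */

/* Hypothetical ring buffer: only the pointer array exists up front. */
struct ring {
    char *slot[SLOTS];
    int head;    /* next slot to write */
    int tail;    /* next slot to read */
    int items;   /* messages currently queued */
};

/* Returns 0 on success, -1 if the buffer is full or malloc fails. */
int ring_put(struct ring *r, const char *msg)
{
    size_t len;
    char *copy;

    assert(r != NULL && msg != NULL);
    if (r->items == SLOTS)
        return -1;                /* full: the writer has to wait or drop */
    len = strlen(msg) + 1;
    copy = malloc(len);           /* memory is taken per message, on demand */
    if (copy == NULL)
        return -1;
    memcpy(copy, msg, len);
    r->slot[r->head] = copy;
    r->head = (r->head + 1) % SLOTS;
    r->items++;
    return 0;
}

/* Returns the oldest message (caller frees it), or NULL if empty. */
char *ring_get(struct ring *r)
{
    char *msg;

    assert(r != NULL);
    if (r->items == 0)
        return NULL;
    msg = r->slot[r->tail];
    r->slot[r->tail] = NULL;
    r->tail = (r->tail + 1) % SLOTS;
    r->items--;
    return msg;
}
```

In a scheme like this, raising the slot count only grows the pointer array. The per-message memory is held only while results are queued, so it grows when results arrive faster than they are consumed.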


----
Mahesh Kunjal   mkunjal at gmail.com
