Problems with many hanging Nagios processes (Nagios spawning rogue nagios processes eventually crashing Nagios server)

Ton Voon ton.voon at altinity.com
Thu Dec 21 12:54:37 CET 2006


Hi Mahesh,

On 19 Dec 2006, at 00:42, Mahesh Kunjal wrote:

> Here is what we did to resolve.
>
> 1. Edit the include/nagios.h.in
> change
> #define COMMAND_BUFFER_SLOTS 1024
> to
> #define COMMAND_BUFFER_SLOTS 60000
>
> And change
> #define SERVICE_BUFFER_SLOTS 1024
> to
> #define SERVICE_BUFFER_SLOTS 60000
>
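The quoted change raises the number of slots in two fixed-size buffers. The ring buffer below, sized by COMMAND_BUFFER_SLOTS, is a rough illustration of what a slot count means. The struct and function names are hypothetical, and this is not Nagios's own buffer code. Here ring_push simply reports failure when every slot is taken; the sketch does not model what Nagios itself does in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifndef COMMAND_BUFFER_SLOTS
#define COMMAND_BUFFER_SLOTS 60000 /* value from the change quoted above */
#endif

/* Hypothetical fixed-capacity FIFO of command strings (illustration only). */
typedef struct {
    const char *items[COMMAND_BUFFER_SLOTS];
    size_t head;  /* index of the oldest queued item */
    size_t tail;  /* index where the next item will be stored */
    size_t count; /* number of occupied slots */
} command_ring;

void ring_init(command_ring *r) {
    memset(r, 0, sizeof *r);
}

/* Queue one command. Returns 0 on success, -1 if all slots are in use. */
int ring_push(command_ring *r, const char *cmd) {
    assert(cmd != NULL);
    if (r->count == COMMAND_BUFFER_SLOTS)
        return -1;
    r->items[r->tail] = cmd;
    r->tail = (r->tail + 1) % COMMAND_BUFFER_SLOTS;
    r->count++;
    return 0;
}

/* Dequeue the oldest command, or return NULL if the buffer is empty. */
const char *ring_pop(command_ring *r) {
    if (r->count == 0)
        return NULL;
    const char *cmd = r->items[r->head];
    r->head = (r->head + 1) % COMMAND_BUFFER_SLOTS;
    r->count--;
    return cmd;
}
```

With 1024 slots, the 1025th push fails until something is popped. Raising the constant only moves that point further out.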

I was intrigued by this because we have a performance issue, though
not with the same symptoms. Our problem is that the number of NSCA
processes grows when the Nagios server is under load; they appear to
be blocking while writing to the command pipe. Switching NSCA to
single daemon mode mitigates the problem (slaves will time out their
passive results), but we wanted to know where any slowdowns could be.

From your findings, we've created a performance statistics patch,
attached. It collects the maximum and current values for the command
and service buffer slots and writes them to status.dat (by default
every 10 seconds). With a fake slave sending 128 results every 5
seconds, I found that the maximum values were fairly low (under 100),
but when I put the server under load, maximum_command_buffer_items
shot up to 1969 and maximum_service_buffer_items shot up to 2156 (I
had changed both defaults to your 60000).
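The maximum values reported here are high-water marks. The sketch below shows one way to keep such counters and dump them in the key=value style of status.dat. The struct, the functions and the current_* key names are hypothetical. Only the maximum_* key names come from the figures above, and this is not the attached patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counters for one buffer: items queued now, and the
 * highest value ever observed (the high-water mark). */
typedef struct {
    int current;
    int maximum;
} buffer_stats;

/* Call after an item is added to the buffer. */
void stats_on_push(buffer_stats *s) {
    s->current++;
    if (s->current > s->maximum)
        s->maximum = s->current;
}

/* Call after an item is removed from the buffer. */
void stats_on_pop(buffer_stats *s) {
    assert(s->current > 0);
    s->current--;
}

/* Write tab-indented key=value lines; returns fprintf's result. */
int write_buffer_stats(FILE *fp, const buffer_stats *cmd, const buffer_stats *svc) {
    return fprintf(fp,
                   "\tcurrent_command_buffer_items=%d\n"
                   "\tmaximum_command_buffer_items=%d\n"
                   "\tcurrent_service_buffer_items=%d\n"
                   "\tmaximum_service_buffer_items=%d\n",
                   cmd->current, cmd->maximum, svc->current, svc->maximum);
}
```

Because the maximum only ever rises, a single snapshot taken every 10 seconds still captures short bursts that happened between writes.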

These figures could show whether the buffers fill up at various
points, or whether there is not enough data ready for Nagios to
process further down the chain.
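The peaks of 1969 and 2156 items are both above the default of 1024 slots, yet they use only about 3.3% and 3.6% of 60000 slots. The small helpers below make that comparison explicit. Their names are hypothetical.

```c
#include <assert.h>

/* Percentage of the available slots used at the observed peak. */
double peak_utilisation(int max_items, int slots) {
    assert(slots > 0);
    return 100.0 * max_items / slots;
}

/* Nonzero if the observed peak would not have fit into the given slot count. */
int would_overflow(int max_items, int slots) {
    return max_items > slots;
}
```

Under the same load with the default settings, this suggests both buffers would have been full at their peaks. That would be consistent with writers backing up into the command pipe, although these figures alone do not prove it.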

I'd be interested in figures from other systems.

Warning: the patch is not thread safe, so there is no guarantee that
the statistics will not be corrupted (though this should not affect
normal Nagios operation). It applies to Nagios 2.5 and was tested on
Debian with a 2.6 kernel.
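The warning above implies that the counters may be updated from more than one thread. The sketch below removes that risk with C11 atomics, assuming a C11 compiler. The high-water mark is raised with a compare-and-swap loop, so concurrent pushes cannot lose a new maximum. The names are hypothetical, and this is not part of the attached patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical thread-safe variant of the current/maximum counters. */
typedef struct {
    atomic_int current;
    atomic_int maximum;
} atomic_buffer_stats;

void atomic_stats_init(atomic_buffer_stats *s) {
    atomic_init(&s->current, 0);
    atomic_init(&s->maximum, 0);
}

void atomic_stats_on_push(atomic_buffer_stats *s) {
    int now = atomic_fetch_add(&s->current, 1) + 1;
    int seen = atomic_load(&s->maximum);
    /* Only raise the maximum if our value is larger. On failure the CAS
     * reloads 'seen' with the latest maximum and the loop re-checks. */
    while (now > seen && !atomic_compare_exchange_weak(&s->maximum, &seen, now)) {
    }
}

void atomic_stats_on_pop(atomic_buffer_stats *s) {
    int before = atomic_fetch_sub(&s->current, 1);
    assert(before > 0);
    (void)before; /* silence unused-variable warnings when NDEBUG is set */
}
```

A mutex around the plain counters would work as well. Atomics avoid adding a lock to the path that already guards the buffers.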

Ton

http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon



-------------- next part --------------
A non-text attachment was scrubbed...
Name: nagios_show_max_buffer_items.patch
Type: application/octet-stream
Size: 3811 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20061221/ec141bd1/attachment.obj>
-------------- next part --------------
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel

