Out of memory failures on Nagios master server

Ton Voon ton.voon at altinity.com
Fri Mar 9 18:20:36 CET 2007


Jonathan,

On 9 Mar 2007, at 16:39, Wheeler, JF (Jonathan) wrote:

> My configuration has a master server and 2 slave servers with about 730
> hosts and 16000 service checks.  All our systems are running Linux.  For
> some time now the master server has been running out of memory between
> 4:50 and 5:00 such that the server either kernel panics (rarely) or it
> kills all useful processes.  To try and investigate the problem I have
> been running at commands to run "vmstat 15 160" and "date; ps -ef; sleep
> 15" (160 times) to record system activity at 15 second intervals for 40
> minutes, i.e. from 4:30 until 5:10.  This has revealed that the problem
> is caused by a) nsca processes starting and not being completed (today's
> maximum count was 4447) until they all suddenly complete at about 4:50.

You may want to try running nsca in --single mode. You could
potentially lose some check results from the slaves, but you would
never build up a large number of nsca processes.

We've seen situations where a high number of nsca processes makes the  
nagios server crawl, slowing other things down.
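
If it helps to reduce the ps captures to one count per sample, here is a
minimal Python sketch. It is only an illustration: the function names
count_commands and sample are made up, and it assumes the daemons appear
in ps under the command names "nsca" and "nagios" and that your ps
accepts "-eo comm=" (procps on Linux does).

```python
import subprocess
import time
from collections import Counter


def count_commands(ps_output, names=("nsca", "nagios")):
    """Count processes per command name.

    ps_output is expected to hold one command name per line,
    as printed by `ps -eo comm=`.
    """
    counts = Counter(
        line.strip() for line in ps_output.splitlines() if line.strip()
    )
    # Counter returns 0 for names that never appeared.
    return {name: counts[name] for name in names}


def sample():
    """Take one sample from the running system and return (timestamp, counts)."""
    out = subprocess.run(
        ["ps", "-eo", "comm="],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return time.strftime("%H:%M:%S"), count_commands(out)
```

Calling sample() every 15 seconds over the 4:30 to 5:10 window and
logging the result would show how the nsca and nagios counts grow,
without having to read through each full ps listing.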

> During this time vmstat shows that memory usage increases slowly, but it
> is all released when the nsca processes run.  About 10 minutes later
> there are many separate nagios processes which do not complete (183); as
> the nagios process is quite large this fills system memory and swap
> space which effectively kills the system.

IIRC, passive check results are processed by the main nagios process,
so they shouldn't be responsible for multiple nagios processes. There
are multiple nagios processes for parallelised service checks, so
perhaps you could restrict the number of parallelised checks?
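
The setting I have in mind is max_concurrent_checks in nagios.cfg, where
0 means no limit. As a rough sketch for checking what a given config
currently sets, something like the following Python could be used. The
helper name read_directive is made up, and letting the last assignment
win is an assumption of this sketch rather than documented Nagios
behaviour.

```python
def read_directive(cfg_text, name):
    """Return the value assigned to `name` in nagios.cfg-style text.

    Lines are expected in key=value form; blank lines and lines starting
    with '#' are skipped. Returns None if the directive is not set.
    If the directive appears more than once, the last one is returned
    (an assumption of this sketch).
    """
    value = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == name:
            value = val.strip()
    return value
```

For example, reading your main config file into a string and calling
read_directive(text, "max_concurrent_checks") would show whether a limit
is set at all. A result of None or "0" would suggest that the number of
parallel checks is not being capped.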

Ton

http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon


