Out of memory failures on Nagios master server

Wheeler, JF (Jonathan) J.F.Wheeler at rl.ac.uk
Fri Mar 9 17:39:00 CET 2007


My configuration has a master server and 2 slave servers, with about 730
hosts and 16000 service checks.  All our systems are running Linux.  For
some time now the master server has been running out of memory between
4:50 and 5:00 each morning, to the point where the server either kernel
panics (rarely) or kills all useful processes.

To investigate the problem I have been using at jobs to run
"vmstat 15 160" and "date; ps -ef; sleep 15" (repeated 160 times) to
record system activity at 15-second intervals for 40 minutes, i.e. from
4:30 until 5:10.  This has revealed that the problem develops in two
stages:

a) nsca processes start and do not complete (today's maximum count was
   4447) until they all suddenly complete at about 4:50.  During this
   time vmstat shows memory usage increasing slowly, but it is all
   released when the nsca processes finish.

b) About 10 minutes later there are many separate nagios processes
   which do not complete (183).  As the nagios process is quite large,
   this fills system memory and swap space, which effectively kills the
   system.

You might think, given the time at which this happens, that cron is
involved, but for this morning I had retimed cron.daily to run at 10:02
rather than 4:02, and it still happened.  I can also say from the master
server logs that no tests seem to be recorded from about 4:00 onwards;
if the system survives, they start again after that.

The server is a blade server with a single CPU, but it is running with
hyper-threading on (if that makes a difference); the kernel is
2.6.0-42.0.8.  Has anyone seen anything like this?
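
For anyone wanting to get the same kind of per-sample counts out of a
recorded "date; ps -ef" log, the following is a minimal Python sketch,
not the exact method used for the figures above.  It assumes the usual
Linux "ps -ef" layout (a header line starting "UID PID", with the
command in the eighth column) and that each sample is preceded by its
"date" line.  The function name count_processes and the default names
it looks for are illustrative only.

```python
import os
from collections import Counter


def count_processes(log_lines, names=("nsca", "nagios")):
    """Count matching processes in each 'date; ps -ef' sample.

    Returns a list of (timestamp, Counter) pairs, one per sample, where
    timestamp is the line that preceded the ps header (assumed to be
    the output of 'date').
    """
    samples = []
    previous = ""
    current = None
    for raw in log_lines:
        line = raw.rstrip("\n")
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if fields[:2] == ["UID", "PID"]:
            # Header line: start a new sample, labelled by the date line.
            current = Counter()
            samples.append((previous.strip(), current))
        elif current is not None and len(fields) == 8:
            # Compare the basename of the executable, ignoring arguments.
            exe = os.path.basename(fields[7].split()[0])
            if exe in names:
                current[exe] += 1
        previous = line
    return samples
```

Given the log as a file (the name ps-samples.log here is hypothetical),
the peak count could then be found with something like
max(c["nsca"] for _, c in count_processes(open("ps-samples.log"))).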

Any suggestions would be appreciated.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory





