FW: Problems with distributed setup, master overload?

Wheeler, JF (Jonathan) J.F.Wheeler at rl.ac.uk
Wed Jun 13 12:30:07 CEST 2007


-----Original Message-----
From: nagios-users On Behalf Of Jeffrey Lensen
Sent: 10 June 2007 08:28

> I recently extended our distributed Nagios setup from 1 master and 2
> distributed slaves (in which the master also ran a lot of checks) to
> 1 master and 5 distributed slaves (in which the master does no checking
> at all, except for host checks).
>
> This setup has 556 hosts and roughly 7000 service checks. Ever since I
> modified it, the Nagios master host has been giving me problems.
>
> The symptoms:
> - When I start both Nagios and NSCA, I see NSCA accepting checks in my
>   logfiles, but none get processed by Nagios.
> - After a few minutes NSCA processes start to build up, increasing by
>   5-10 processes per second. Within a few minutes they reach a few
>   thousand and the machine starts hanging.
> - Sometimes the number of Nagios processes starts increasing instead of
>   the NSCA processes. Same result: the machine starts hanging.

I have seen similar problems, though in my case (1 master, 2 slaves, 824
hosts, 16000+ services) the queued NSCA processes are eventually
flushed. However, the Nagios master server also suffers from memory
leaks. Eventually (after 1 to 5 days) it either crashes with a kernel
panic because there is no free memory, or reaches a state where the
kernel has killed all useful processes (e.g. nagios, nsca, sshd, ntpd)
in an attempt to cure OOM (Out Of Memory) problems. Interestingly,
running strace against the first child nsca process seems to bring
everything back to life, and the queue of NSCA processes quickly
flushes.
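
Little's law gives a rough way to reason about why children pile up. The average number of NSCA children alive at once equals the rate at which results arrive multiplied by how long each child lives. The sketch below makes two assumptions, purely for illustration, and neither figure comes from this thread. First, every service is checked every 5 minutes. Second, each result arrives over its own NSCA connection, as typically happens when a slave's OCSP command runs send_nsca once per result.

```python
def result_rate(num_services: int, check_interval_s: float) -> float:
    """Results per second arriving at the master, assuming every service
    is checked exactly once per `check_interval_s` seconds (an assumption)."""
    if check_interval_s <= 0:
        raise ValueError("check interval must be positive")
    return num_services / check_interval_s


def concurrent_children(rate_per_s: float, seconds_per_result: float) -> float:
    """Little's law: the mean number of in-flight NSCA children equals the
    arrival rate times the mean time each child spends on one result."""
    return rate_per_s * seconds_per_result


# 16000 services (from the message above) at an assumed 5-minute interval.
rate = result_rate(16000, 300)
print(f"arrival rate: {rate:.1f} results/s")  # arrival rate: 53.3 results/s

# If each child finishes quickly, only a handful are alive at once;
# if each one blocks for a minute, thousands accumulate.
for seconds in (0.05, 60):
    print(f"{seconds:>5} s per result -> {concurrent_children(rate, seconds):.0f} children")
```

The point is qualitative. The child count stays small only while each child exits quickly. Any stall in handing results over to Nagios turns a steady arrival rate into a steadily growing process count.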

I have tried running nagios with the -s option to get configuration
recommendations, and nagiostats to get usage information on both the
master and slave servers, but neither revealed anything useful. My
current plan is to introduce 3 more slave servers, as I have heard that
this helps.

Any comments would be helpful to me as well.

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null