Nagios 2.0b3 hangs on FreeBSD

Scot Jenkins scotjenkins at gmail.com
Tue May 17 07:15:36 CEST 2005


Hi,
I've searched the archives but haven't found anything relevant to my
issue so I'll post my question.

Running Nagios 2.0b3 (April 03, 2005) -- the latest in the ports tree
on two FreeBSD 4.10 systems.  After about 2-3 days, Nagios just stops
working.  There are several nagios processes still running on the box,
one of which is a child of the main Nagios process, and that child is
the top process in top(1), constantly switching in and out of RUN state.  The
other processes are all sleeping.  I've run FreeBSD's ktrace/kdump on
the running child (pid 46147 here) and it just keeps doing this:

 46147 nagios   CALL  nanosleep(0xbfbfac58,0)
 46147 nagios   RET   nanosleep 0
 46147 nagios   CALL  sched_yield
 46147 nagios   RET   sched_yield 0
 46147 nagios   CALL  sched_yield
 46147 nagios   RET   sched_yield 0

Running ktrace/kdump on the main nagios process (46147's parent)
repeatedly shows:

 84617 nagios   RET   read 0
 84617 nagios   CALL  select(0,0,0,0,0xbf3ff7a0)
 84617 nagios   RET   select 0
 84617 nagios   CALL  read(0x5,0x80bc000,0x1000)
 84617 nagios   GIO   fd 5 read 0 bytes
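
For anyone wanting to summarize a longer capture than the excerpts above, here is a minimal Python sketch that counts CALL records per pid and syscall. It assumes every kdump line has the layout seen in these excerpts (pid, command name, record type, then the call); the count_calls name and the regular expression are only illustrative, not part of ktrace, kdump, or Nagios.

```python
import re
from collections import Counter

# Assumed kdump line layout, taken from the excerpts in this post:
#   "<pid> <command> <CALL|RET|GIO> <rest of record>"
LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(CALL|RET|GIO)\s+(.*)$")


def count_calls(kdump_text):
    """Return a Counter keyed by (pid, syscall name) for CALL records only."""
    counts = Counter()
    for line in kdump_text.splitlines():
        match = LINE_RE.match(line)
        if match is None or match.group(3) != "CALL":
            continue  # skip RET/GIO records and anything unrecognized
        pid = int(match.group(1))
        # Drop the argument list, if any: "nanosleep(0x...,0)" -> "nanosleep"
        name = match.group(4).split("(", 1)[0].strip()
        counts[(pid, name)] += 1
    return counts
```

Passing the saved kdump text to count_calls and printing counts.most_common() should make a tight loop stand out, for example a child that does little besides nanosleep and sched_yield.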

We have two systems that experience this problem.  Both are Intel PII
class systems with 256 MB of RAM.  One is an old HP desktop (single
processor) monitoring 26 hosts and 48 services.  The other system is a
Dell 1550 (dual processor), monitoring 21 hosts and 274 services.  The
load average on both systems is generally in the 0.00 - 1.00 range
(ok, I run setiathome on the HP desktop but otherwise it's idle).

Nagios seems to hang at random times, never the same time twice, but
it generally doesn't run more than 2-3 days.  I've looked but can't
find any cron jobs running at the times when nagios hangs (e.g., no high
load at the time nagios hangs).

On both systems, Nagios must be forcibly killed (killall -9 nagios)
and manually restarted.  We've tried both the shutdown and restart
links from the "process info" CGI web page and restarting from the
command line via "/usr/local/etc/rc.d/nagios.sh restart".  Neither
works.  The CGI gives no errors and appears to work but does nothing.
The command line just repeatedly says it's stopping the main nagios
parent process but it never actually kills it.

Anyone else having this problem or have any suggestions on how to
further debug it?

Thanks,
Scot Jenkins

