Nagios3 hang

Jason Brittain jason.brittain at gmail.com
Wed May 20 19:24:26 CEST 2009


I am also having this problem.  My event handler script runs (and
succeeds, thankfully), but Nagios hangs at the point where it runs the
script.  From then on Nagios stops monitoring anything, and no
notifications are sent when services fail.

Previously I was running Nagios 3.0.2, which hung and used 100% CPU
whenever the event handler ran.  Here's the strace output from that:

sendto(3, "<14>May 19 14:51:01 nagios: SERV"..., 132, MSG_NOSIGNAL, NULL, 0) = 132
open("/usr/local/nagios/var/nagios.log", O_RDWR|O_CREAT|O_APPEND, 0666) = 6
fstat(6, {st_mode=S_IFREG|0664, st_size=12893, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaab19000
write(6, "[1242769861] SERVICE EVENT HANDL"..., 117) = 117
close(6)                                = 0
munmap(0x2aaaaab19000, 4096)            = 0
pipe([6, 7])                            = 0
fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b18cafb9ca0) = 7893
close(7)                                = 0
wait4(7893, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 7893
--- SIGCHLD (Child exited) @ 0 (0) ---
read(6, "Restarting myservice...", 1023) = 27
read(6, 0x7fffdfb28030, 1023)           = -1 EAGAIN (Resource temporarily unavailable)
read(6, 0x7fffdfb28030, 1023)           = -1 EAGAIN (Resource temporarily unavailable)

... followed by an endless stream of identical read(6, ...) = -1 EAGAIN
lines, and CPU usage jumps to 100% at exactly that point.
In this case, it's a Perl event handler.  I'm pretty sure the event
handler script itself works fine.  When I run "service nagios restart"
while Nagios is hung, the init script reports that it restarted, but it
never kills the hung Nagios process, so the restart doesn't actually
succeed (the restart logic in the init script is unreliable).  I have
to kill -9 the hung nagios process and then restart the service to get
it working again.

Then I upgraded to Nagios 3.1.0 and tried again.  The result is the
same, except that CPU utilization stays low; Nagios still hangs.

Thanks for looking into this!
-- 
Jason Brittain

More information about the Developers mailing list