Bug report: nagios shutdown removing lock file too early
Ton Voon
ton.voon at altinity.com
Tue Jun 13 19:06:22 CEST 2006
Ethan,
I think I've seen a problem with the nagios shutdown routine. If
nagios is doing a host check and an INT signal is sent, it seems to
take a long time before the nagios daemon dies. It looks like the
child nagios process is trying to complete all the retries for a host
check before going back into the main loop.
Also, it appears that the lockfile is being removed before the main
process dies. Below is the output for a 'while true; do ps -p 728; ls
-l /usr/local/nagios/var/nagios.lock; sleep 1; done' during a kill 728.
[snipped]
PID TT STAT TIME COMMAND
728 ?? Ss 0:01.95 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
-rw-r--r-- 1 nagios nagios 4 Jun 13 17:20 /usr/local/nagios/var/nagios.lock
PID TT STAT TIME COMMAND
728 ?? Ss 0:01.95 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
-rw-r--r-- 1 nagios nagios 4 Jun 13 17:20 /usr/local/nagios/var/nagios.lock
PID TT STAT TIME COMMAND
728 ?? Ss 0:01.95 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
ls: /usr/local/nagios/var/nagios.lock: No such file or directory
PID TT STAT TIME COMMAND
728 ?? Ss 0:01.95 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
ls: /usr/local/nagios/var/nagios.lock: No such file or directory
This shows the lockfile gets removed before the main daemon dies.
(This is from a kill 728, not using any init scripts.) Eventually the
daemon dies.
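
For anyone wanting to reproduce this, the loop above can be wrapped in a small script that only prints when something interesting happens. This is a rough sketch, not anything shipped with Nagios: the function names lock_state and watch_shutdown are made up for illustration, and it assumes you run it as root or as the nagios user so that kill -0 is allowed to probe the daemon's PID.

```shell
#!/bin/sh
# Hypothetical helpers for watching a daemon's PID and lock file during shutdown.

# Print "<running|dead> <present|missing>" for the given PID and lock file.
# kill -0 sends no signal; it only tests that the PID exists and that we are
# permitted to signal it (assumption: run as root or the daemon's own user).
lock_state() {
    pid=$1
    lock=$2
    if kill -0 "$pid" 2>/dev/null; then proc=running; else proc=dead; fi
    if [ -e "$lock" ]; then file=present; else file=missing; fi
    echo "$proc $file"
}

# Poll once per second until the process exits, reporting every poll in which
# the lock file is already gone while the process is still alive.
watch_shutdown() {
    pid=$1
    lock=$2
    while :; do
        state=$(lock_state "$pid" "$lock")
        case $state in
            "running missing")
                echo "$(date '+%H:%M:%S') lock file gone, PID $pid still running" ;;
            dead*)
                echo "$(date '+%H:%M:%S') PID $pid has exited ($state)"
                return 0 ;;
        esac
        sleep 1
    done
}
```

With the functions loaded, something like 'watch_shutdown 728 /usr/local/nagios/var/nagios.lock &' followed by 'kill 728' should print one "lock file gone" line per second for as long as the daemon outlives its lock file.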
I've tested this on Nagios 2.2 on Mac OS X 10.4, Nagios 2.0 on Debian
and Nagios 2.4 on Debian.
Sorry, I've not had time to delve into the source code.
Ton
http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon