Bug report: nagios shutdown removing lock file too early

Ton Voon ton.voon at altinity.com
Tue Jun 13 19:06:22 CEST 2006


Ethan,

I think I've seen a problem with the nagios shutdown routine. If
nagios is doing a host check and an INT signal is sent, it seems to
take a long time before the nagios daemon dies. It looks like the
child nagios process is trying to complete all the retries for a host
check before going back into the main loop.

Also, it appears that the lockfile is being removed before the main  
process dies. Below is the output for a 'while true; do ps -p 728; ls  
-l /usr/local/nagios/var/nagios.lock; sleep 1; done' during a kill 728.

[snipped]
   PID  TT  STAT      TIME COMMAND
   728  ??  Ss     0:01.95 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
-rw-r--r--   1 nagios  nagios  4 Jun 13 17:20 /usr/local/nagios/var/nagios.lock
   PID  TT  STAT      TIME COMMAND
   728  ??  Ss     0:01.95 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
-rw-r--r--   1 nagios  nagios  4 Jun 13 17:20 /usr/local/nagios/var/nagios.lock
   PID  TT  STAT      TIME COMMAND
   728  ??  Ss     0:01.95 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
ls: /usr/local/nagios/var/nagios.lock: No such file or directory
   PID  TT  STAT      TIME COMMAND
   728  ??  Ss     0:01.95 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
ls: /usr/local/nagios/var/nagios.lock: No such file or directory

This shows the lockfile gets removed before the main daemon dies.  
(This is from a kill 728, not using any init scripts.) Eventually the  
daemon dies.
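
For anyone who wants to reproduce this without reading the ps/ls output by eye, here is a rough sketch of the same polling loop wrapped in a shell function. The function name watch_shutdown and its arguments are made up for illustration and are not part of Nagios. It assumes a ps that accepts -p and exits non-zero when the PID is gone.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): poll a PID and a lock file
# and report whether the lock file vanished while the process was alive.
# Usage: watch_shutdown PID LOCKFILE [INTERVAL_SECONDS]
watch_shutdown() {
    ws_pid=$1
    ws_lock=$2
    ws_interval=${3:-1}
    ws_lock_gone_first=no

    # ps -p exits non-zero once the process no longer exists.
    while ps -p "$ws_pid" >/dev/null 2>&1; do
        # The process is still running here; note if the lock is already gone.
        if [ ! -e "$ws_lock" ]; then
            ws_lock_gone_first=yes
        fi
        sleep "$ws_interval"
    done

    if [ "$ws_lock_gone_first" = yes ]; then
        echo "lock file removed while process $ws_pid was still running"
    else
        echo "process $ws_pid exited before the lock file was seen missing"
    fi
}
```

Start it in one terminal with 'watch_shutdown 728 /usr/local/nagios/var/nagios.lock' and then run 'kill 728' in another. With a one second poll interval a very short gap could be missed, so the second message does not prove the ordering is correct.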

I've tested this on Nagios 2.2 on Mac OS X 10.4, Nagios 2.0 on Debian
and Nagios 2.4 on Debian.

Sorry, I have not had time to delve into the source code.

Ton

http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon



