Bug report: nagios shutdown removing lock file too early

Ethan Galstad nagios at nagios.org
Thu Jul 6 23:11:53 CEST 2006


sean finney wrote:
> On Tue, Jun 20, 2006 at 03:56:44PM +0100, Ton Voon wrote:
>> I think the lockfile removal is the source of the "multiple Nagios  
>> processes running". The example daemon-init script uses the lockfile  
>> as the status of the process. If you were to do a restart, Nagios  
>> would complete the stop because the signal was sent, but Nagios would  
>> actually be in the process of shutting down. Meanwhile a start would  
>> run, so another Nagios process is kicked off. Then, as both Nagios  
>> processes are trying to access the same files, mayhem can ensue :)
>>
>> We've got our own startup script and we've changed the stop routine to  
>> wait until nagios has actually stopped before moving out of the stop  
>> function. Much more stable, but there's a long delay if Nagios is in  
>> the middle of a host check.
> 
> note that we've been seeing hard-to-reproduce problems along these
> lines in the debian packages.  debian bugs:
> 
> #338391: nagios-common: nagios often leaves multiple processes around 
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=338391
> 
> #376070: nagios-common: init script sporadically fails to see existing nagios process 
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=376070
> 
> 
> comments would be welcome.
> 
> 	sean

Once the latest 2.x CVS snapshot has been tested to confirm that it fixes 
the extended service info definition segfault, I'll release 2.5 (which 
also includes the lock file removal fix).
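
For anyone who needs a workaround in the meantime, the approach Ton describes (have the stop routine block until the process has really exited, rather than trusting the lock file) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the Nagios daemon-init script or from Ton's script: the function names `pid_alive` and `stop_and_wait`, the 60 second timeout, and the polling interval are all hypothetical.

```python
import os
import signal
import time


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_and_wait(pid: int, timeout: float = 60.0, poll: float = 0.5) -> bool:
    """Send SIGTERM, then block until the process is gone or the timeout expires.

    Returns True if the process has exited, False if it is still running.
    The timeout and poll values are assumptions chosen for illustration.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # already gone, nothing to wait for

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not pid_alive(pid):
            return True
        time.sleep(poll)
    return not pid_alive(pid)
```

A restart would then only call its start step once `stop_and_wait` has returned True, so a second Nagios process is never started while the first is still shutting down. The cost is the one Ton mentions: the stop step can take a while if Nagios is in the middle of a host check.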


Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org
