ocsp/ochp zombie with restart?

Andreas Ericsson ae at op5.se
Thu Mar 3 00:54:19 CET 2005


Percy Jahn wrote:
> Hello,
> 
>   If I restart nagios via the init.d script, it sometimes happens that nagios
>   is not killed. I am using a CVS snapshot from mid-December 2004 on
>   SuSE 9.0. (Killing is done by "kill <nagiospid>", as far as I can see.)
>   It is also possible that this behavior happened at reload. We use an
>   automated script to copy configurations to machines and restart/reload
>   nagios, and I haven't figured out when exactly this happens. It is hard to
>   debug because it happens rarely.
> 

Actually, it's not that nagios fails to die: it sits and waits for the 
worker threads to exit gracefully. This happens when nagios is reaping 
check results. It's important that you don't try to start nagios again 
before the worker threads are fired up again, or things will go haywire.
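
One way to keep a restart script from starting nagios too early is to wait until the old process has really disappeared before launching the new one. The following is only a minimal sketch of that idea, not what the stock init.d script does; the function names, timeout and poll interval are all made up for illustration.

```python
import os
import signal
import time


def is_running(pid):
    """Return True if a process with this pid still exists."""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing, it only checks existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_exit(pid, timeout=30.0, poll=0.1):
    """Poll until the process is gone; return False if it outlives the timeout."""
    deadline = time.monotonic() + timeout
    while is_running(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True


def stop_and_wait(pid, timeout=30.0):
    """Send SIGTERM (what a plain `kill <pid>` sends) and wait for the exit."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # already gone
    return wait_for_exit(pid, timeout)
```

Note that a zombie still counts as "running" for the signal-0 check, so a False return from stop_and_wait() does not by itself say whether the process is still shutting down or is merely waiting to be reaped.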

>   If I look at the running processes, every time I find one zombie
>   process: the ocsp/ochp command run via nagios.

Is the state actually Z? If so, you might have a warped pthreads 
implementation, or init might not be doing its job of reaping orphaned 
processes. It could also be that one of the worker threads is caught in 
uninterruptible I/O while reading from the command pipe while the master 
thread deletes or closes it. I don't think it's a terribly good idea for 
Nagios to catch SIGPIPE, considering it handles one, but I'm not the 
boss of that.
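
To tell the two cases apart (a real zombie versus a thread stuck in uninterruptible I/O), the process state letter can be read directly. A rough sketch, assuming a Linux /proc filesystem; the helper names are made up for illustration. State Z is a zombie, D is uninterruptible sleep.

```python
def parse_state(stat_line):
    """Extract the one-letter state from the contents of /proc/<pid>/stat."""
    # Layout is "pid (comm) state ppid ...". The command name may itself
    # contain spaces or ')', so split after the LAST closing parenthesis.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid):
    """Return the state letter of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_state(f.read())
```

The same letter shows up in the STAT column of ps.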

> I suppose nagios was
>   killed while running an ocsp/ochp command, and the parent process of the
>   check exited without killing all its children. (The parent of the check
>   process is a child of the process killed by the kill command.)
> 

This is weird. It indicates that the master pid isn't actually written 
to the nagios pid file (which is weirdly called the lock file everywhere 
in documentation, configuration and code - weirdly because a lockfile is 
a Red Hat invention placed in /var/lock/subsys/progname and always empty).
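
Whether the pid file really holds the master pid is easy to check by hand. A small sketch, with hypothetical helper names; the path to pass in is whatever your configuration names as the lock file, and the expected pid is the master process as shown by ps.

```python
def read_pid_file(path):
    """Return the pid stored in the file, or None if it is missing or malformed."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def pid_file_matches(path, expected_pid):
    """True only if the file exists and holds exactly the expected pid."""
    pid = read_pid_file(path)
    return pid is not None and pid == expected_pid
```

If the file holds the pid of a child rather than the master, "kill <nagiospid>" from the init script ends up signalling the wrong process.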

>   I have not found a bugfix for this problem in CVS, so I skipped checking
>   the newest version.
>   
>   Is this a known issue? Are there any other suggestions, maybe for better
>   debugging?
> 

gdb is your friend.

> --
> Best regards
> Percy Jahn
> 
> 
> 
> -------------------------------------------------------
> SF email is sponsored by - The IT Product Guide
> Read honest & candid reviews on hundreds of IT Products from real users.
> Discover which products truly live up to the hype. Start reading now.
> http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
> _______________________________________________
> Nagios-devel mailing list
> Nagios-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-devel
> 

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer

