BUG/PATCH: Runaway processes under Linux (and others)

David Mansfield nagios at dm.cobite.com
Thu Apr 27 20:51:08 CEST 2006


[apologies if dups are received; the original was sent from the wrong mail account]


bruce wrote:
> On Thu, 27 Apr 2006, Andreas Ericsson wrote:
> 
>> bruce wrote:
> 
>>> On some systems, a rarer problem shows itself, making the solution to 
>>> the Nagios issue somewhat harder.  This problem is when a child 
>>> process, inheriting the parent's signal handlers, receives a signal 
>>> (usually SIGCHLD, sometimes SIGTERM) and then exits, taking out the 
>>> parent's lock/pid file.  Thus, one no longer knows which process is 
>>> the legitimate parent process.
>>
>> If nagios' grandchildren (the ones that popen() commands) receive 
>> SIGCHLD from anything but the check they're running, something is very, 
>> very wrong with the system you're using. Are you perhaps using the old 
>> and deprecated NGPT-library?
> 
> 
> The lock removal instead seems to be occurring with the child process 
> created in my_system(), which sometimes stalls at a point before the 
> signal handlers get reset (or they don't get reset, my debugging 
> statements weren't fine-grained enough).  When the parent sends a TERM 
> signal to the child when it is in this state (due to timeout), the child 
> runs the signal handlers inherited from the parent, removing the lock file.
> 

BTW I haven't read your patch or the code in question, so I'm just a**
talking.  But from the sounds of it:

Wouldn't it be prudent to use sigprocmask to mask any (and all) signals
that are being 'used' for some purpose in the parent before forking?
After forking, in the child, reset the signal handler to something sane
and then unmask.  In the parent, simply unmask after fork and any
'missed' signals will be delivered.

That way, there's no race condition...

David





