Child Process Spawning

Andrew Tjang andrew.tjang at ask.com
Thu Nov 23 01:13:46 CET 2006


Hello all!
 
I think I'm experiencing a problem with runaway child processes. I read
the FAQ and found the service_reaper_frequency entry, but I don't
think that's the problem (I've set it to 4 just to be sure).
 
A little about my setup:
 
I'm monitoring several hundred machines and a couple thousand services
on these machines.
I'm doing this using the passive method (piping to the nagios.cmd file).
I dump the results of the other monitoring processes into nagios.cmd
every 5 minutes.
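
For readers unfamiliar with the passive method, a dump of this kind could
look roughly like the sketch below. It is not the actual script used here.
It assumes results are submitted with the PROCESS_SERVICE_CHECK_RESULT
external command; the function names and the command_file argument are
hypothetical, and the path to nagios.cmd depends on the installation.

```python
import time


def format_passive_result(host, service, return_code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    # A newline ends an external command, so flatten multi-line output.
    clean_output = " ".join(str(output).splitlines())
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, clean_output)


def submit_results(results, command_file):
    """Write a batch of (host, service, return_code, output) tuples.

    command_file is an assumed path to the nagios.cmd pipe. Opening a
    FIFO for writing blocks until a reader (the daemon) has it open.
    """
    lines = [format_passive_result(*result) for result in results]
    with open(command_file, "w") as cmd:
        cmd.write("".join(lines))
```

Writing the whole batch in one call keeps the number of opens on the
pipe low, which may matter with a couple thousand results per run.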
 
For the first couple of hours after startup, things appear to be operating
fine, with no more than 5 or 6 nagios child instances running at any
given time, and no instance running for more than a couple of minutes.
 
Sometime after the 2-hour mark, the child instances take longer and longer
to complete, if they complete at all (upwards of 20 minutes). These
processes then just start building up.
 
I can attach to these processes using strace, and they all appear to be
doing things once I do that. It's almost as if the mere act of
observing these child processes makes them complete (whereas if I just
let them go, they would never finish). Other child processes, however,
are truly hung or slow as molasses.
 
I can kill these processes fine, but I can't forever monitor these
processes to make sure they don't get out of hand. 
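
As a stopgap, the watching could be automated. The sketch below is only
an illustration under several assumptions: the processes are named
"nagios", a stuck child has another nagios process as its parent, and
10 minutes is a reasonable age limit. All function names are
hypothetical. It parses the output of `ps -eo pid,ppid,etime,comm` and
reports children older than the limit; it does not kill anything.

```python
import subprocess


def parse_etime(etime):
    """Convert a ps elapsed time ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:  # pad missing hours with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_stale_children(ps_output, name="nagios", max_age=600):
    """Return (pid, age_seconds) for old processes whose parent is also `name`."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)
        if len(fields) == 4 and fields[3].strip() == name:
            rows.append((int(fields[0]), int(fields[1]), parse_etime(fields[2])))
    pids = {pid for pid, _, _ in rows}
    # Assumption: the main daemon's parent is not itself a nagios process.
    return [(pid, age) for pid, ppid, age in rows if ppid in pids and age > max_age]


def snapshot():
    """Capture the current process table (needs a ps that supports -o)."""
    result = subprocess.run(["ps", "-eo", "pid,ppid,etime,comm"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling find_stale_children(snapshot()) from cron would give a list of
candidates to inspect, though it only treats the symptom and not the
cause.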
 
Any light anyone can shed on this would be greatly appreciated:
 
1) Why is Nagios spawning these child processes (I'm passively
monitoring)?
2) Why are they not finishing (and this behavior is only visible after a
few hours of running)?
3) How can I prevent this from occurring?
4) Could it be that something is hanging the process and by the time it
becomes unhung, the child reads a new dump from the nagios.cmd file, and
thus never ends? I don't even know if that's what the child processes
are doing. The strace reads:
 
    write(6, "Hostname\0\0\0\0\0\0\0\0\0\0."..., 496) = 496
 
 
Thanks in advance for your help!
-Andrew