Ping processes piling up?

Andreas Ericsson ae at op5.se
Thu Jan 13 13:08:18 CET 2005


Schmitz, Carsten wrote:
> Hi,
> 
> This is not a core Nagios question but I figure some of you using
> Nagios might have seen it.
> 
> On my "Red Hat Linux release 7.3 (Valhalla)" box, for some weeks now,
> the pings started by Nagios keep piling up, meaning that I have ping
> processes running that are several days old. After a while the box
> gets slow, and I can't sleep coz I keep having those mysterious
> dreams of process table slot limitations.
> 
> Anyone seen this before?
> 
> Is my assumption correct that Nagios pings should die on the same day
> they were started (unless ping started 23:59:59 of course ;)
> 

Yes, assuming they actually time out, and assuming they can receive 
signals from the Nagios process or the program parsing the output of the 
ping program. ping, usually being setuid, wouldn't normally accept 
signals, so it would be wise to make sure it receives the -n flag with a 
parameter, making it die properly when signals are sent.

I believe the nagios process might otherwise kill the check_ping plugin, 
causing the ping program to become an orphan. init (process nr 1) should 
normally take over those processes and reap (and ignore) their exit 
statuses every once in a while, but apparently it does not.
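
To illustrate the orphaning described above, here is a minimal sketch (not 
how Nagios itself does it) of a parent that runs a command in its own 
process group and, on timeout, kills the whole group, so that a forked 
helper such as ping cannot outlive the plugin. The function name 
run_with_group_timeout and its return tuple are made up for this example.

```python
import os
import signal
import subprocess


def run_with_group_timeout(argv, timeout_s):
    """Run argv; on timeout, kill its whole process group.

    Hypothetical helper for illustration only.
    Returns (returncode, output_text, timed_out).
    """
    # start_new_session=True makes the child a session and group leader,
    # so helpers it forks share a process group ID equal to its PID.
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        start_new_session=True,
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return proc.returncode, out.decode(errors="replace"), False
    except subprocess.TimeoutExpired:
        try:
            # Signal every process in the group, not just the direct child.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited on its own in the meantime
        out, _ = proc.communicate()  # reap the child so no zombie is left
        return proc.returncode, out.decode(errors="replace"), True
```

Killing only the direct child (proc.kill()) would leave anything it forked 
running, which matches the orphaned pings seen here.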

Do you have any kernel patches in place affecting the wait4 system call, 
or do you have a non-standard init program?
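
To help answer that, it may be worth checking whether the old pings have 
really been reparented to init (PPID 1) and whether they are zombies 
(state Z) or still running. A rough sketch, assuming a procps-style ps that 
supports the pid, ppid, stat, etime and comm output fields; the function 
names and the one-hour threshold are arbitrary choices for this example.

```python
import subprocess


def parse_etime(text):
    """Convert ps elapsed time ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in text.split(":")]
    while len(parts) < 3:  # pad missing hours with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def stale_pings(ps_output, min_age_s=3600):
    """Return (pid, ppid, stat, etime) for ping processes older than min_age_s."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 4)
        if len(fields) != 5:
            continue
        pid, ppid, stat, etime, comm = fields
        # Zombies may show up as "ping <defunct>", so compare the first word.
        if comm.split()[0] == "ping" and parse_etime(etime) >= min_age_s:
            rows.append((int(pid), int(ppid), stat, etime))
    return rows


if __name__ == "__main__":
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,etime,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid, ppid, stat, etime in stale_pings(out):
        print(pid, ppid, stat, etime)
```

A PPID of 1 with state Z would point at init not reaping; a PPID of 1 with 
a running state would mean the pings themselves never exit.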

A solution might be to use the check_icmp program, which doesn't fork 
ping and thus can be killed properly by its parent process (nagios).

> At the moment all I can do is "killall /bin/ping" every couple of
> days, what's the expected impact of this on Nagios? (I'd expect that
> if I kill a ping that Nagios currently "looks at" then the service
> soft alert gets triggered but since all my services have retry=3 I
> don't expect any problems - still don't like to put killall into
> cron, would rather have a clean solution).
> 
> Thanks,
> 
> ----------------------------------------------------------------------
>  Carsten Schmitz System Administrator Group Information Management 
> Aegon N.V. 
> ----------------------------------------------------------------------
>  "My password is my cat's name. It's called x6>B8e@7w_4. I rename it
> every 30 days." 
> ----------------------------------------------------------------------
> 
> 
> 
> ------------------------------------------------------- The SF.Net
> email is sponsored by: Beat the post-holiday blues Get a FREE limited
> edition SourceForge.net t-shirt from ThinkGeek. It's fun and FREE --
> well, almost....http://www.thinkgeek.com/sfshirt 
> _______________________________________________ Nagios-users mailing
> list Nagios-users at lists.sourceforge.net 
> https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please
> include Nagios version, plugin version (-v) and OS when reporting any
> issue. ::: Messages without supporting info will risk being sent to
> /dev/null
> 

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer
