checking process time

Marc Powell marc at ena.com
Thu Sep 8 20:25:35 CEST 2005



> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net
> [mailto:nagios-users-admin at lists.sourceforge.net]
> On Behalf Of Rossz Vamos-Wentworth
> Sent: Thursday, September 08, 2005 11:26 AM
> To: nagios-users at lists.sourceforge.net
> Subject: [Nagios-users] checking process time
> 
> I have a perl script used as a pipe for email that does some special
> processing of data.  Occasionally, unfortunately, it gets "stuck" and
> does not terminate.  When this happens, it ends up using most of the
> CPU and pretty much screws up the system.  Until I can track down what
> is causing the infinite loop I was wondering if there was a way to
> check the life of a process of a specific name and execute an event
> handler if it's been running too long.  The script should only take a
> few seconds to run, so I figure if it is more than a few minutes old I
> can simply have nagios kill the problem process (e.g. "kill -9 pid"
> should do the job).

Under Linux, check_procs in nagios-plugins-1.4.1 adds a metric called
ELAPSED, which appears to allow checking how long a process has been
running. I tried testing it, but it doesn't work properly because the
call to ps doesn't include the 'etime' field, a la "/bin/ps -axwo 'stat
uid ppid vsz rss pcpu comm args etime'". It looks to me like configure
tests the less informative variations of the ps command first and, as
soon as one of those matches, uses it as the ps format instead of going
on to the more informative variations, including the one that has
etime. From configure.log --

configure:14078: result: /bin/ps
configure:14086: checking for ps syntax
configure:14095: result: /bin/ps axwo 'stat uid pid ppid vsz rss pcpu comm args'

when in fact, the one that includes etime works correctly (taken from
configure) --

$ ps -weo 'stat comm vsz rss user uid pid ppid etime args'
STAT COMMAND            VSZ  RSS USER       UID   PID  PPID     ELAPSED COMMAND
S    init              1376  368 root         0     1     0 132-06:03:18 init
SW   keventd              0    0 root         0     2     1 132-06:03:17 [keventd]
SWN  ksoftirqd_CPU0       0    0 root         0     3     1 132-06:03:17 [ksoftirqd_CPU0]
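
For anyone who wants to compare those ELAPSED values against a
threshold by hand: ps prints etime as [[dd-]hh:]mm:ss, so it has to be
converted to seconds first. A rough sketch of that conversion follows.
The function name etime_to_seconds is my own invention for
illustration; it is not part of check_procs or nagios-plugins.

```shell
# Hypothetical helper (not from nagios-plugins): convert a ps etime
# value of the form [[dd-]hh:]mm:ss into a number of seconds.
etime_to_seconds() {
    printf '%s\n' "$1" | awk -F'[-:]' '{
        # Multipliers for fields counted from the right:
        # seconds, minutes, hours, days.
        mult[0] = 1; mult[1] = 60; mult[2] = 3600; mult[3] = 86400
        secs = 0
        for (i = NF; i >= 1; i--)
            secs += $i * mult[NF - i]
        printf "%d\n", secs
    }'
}
```

With the init line above, "etime_to_seconds 132-06:03:18" should give
the age of the process in seconds, which is easy to compare against a
limit such as 300.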

Can anyone else confirm this as a bug? I don't see anything in the
tracker.
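
In the meantime, something along these lines might serve as a stopgap
for the original question. It is only a sketch under assumptions: the
function name find_stale, the process name filter.pl and the 300 second
limit are all made up for illustration, and it only matches the first
word of the command name. It reads "pid etime comm" lines on stdin and
prints the PIDs of matching processes older than the limit.

```shell
# Hypothetical helper: print the PIDs of processes whose command name
# equals $1 and whose elapsed time exceeds $2 seconds.
# Expects lines of "pid etime comm" on stdin, with etime in the
# [[dd-]hh:]mm:ss form that ps prints.
find_stale() {
    awk -v name="$1" -v max="$2" '
    BEGIN { m[0] = 1; m[1] = 60; m[2] = 3600; m[3] = 86400 }
    $3 == name {
        # Split etime on "-" and ":" and sum the fields right to left.
        n = split($2, t, /[-:]/)
        secs = 0
        for (i = n; i >= 1; i--)
            secs += t[i] * m[n - i]
        if (secs > max)
            print $1
    }'
}
```

It could be fed from ps, e.g. "ps -e -o pid= -o etime= -o comm= |
find_stale filter.pl 300", with the resulting PIDs handed to kill from
an event handler once you trust the output.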

--
Marc 

