Nag event handlers restarting failed programs on NT ?

Stanley Hopcroft Stanley.Hopcroft at IPAustralia.Gov.AU
Thu Jan 16 21:32:16 CET 2003


Dear Sir,

I am writing to thank you for your reply (I will certainly take your
advice) and to summarise some options for the archives.

On Thu, Jan 16, 2003 at 09:33:42AM -0600, Carroll, Jim P [Contractor] wrote:

> > Is anyone using Nagios (event handlers) to restart failed 
> > programs on NT hosts ?


> An interesting thought.  I don't have an answer off the top of my head
> (still working on my first coffee).
> 
> You might wish to check out www.infrastructures.org, subscribe to the
> mailing list (it's quite low-volume) and post a variant of your query there.
> 

The options follow, probably in order of seriousness/helpfulness
(although I think option 2 is probably the most durable).

1 Convert the program - ask someone else to do it - to a service and use
the rpcclient program (from Samba-tng or Samba-2.2.x or Samba-alpha) to
start the service.

This requires that

. the Nag host be set up with a machine account on the MS host
that is running the program/service

. the program can be converted to a service (I understand from a Windows
programmer that in the case of Java applications this can be kludgy).
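
Whichever restart mechanism is chosen, the event handler itself still has
to decide when to act. What follows is only a minimal sketch in Python,
assuming the handler is passed the Nag macros $SERVICESTATE$,
$SERVICESTATETYPE$ and $SERVICEATTEMPT$ as its first three arguments. The
restart command is deliberately left to the caller (it is a placeholder,
not a known rpcclient invocation), and acting on the third soft retry is
an arbitrary choice.

```python
import subprocess


def should_restart(state, state_type, attempt):
    """Return True when the handler should attempt a restart."""
    if state != "CRITICAL":
        return False
    # Act once the failure is confirmed (HARD), or on the third soft
    # retry; the value 3 is an assumption, tune it to max_check_attempts.
    return state_type == "HARD" or (state_type == "SOFT" and attempt >= 3)


def main(argv):
    """Expected arguments: STATE STATETYPE ATTEMPT RESTART_CMD [ARGS...]"""
    if len(argv) < 5:
        print("usage: handler STATE STATETYPE ATTEMPT RESTART_CMD [ARGS...]")
        return 3
    state, state_type, attempt = argv[1], argv[2], int(argv[3])
    if should_restart(state, state_type, attempt):
        # RESTART_CMD is whatever actually starts the service; it is
        # supplied by the caller and is hypothetical here.
        return subprocess.call(argv[4:])
    return 0

# To run as a script, call: sys.exit(main(sys.argv)) after importing sys.
```

Keeping the decision in should_restart() means it can be exercised without
touching the MS host at all.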

2 Suggested by Mr T De Blende,

'
* NSClient checks to see if the program is still running.

In our case, the culprit program will be appending heartbeat messages to
a text file in a shared directory. A Nag service check will 'tail' that
file and return a CRITICAL if it can find no log records (the records
will have time stamps) newer than the current time minus a threshold
interval (for example, if the last record in the file was logged more
than 10 minutes ago).

* If the program is not running, it puts a simple text file on a
Windows share on the server that is supposed to be running that
program. Just share a directory with write rights only for a certain
account that is used by the Nagios box to make the SMB connection.

* Create a small script on the Windows server that checks for the
existence of that text file in that shared directory, and if it is
there: 1) delete it and 2) restart the program.'

(This latter program may be run by AT periodically).
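
The two halves of option 2 might be sketched in Python roughly as below.
This is only an illustration under stated assumptions: each heartbeat
record is assumed to begin with a 'YYYY-MM-DD HH:MM:SS' time stamp, the
last parseable record is taken to be the newest, and the names
check_heartbeat and consume_flag, the flag file path and the restart
callable are all made up for the example. The exit codes follow the usual
plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import os
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed record prefix, 19 characters


def last_heartbeat(lines):
    """Return the time stamp of the last parseable record, or None."""
    for line in reversed(lines):
        try:
            return datetime.strptime(line[:19], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # blank or malformed line, keep looking backwards
    return None


def check_heartbeat(lines, now, threshold=timedelta(minutes=10)):
    """Nag-side check: return an (exit_code, message) pair."""
    stamp = last_heartbeat(lines)
    if stamp is None:
        return 3, "UNKNOWN: no time-stamped record found"
    age = now - stamp
    if age > threshold:
        return 2, "CRITICAL: last heartbeat %s ago" % age
    return 0, "OK: last heartbeat %s ago" % age


def consume_flag(flag_path, restart):
    """Windows-side step: if the flag file exists, delete it and restart.

    'restart' is a caller-supplied callable (hypothetical here) that
    starts the program. Returns True if a restart was requested.
    """
    if not os.path.exists(flag_path):
        return False
    os.remove(flag_path)
    restart()
    return True
```

The flag file is deleted before restart() is called, so a restart that
fails is not retried on every scheduled run; whether that is the right
trade-off depends on the program.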

3 Wait for it ... this is my idea.

Write a Tk/Expect or Perl/Tk program to drive a VNC session with the
host and use this VNC session to start the program.

It would probably be a good thing to have the program set up to be run
from the GUI (by clicking an icon that runs a bat file for example).

Sheesh.

Yours sincerely.

-- 
------------------------------------------------------------------------
Stanley Hopcroft
------------------------------------------------------------------------

'...No man is an island, entire of itself; every man is a piece of the
continent, a part of the main. If a clod be washed away by the sea,
Europe is the less, as well as if a promontory were, as well as if a
manor of thy friend's or of thine own were. Any man's death diminishes
me, because I am involved in mankind; and therefore never send to know
for whom the bell tolls; it tolls for thee...'

from Meditation 17, J Donne.

