nohup and check_nrpe and timeout

David Shapiro David.Shapiro at sas.com
Fri Dec 5 14:37:17 CET 2008


Thank you Thomas, this looks like good info.  I do not seem to have an executable called setsid on Solaris, though; it is only listed as a C function.  nrpe.cfg does in fact let you increase the timeout, but I was thinking that will not help because my program keeps running in a loop.  On each iteration it checks the logs that it is generating.  As I mentioned, if it is not seen as running, it will just start it again.  However, since it is in a loop, I am thinking that nrpe will time out no matter how high my timeout is set.  The setsid idea looked interesting, but unfortunately I do not see it on my server.  The last one, I think, was using bash to close stdin, stdout, and stderr, but your example also used setsid (sigh).  The alarm handler idea did not work.  That leaves the ssh check agent.  I will look into that today.
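
For reference, setsid() does not need a separate executable when the check is already a Python script: Python exposes the same C function as os.setsid(), and subprocess can call it in the child before exec. A minimal sketch, assuming Python 3.2 or later; the name start_detached is illustrative and does not come from this thread:

```python
import os
import subprocess
import sys


def start_detached(argv):
    """Start argv in its own session, detached from our standard streams.

    start_new_session=True makes the child call setsid() before exec,
    which is what a setsid command-line tool would otherwise do.
    Redirecting stdin/stdout/stderr to the null device means the child
    does not keep the caller's pipes open.
    """
    child = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # assumption: Python 3.2 or later
        close_fds=True,
    )
    # We deliberately do not wait for the child; it keeps running
    # after this function (and the calling script) has returned.
    return child.pid
```

A plugin could call start_detached() with its start command, print a one-line status, and exit with 1 right away, leaving the slow startup to finish on its own.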

David

-----Original Message-----
From: Thomas Guyot-Sionnest [mailto:dermoth at aei.ca] 
Sent: Thursday, December 04, 2008 11:16 PM
To: David Shapiro
Cc: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] nohup and check_nrpe and timeout

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 04/12/08 01:27 PM, David Shapiro wrote:
> Hello,
> 
> I wrote something that starts a python script to check weblogic that if 
> it sees it is not running starts it.  The problem is that it takes 
> several minutes to start.  I tried in my script to just nohup and 
> background the process that also says it is starting and exits with a 
> 1.  For some reason though even though I do a nohup and background, it 
> does not run with a nohup.  It times out.  Is there a way to do this?  
> Why is check_nrpe maxed for 60 seconds?  Why will it not recognize I 
> just used a nohup and background?

nrpe is definitely not the best way to do it, but here's some insights
on what you could try:

1. I think the 60 second timeout is configurable in nrpe.cfg

2. You can start nohup with setsid to run it in a new session. You can
possibly avoid the use of nohup at all by closing stdin/out/err (in
bash: exec </dev/null; exec >/dev/null; exec 2>/dev/null; setsid <prgm>)

3. It's possible that nrpe starts the alarm handler before exec'ing the
plugin; try resetting it before running. You could do that in perl:
perl -e 'alarm(0); exec <prgm>' (actually I think exec in perl will
invoke the shell, which will get the signal, so alarm(0) is useless
anyway)

4. check_by_ssh has a mode to start the remote program/script and
return. You will need to set up an ssh keypair for this to work (don't
forget that nagios will run it as the nagios user, so you'll have to set
up the keys for that user)
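
A possible explanation for why nohup plus & still timed out, and why item 2 suggests closing stdin/out/err: a backgrounded process keeps any pipe it inherited open, so a caller that reads the plugin's output until end-of-file keeps waiting. That nrpe collects plugin output this way is an assumption here. The sketch below, in Python with the hypothetical helper seconds_until_output_closes, only demonstrates the pipe behaviour itself:

```python
import subprocess
import time


def seconds_until_output_closes(shell_command):
    """Run shell_command, read its stdout to end-of-file, and time it."""
    started = time.monotonic()
    proc = subprocess.Popen(
        shell_command,
        shell=True,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
    )
    # read() returns only once every process holding the write end of
    # the pipe has closed it, including background jobs of the shell.
    proc.stdout.read()
    proc.wait()
    return time.monotonic() - started


def demo():
    # The background job inherits stdout: the read lasts about as long
    # as the job itself, even though the shell exits immediately.
    inherited = seconds_until_output_closes("nohup sleep 2 & echo started")
    # The background job has its streams pointed at /dev/null: the read
    # ends as soon as the shell exits.
    redirected = seconds_until_output_closes(
        "nohup sleep 2 </dev/null >/dev/null 2>&1 & echo started"
    )
    return inherited, redirected
```

If the first timing tracks the sleep and the second does not, redirecting all three streams of the background job is the part that matters, with or without nohup.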


- --
Thomas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJOKow6dZ+Kt5BchYRAnxsAKDRiYNHxGkoLGLww6YG5qnGNBbrIACfW6fY
WHe4BSKXgC5AH56aTWGWpW0=
=sdKI
-----END PGP SIGNATURE-----






