Problems with nrpe2 signals and plugin cleanup

Thomas Guyot-Sionnest dermoth at aei.ca
Tue Feb 26 04:20:13 CET 2008



On 25/02/08 04:17 PM, Bill Moran wrote:
> I'm writing a custom plugin for our application that runs under nrpe2.
> 
> This bugger deals with a lot of data (potentially several GB), so nrpe2
> is configured with a large timeout (300s), and since it's impractical to
> keep all the data in RAM, I'm using temp files.
> 
> My problem is that sometimes network problems cause the script to take
> longer than 300 seconds to run.  In this case, I want to receive an
> alert, so all is well.  The problem here is that nrpe2 terminates the
> script, so the temp files are left lying around.
> 
> In looking for a more elegant solution than having admins clean up
> temp files manually, or having a cron job clean them up, I tried
> installing a signal handler in the plugin to guarantee cleanup
> of the temp files, but it didn't work, so I delved into nrpe2's
> source a bit to figure out why.  I found that on timeout, nrpe2
> issues a SIGTERM immediately followed by a SIGKILL.  Since SIGKILL
> is not catchable, my theory is that the SIGKILL signal arrives
> before my script has had a chance to run the signal handler for
> the SIGTERM, thus the cleanup is never done.
> 
> So ... I've two questions:
> 
> First, does anyone have a suggestion on how to handle this better
> in the script?

You should set an alarm and handle it yourself. For example, you could
have your script time out by itself after 300 seconds and have NRPE
terminate it after 350 seconds (or later, if cleanup may take longer).
See how the Perl plugins handle it.
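
A minimal sketch of that approach, assuming a Python plugin on a POSIX
system (SIGALRM), a timeout reported as CRITICAL, and a hypothetical
temp-file prefix. It is not taken from the Perl plugins, only the same
idea: arm an alarm below NRPE's limit and clean up in a finally block.

```python
# Sketch: a plugin that enforces its own deadline and always removes its
# temp file. Assumptions: Python on POSIX (SIGALRM), timeout reported as
# CRITICAL, hypothetical temp-file prefix "check_bigdata_".
import os
import signal
import tempfile

STATE_OK = 0
STATE_CRITICAL = 2  # Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN


class PluginTimeout(Exception):
    """Raised by the SIGALRM handler when the plugin's own deadline expires."""


def _on_alarm(signum, frame):
    raise PluginTimeout()


def run_with_timeout(work, timeout_s):
    """Call work(tmp_path) under a SIGALRM deadline; return (status, message).

    The temp file is removed in the finally block, so cleanup happens on
    success, on error and on timeout, as long as nothing SIGKILLs us first.
    """
    fd, tmp_path = tempfile.mkstemp(prefix="check_bigdata_")  # hypothetical name
    os.close(fd)
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        signal.alarm(timeout_s)  # keep this below NRPE's own timeout
        return work(tmp_path)
    except PluginTimeout:
        return STATE_CRITICAL, f"CRITICAL - timed out after {timeout_s}s"
    finally:
        signal.alarm(0)  # cancel an alarm that is still pending
        signal.signal(signal.SIGALRM, old_handler)
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


def main():
    def work(tmp_path):
        # Placeholder for the real processing that spills data to tmp_path.
        with open(tmp_path, "w") as f:
            f.write("intermediate data\n")
        return STATE_OK, "OK - data processed"

    status, message = run_with_timeout(work, 300)
    print(message)
    return status
```

A real plugin would finish with sys.exit(main()) under an
if __name__ == "__main__": guard. On the NRPE side, the daemon's limit
(command_timeout in nrpe.cfg) would then be set higher, e.g. 350 seconds,
so the plugin's own handler always fires first.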

> Second, I'm curious about the rapid issuance of the TERM/KILL
> signals.  Is there anything preventing nrpe2 from simply sleep()ing
> a few seconds between the two signals?  I mean, if I'm willing to
> wait 300s for success, I'm willing to wait 305s for a clean failure.

While I agree it doesn't make much sense to send TERM and KILL back to
back, the only change I'd make is to remove the TERM. Nagios plugins must
by design not run indefinitely, and NRPE is no different. If you sleep
between the two signals, how long should the delay be? That raises many
issues, so it's better to stick with plugins doing their own timeouts.
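
For reference, the TERM/KILL sequence discussed here can be sketched from
the supervisor's side. This is a Python illustration, not NRPE's C code,
and grace_s is a hypothetical parameter. A grace_s of 0 approximates the
back-to-back signals; any positive value brings back the question of how
long to wait.

```python
# Sketch: stop a timed-out child with SIGTERM, an optional grace period,
# then SIGKILL. Assumptions: Python's subprocess on POSIX stands in for
# NRPE's C code; grace_s is a hypothetical parameter.
import signal
import subprocess


def stop_child(proc: subprocess.Popen, grace_s: float) -> int:
    """Send SIGTERM, wait up to grace_s seconds, then SIGKILL if still alive.

    grace_s == 0 approximates sending TERM and KILL back to back: the child
    gets almost no time to run a SIGTERM handler before it is killed.
    Returns Popen's return code (negative signal number if killed by a signal).
    """
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught, so no cleanup runs
        return proc.wait()
```

A child that ignores SIGTERM runs out the grace period and is then
killed, so stop_child returns -signal.SIGKILL for it.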


Thomas

More information about the Developers mailing list