passive check expire race condition

Ethan Galstad nagios at nagios.org
Sat Oct 20 23:34:27 CEST 2007


Michelle Craft wrote:
> [1185891648] SERVICE ALERT: emperor20.cs.wisc.edu;what;OK;HARD;1;OK: Script ran.
> [1185895333] Warning: The results of service 'what' on host 'emperor20.cs.wisc.edu' are stale by 10 seconds (threshold=3700 seconds).  I'm forcing an immediate check of the service.
> [1185895335] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;emperor20.cs.wisc.edu;what;0;OK: Script ran.
> [1185895343] SERVICE ALERT: emperor20.cs.wisc.edu;what;CRITICAL;HARD;1;CRITICAL: Test failed.  Passive check didn't send info.
> 
> It looks like, once the stale condition is noticed, it takes about 10 
> seconds to run the alternate active/fail check.  If a passive check comes 
> through in that time, setting the state to OK, the fail check overrides it.
> 
> Is there a way to make the forced check verify that a check hasn't come 
> through in the meantime?  Or to put a semaphore on the check so that the 
> new passive check isn't processed until the forced check completes?
> 
> --
> Michelle
> 

This has been on my todo list for a while, and it's finally done. :-)  A 
fix was just posted to the HEAD branch of CVS (Nagios 3) that will cause 
freshness check results to be ignored if a passive check arrived between 
1) the time the service was detected as stale and a check was initiated 
and 2) the time the freshness check results are processed.
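
For illustration only, the rule can be sketched roughly as follows. This is 
not the Nagios source (Nagios itself is written in C); the Service class, 
its fields, and the function names are all hypothetical, and the timestamps 
are taken from the log excerpt quoted above.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical stand-in for a service record; not a Nagios structure."""
    state: str
    last_check: int                   # time the latest accepted result arrived
    freshness_check_started: int = 0  # 0 means no freshness check is pending


def start_freshness_check(svc: Service, now: int) -> None:
    # Remember when the service was found stale and the forced check began.
    svc.freshness_check_started = now


def process_passive_result(svc: Service, state: str, now: int) -> None:
    # Passive results are always accepted and refresh the last-check time.
    svc.state = state
    svc.last_check = now


def process_freshness_result(svc: Service, state: str, now: int) -> bool:
    """Apply a freshness check result unless a newer result already arrived.

    Returns True if the result was applied, False if it was ignored.
    """
    started = svc.freshness_check_started
    svc.freshness_check_started = 0
    if svc.last_check >= started:
        # A result came in after the service was flagged stale,
        # so the freshness check result is out of date: drop it.
        return False
    svc.state = state
    svc.last_check = now
    return True


# Replay of the sequence from the quoted log.
svc = Service(state="OK", last_check=1185891648)
start_freshness_check(svc, now=1185895333)          # staleness detected
process_passive_result(svc, "OK", now=1185895335)   # passive result arrives
applied = process_freshness_result(svc, "CRITICAL", now=1185895343)
print(applied, svc.state)  # the CRITICAL result is ignored; state stays OK
```

In this sketch a result that arrives in the same second the service is 
flagged stale is treated as having arrived afterward. That tie-break is an 
assumption of the example, not something stated about the real fix.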


Ethan Galstad
Nagios Developer
___
Email: nagios at nagios.org
Web:   www.nagios.org
