passive check expire race condition

Michelle Craft craft at cs.wisc.edu
Tue Jul 31 17:53:34 CEST 2007


[1185891648] SERVICE ALERT: emperor20.cs.wisc.edu;what;OK;HARD;1;OK: Script ran.
[1185895333] Warning: The results of service 'what' on host 'emperor20.cs.wisc.edu' are stale by 10 seconds (threshold=3700 seconds).  I'm forcing an immediate check of the service.
[1185895335] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;emperor20.cs.wisc.edu;what;0;OK: Script ran.
[1185895343] SERVICE ALERT: emperor20.cs.wisc.edu;what;CRITICAL;HARD;1;CRITICAL: Test failed.  Passive check didn't send info.

It looks like, once the stale condition is noticed, it takes about 10
seconds to run the alternate active/fail check.  If a passive check comes
through in that window and sets the state to OK, the result of the fail
check still overrides it.
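
To make the window concrete, the timing can be worked out from the epoch
timestamps in the log lines quoted above.  This is only a small
illustrative sketch; the event labels are shorthand for those log lines,
not Nagios output.

```python
# Epoch timestamps taken from the last three log lines quoted above.
# The labels are illustrative shorthand, not Nagios output.
events = [
    (1185895333, "stale result noticed, forced check requested"),
    (1185895335, "passive result received (OK)"),
    (1185895343, "forced check result applied (CRITICAL)"),
]

t0 = events[0][0]  # moment the stale condition was noticed
offsets = [(ts - t0, label) for ts, label in events]

for delta, label in offsets:
    print(f"+{delta:>2}s  {label}")
```

The passive result lands 2 seconds into a 10 second window, and the forced
check's CRITICAL result is applied 8 seconds after it.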

Is there a way to make the forced check verify that a check hasn't come 
through in the meantime?  Or to put a semaphore on the check so that the 
new passive check isn't processed until the forced check completes?
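
The first idea (have the forced check verify that nothing newer has
arrived) could look roughly like the sketch below.  This is not Nagios
code: `Service`, `apply_passive_result`, `apply_forced_result` and the
`last_result_at` field are hypothetical names used only to illustrate the
technique.  The forced check remembers when the stale condition was
noticed and drops its own result if a result has been recorded at or
after that time.  The lock is an assumption too; it just makes the
compare-and-apply step atomic.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical stand-in for a monitored service's status record."""
    state: str = "OK"
    output: str = ""
    last_result_at: int = 0  # epoch time the most recent result was applied
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


def apply_passive_result(svc, state, output, now):
    """Record a passive check result unconditionally."""
    with svc.lock:
        svc.state, svc.output, svc.last_result_at = state, output, now


def apply_forced_result(svc, state, output, stale_detected_at, now):
    """Apply the forced (freshness) check result only if no result has
    been recorded since the stale condition was noticed."""
    with svc.lock:
        if svc.last_result_at >= stale_detected_at:
            return False  # a newer result arrived in the meantime; drop ours
        svc.state, svc.output, svc.last_result_at = state, output, now
        return True


# Replay of the sequence in the log, using its timestamps.
svc = Service(state="OK", output="OK: Script ran.", last_result_at=1185891648)
stale_detected_at = 1185895333

apply_passive_result(svc, "OK", "OK: Script ran.", now=1185895335)
applied = apply_forced_result(
    svc,
    "CRITICAL",
    "CRITICAL: Test failed.  Passive check didn't send info.",
    stale_detected_at=stale_detected_at,
    now=1185895343,
)
print(applied, svc.state)
```

With this guard the late CRITICAL result is discarded and the service
stays OK.  If no passive result had arrived, the forced result would be
applied as it is today.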



--
Michelle
