passive check expire race condition
Ethan Galstad
nagios at nagios.org
Sat Oct 20 23:34:27 CEST 2007
Michelle Craft wrote:
> [1185891648] SERVICE ALERT: emperor20.cs.wisc.edu;what;OK;HARD;1;OK: Script ran.
> [1185895333] Warning: The results of service 'what' on host 'emperor20.cs.wisc.edu' are stale by 10 seconds (threshold=3700 seconds). I'm forcing an immediate check of the service.
> [1185895335] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;emperor20.cs.wisc.edu;what;0;OK: Script ran.
> [1185895343] SERVICE ALERT: emperor20.cs.wisc.edu;what;CRITICAL;HARD;1;CRITICAL: Test failed. Passive check didn't send info.
>
> It looks like, once the stale condition is noticed, it takes about 10
> seconds to run the alternate active/fail check. If a passive check comes
> through in that time setting the state to OK, the fail check overrides it.
>
> Is there a way to make the forced check verify that a check hasn't come
> through in the meantime? Or to put a semaphore on the check so that the
> new passive check isn't processed until the forced check completes?
>
> --
> Michelle
>
This has been on my todo list for a while, and it's finally done. :-) A
fix was just posted to the HEAD branch of CVS (Nagios 3) that will cause
freshness check results to be ignored if a passive check result arrived
between 1) the time the service was detected as stale and a check was
initiated and 2) the time the freshness check results are processed.
Ethan Galstad
Nagios Developer
___
Email: nagios at nagios.org
Web: www.nagios.org