Race condition in freshness checking

Ethan Galstad nagios at nagios.org
Thu Sep 27 17:16:29 CEST 2007


Ton Voon wrote:
> Hi!
> 
> We found a bug in the calculation of the latency for a passive check. 
> This has highlighted a possible race condition re: freshness checking. 
> We wanted to get some ideas on what is the best approach to fix this.
> 
> Background:
> 
> We have a master/slave arrangement, with freshness checking 
> (freshness_threshold=0) of slave services on the master.
> 
> Looking in the NDO db, we realised that the latency values for passive 
> results were incorrectly calculated - sometimes latency values could be 
> off by a factor of 1000. The patch is attached. However, since using this 
> patch, we've seen occasional race conditions.
> 
> Problem:
> 
> Within checks.c:check_service_result_freshness, if a service has passed 
> its expiration_time, it is marked as is_being_freshened and a forced 
> service check is scheduled. However, if a passive result for this 
> service is processed before this forced check is run, then the forced 
> check still runs afterwards, so the service is marked as stale and the 
> state is inconsistent between master and slave.
> 
> Possible solutions:
> 
>   - If a check result is processed with is_being_freshened set for the 
> service, then remove the forced check from the schedule if it exists.
>   - Change is_being_freshened to stale_time (0 if not stale). On running 
> the forced check, if stale_time is less than last_check_time (+ 
> latency?), break out of running the forced check.
> 
> Neither of these sounds particularly appealing to us. Are there other 
> possible solutions? Any opinions?
> 
> Ton
> 

I think this race condition was brought up once before on the list, so 
I'll take a look at what can be done.  I think a reasonable solution can 
be found to work for Nagios 3, but backporting it to Nagios 2 will be 
more challenging due to the different check result IPC.
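
For reference, the second option proposed above (replacing 
is_being_freshened with a stale_time timestamp) could look roughly like 
the sketch below. This is only an illustration, not code from the Nagios 
tree: the svc_sketch struct and the three helper functions are 
hypothetical names, and the real service struct and scheduling code in 
checks.c are considerably more involved. The "+ latency" adjustment 
mentioned in the proposal is left out.

```c
#include <time.h>

/* Hypothetical, simplified stand-in for the Nagios service struct. */
typedef struct {
    time_t stale_time;  /* when the service was found stale; 0 if not stale */
    time_t last_check;  /* when the most recent check result was processed */
} svc_sketch;

/* Freshness check found the result expired: remember when. */
static void mark_stale(svc_sketch *svc, time_t now)
{
    if (svc->stale_time == 0)
        svc->stale_time = now;
}

/* Any check result (active or passive) has just been processed. */
static void record_result(svc_sketch *svc, time_t now)
{
    svc->last_check = now;
}

/* Called just before the forced check would run.
 * Returns 1 if the forced check should still run, 0 if it should be skipped. */
static int forced_check_needed(svc_sketch *svc)
{
    if (svc->stale_time == 0)
        return 0;                       /* never marked stale */

    if (svc->stale_time < svc->last_check) {
        svc->stale_time = 0;            /* a result arrived after staleness */
        return 0;                       /* was detected: skip forced check */
    }

    return 1;                           /* still stale: run forced check */
}
```

With one-second timestamps, a result that arrives in the same second the 
service is marked stale would not suppress the forced check in this 
sketch, which is one reason the proposal raises the question of adding 
latency to the comparison.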


Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org
