Unable to stop executing checks with 3.2.2

Michael Friedrich michael.friedrich at univie.ac.at
Sun Sep 19 11:53:08 CEST 2010


Hi,

On 2010-09-18 19:57, Ton Voon wrote:
>
> The test_events.c passes with this change. However, it would be best
> if a testcase could be written for the problem this is trying to
> solve, which fails without the patch and passes with the patch. This
> will ensure the problem will continue to get visibility in future.
>
> If you create that testcase, I'd be more than happy to apply.
>    

The patch implements one of the fixes proposed in the issue - 
http://tracker.nagios.org/view.php?id=152

"Possible solutions would be to reset run_events to TRUE after removing an
event from the queue, reset run_events to TRUE upon entering each "if"
block, or make the "if" into an "if else"."


The patch that was committed first did not set run_event properly and 
did not resolve the issue. It was a bit broken and lacks a logical 
explanation of why it was done this way.

Either way, Stephen's solution will work as it breaks up the state 
machine and makes sure changing the queue also affects the execution 
during the while(1) loop.

Take this event queue (hc = host check event, sc = service check event):

hc1 hc2 sc1 hc3 sc2 hc4 sc3 sc4 sc5

Service checks for sc2 are disabled.


Looping over the events happens like this, following the algorithm of 
the original source:

1/ hc1 is triggered; the servicecheck condition does not match. 
run_event==TRUE, so hc1 gets executed unless it is disabled.
2/ hc2 is handled the same way as hc1; run_event is reset to TRUE at the 
start of each loop pass that handles a low-priority event.
3/ sc1 is matched by the check for a servicecheck event. run_event stays 
TRUE for sc1 because its checks are not disabled, and sc1 is not 
removed from the queue, so

event_list_low->event_type==EVENT_HOST_CHECK

will not match afterwards.

4/ hc3 is executed like the host checks above.
5/ sc2 is our special service check, the one that is disabled. run_event 
starts as TRUE, then the servicecheck event is detected, but 
servicechecks are disabled.
So run_event changes to FALSE, and the event gets rescheduled and 
removed from the current event queue. Next comes the check for a 
hostcheck event, because it is an "IF" at the same level and not an 
"ELSE IF".
At this point hc4 is already at the head of the queue, so the condition 
for a hostcheck event MATCHES, but run_event is still FALSE from before.

So in this case there is no separate step 6/ for hc4; it all happens 
within 5/.

hc4 is tested against run_event, which is still FALSE (originating from 
sc2), and eureka, hc4 as a host check event does not get executed 
either - but rescheduled!
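
The walk-through above can be reproduced with a small standalone model. 
This is a minimal sketch and not the Nagios source: the struct, the 
array-based queue, the per-event checks_enabled field and 
run_queue_buggy() are hypothetical names made up for illustration. Only 
run_event and the two EVENT_* constants come from the discussion. 
Rescheduled events are just marked, not re-inserted into the queue.

```c
#include <assert.h>
#include <stddef.h>

enum event_type { EVENT_SERVICE_CHECK, EVENT_HOST_CHECK };

/* Hypothetical, simplified event; the real queue is a linked list. */
struct event {
    const char *name;
    enum event_type type;
    int checks_enabled;   /* 0 = checks for this object are disabled */
};

/*
 * Model of the original logic: two independent "if" blocks share one
 * run_event flag. rescheduled[i] is set to 1 if event i was
 * rescheduled and to 0 if it was executed.
 */
void run_queue_buggy(const struct event *q, size_t n, int *rescheduled)
{
    size_t head = 0;   /* index standing in for event_list_low */

    assert(q != NULL && rescheduled != NULL);

    while (head < n) {
        int run_event = 1;   /* default action is to run the event */

        if (q[head].type == EVENT_SERVICE_CHECK) {
            if (!q[head].checks_enabled)
                run_event = 0;
            if (!run_event) {
                rescheduled[head] = 1;
                head++;      /* removed: the next event is now the head */
            }
        }

        /* Plain "if": this test sees the NEW head, but run_event is
           still the value left over from the service check above.
           The head < n guard is an addition of this sketch. */
        if (head < n && q[head].type == EVENT_HOST_CHECK) {
            if (!q[head].checks_enabled)
                run_event = 0;
            if (!run_event) {
                rescheduled[head] = 1;
                head++;
            }
        }

        if (run_event) {
            rescheduled[head] = 0;   /* executed */
            head++;
        }
    }
}
```

Called with the nine-event queue from above and only sc2 disabled, this 
model marks both sc2 and hc4 as rescheduled, which is the behaviour 
described in step 5/.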

===

With Stephen's patch, the mentioned ELSE IF matches either a service OR 
a host check event in one pass. Currently both tests can match within 
the same pass (effectively an AND).
Handling only one of those check events per loop pass makes sure they do 
not interfere with each other when detecting whether they are

- disabled
- should be rescheduled
- should be run
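
The "if / else if" idea can be sketched in a small standalone model. 
This is an illustration under assumptions, not Stephen's actual patch 
and not the Nagios source: the struct, the array-based queue, the 
per-event checks_enabled field and run_queue_else_if() are hypothetical 
names. Rescheduled events are just marked, not re-inserted into the 
queue.

```c
#include <assert.h>
#include <stddef.h>

enum event_type { EVENT_SERVICE_CHECK, EVENT_HOST_CHECK };

/* Hypothetical, simplified event; the real queue is a linked list. */
struct event {
    const char *name;
    enum event_type type;
    int checks_enabled;   /* 0 = checks for this object are disabled */
};

/*
 * Model of the "else if" variant: at most one of the two check blocks
 * runs per loop pass, so run_event always belongs to the event that
 * was at the head when the pass started. rescheduled[i] is set to 1 if
 * event i was rescheduled and to 0 if it was executed.
 */
void run_queue_else_if(const struct event *q, size_t n, int *rescheduled)
{
    size_t head = 0;   /* index standing in for event_list_low */

    assert(q != NULL && rescheduled != NULL);

    while (head < n) {
        int run_event = 1;   /* default action is to run the event */

        if (q[head].type == EVENT_SERVICE_CHECK) {
            if (!q[head].checks_enabled)
                run_event = 0;
            if (!run_event) {
                rescheduled[head] = 1;
                head++;      /* removed: the next event is now the head */
            }
        } else if (q[head].type == EVENT_HOST_CHECK) {
            if (!q[head].checks_enabled)
                run_event = 0;
            if (!run_event) {
                rescheduled[head] = 1;
                head++;
            }
        }

        /* If run_event is still set, head was not advanced above. */
        if (run_event) {
            rescheduled[head] = 0;   /* executed */
            head++;
        }
    }
}
```

With the queue hc1 hc2 sc1 hc3 sc2 hc4 sc3 sc4 sc5 and only sc2 
disabled, this model reschedules sc2 alone; hc4 is handled in the next 
pass with a fresh run_event and gets executed.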

From my point of view, the logic could also be rewritten by removing the 
run_event flag and placing the execution calls where run_event was 
previously TRUE. But I haven't tried that this time, as it needs more 
testing.
I would consider Stephen's patch to be the fix for the buggy patch.

Kind regards,
Michael


-- 
DI (FH) Michael Friedrich

Vienna University Computer Center
Universitaetsstrasse 7 A-1010 Vienna, Austria

email: 	michael.friedrich at univie.ac.at
phone: 	+43 1 4277 14359
fax: 	+43 1 4277 14279
web:	http://www.univie.ac.at/zid

Icinga Core & IDOUtils Developer
http://www.icinga.org

