Fix for host dependency checks

Holger Weiss holger at CIS.FU-Berlin.DE
Wed Mar 22 03:30:35 CET 2006


* Ethan Galstad <nagios at nagios.org> [2006-03-21 19:17]:
> On 22 Mar 2006 at 1:48, Holger Weiss wrote:
> > * Ethan Galstad <nagios at nagios.org> [2006-03-21 12:50]:
> > > I'll keep this on the TODO list for Nagios 3.x, but I think it might
> > > require some more thought.  The last hard state of the host should
> > > only be used in the dependency logic if a state change occurred
> > > relatively recently.  If, for example, the last hard state change
> > > occurred two days ago, you don't want that value used in the logic.
> >
> > Okay, but the current Nagios code uses _only_ the last hard state (no
> > matter how "old" it is), which is the reason why I've encountered the
> > problem in the first place.  I thought about checking the freshness of
> > the last hard state myself (the information is available in the host
> > struct already, so this would be easy), but then I omitted that since
> > letting the dependency fail if either the current or the last hard
> > state matches the criteria seemed sufficiently safe to me.  This way,
> > "false alarms" for the (dependent) host B should reliably be
> > prevented, while the risk of suppressing legitimate notifications for
> > B because the dependency fails due to an outdated last hard state of A
> > is the same as with the current Nagios code.  I believe that in
> > practice, this risk is very low: I suppose that in almost all cases,
> > the configured dependency criteria will be a down and/or unreachable
> > state.  So the risk would be that an outdated down or unreachable
> > state lets the dependency fail, but down and unreachable states should
> > normally be more or less up-to-date.
>
> Aha - I think we're using different terms. :-)  The nagios 2.x code
> uses host->current_state in the dependency logic, but that's not
> necessarily "current" in terms of time.
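
Ethan's freshness rule could be modelled roughly as below: only trust the last hard state if it changed recently. This is a minimal Python sketch, not Nagios code. The Host class is a hypothetical stand-in that keeps only the struct fields named in this thread, plus last_hard_state_change, which is assumed to hold the time of the last hard state change. max_age is a hypothetical window whose value the thread leaves open.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Host state values, assumed to match Nagios's HOST_UP/HOST_DOWN/HOST_UNREACHABLE.
HOST_UP, HOST_DOWN, HOST_UNREACHABLE = 0, 1, 2


@dataclass
class Host:
    """Hypothetical stand-in for the relevant fields of the Nagios host struct."""
    name: str
    current_state: int
    last_hard_state: int
    last_hard_state_change: float  # assumed: UNIX time of the last hard state change


def fresh_last_hard_state(host: Host, max_age: float,
                          now: Optional[float] = None) -> Optional[int]:
    """Return host.last_hard_state only if it changed at most max_age seconds ago.

    max_age is a hypothetical tuning knob. Returns None when the state is too
    old to be trusted by the dependency logic.
    """
    if now is None:
        now = time.time()
    if now - host.last_hard_state_change <= max_age:
        return host.last_hard_state
    return None


# A hard DOWN recorded two days ago is ignored with a one-hour window.
a = Host("A", HOST_UP, HOST_DOWN, last_hard_state_change=0.0)
print(fresh_last_hard_state(a, max_age=3600, now=2 * 86400))  # → None
```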

Yes, that's what I meant.  The 2.x code simply uses host->current_state.
My patch forces a new check of host A during the dependency check for B.
After this new host check was performed, the host->current_state value
used by the 2.x code is available as host->last_hard_state.  My patch
then checks this host->last_hard_state value just as the 2.x code does
and additionally checks the now updated host->current_state.
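
The patch described above can be sketched as follows. Again, this is a Python model under stated assumptions, not the actual patch. check_host is a hypothetical callable standing in for an on-demand host check. Copying the pre-check current_state into last_hard_state follows the description above and simplifies how Nagios really maintains that field.

```python
from dataclasses import dataclass
from typing import Callable, Set

# Host state values, assumed to match Nagios's HOST_UP/HOST_DOWN/HOST_UNREACHABLE.
HOST_UP, HOST_DOWN, HOST_UNREACHABLE = 0, 1, 2


@dataclass
class Host:
    """Hypothetical stand-in for the relevant fields of the Nagios host struct."""
    name: str
    current_state: int
    last_hard_state: int


def dependency_fails(master: Host, failure_criteria: Set[int],
                     check_host: Callable[[Host], int]) -> bool:
    """Model of the patched check: force a fresh check of the master host,
    then fail the dependency if EITHER the old or the new state matches.

    check_host is a hypothetical on-demand host check returning a state.
    """
    # Simplification: the state 2.x would have used (current_state) is kept
    # as last_hard_state once the forced check has run.
    previous_state = master.current_state
    master.current_state = check_host(master)
    master.last_hard_state = previous_state
    return (master.last_hard_state in failure_criteria
            or master.current_state in failure_criteria)


# Stale case: A's stored state is still UP, but a forced check finds it DOWN,
# so the dependency fails and B's notification would be suppressed.
a = Host("A", current_state=HOST_UP, last_hard_state=HOST_UP)
print(dependency_fails(a, {HOST_DOWN, HOST_UNREACHABLE}, lambda h: HOST_DOWN))  # → True
```

With the 2.x logic alone, the stale UP would have been used in this case, and the "false alarm" for B would have gone out.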

> I made some major overhauls to the host check logic in the Nagios 3.x
> CVS code.

Ah, sorry, I must admit that I haven't found the time to look at the new
code yet; I'll do that really soon now[tm]! :-)  Okay, forget about my
patch (except perhaps as a bugfix for the 2.x branch) ;-)

> Those changes include parallel host checks and "predictive dependency
> checks".  The predictive checks idea came from your earlier suggestion
> that all hosts that are depended upon for notification be checked
> before the notification gets sent out.
>
> Here's how the Nagios 3.x code does this... On the second-to-last host
> check attempt (the one before max_check_attempts is reached), Nagios
> will execute a parallel check of all hosts that are being depended
> upon.  In Nagios 3.x, host checks are no longer performed immediately
> one after another, but at a retry_interval, just as services are
> re-checked.  That means that
> theoretically all hosts that are being depended upon will have been
> checked before the dependency logic is tested and a decision to
> notify is made.
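
The following is a rough model of the predictive checks described above. When a dependent host reaches its second-to-last check attempt, every host it depends on is checked in parallel. Their states are then fresh when the final attempt, one retry_interval later, evaluates the dependencies and decides whether to notify. The Python thread pool only stands in for Nagios's parallel host checks. The function name and the check_host callable are hypothetical, and this is not the Nagios 3.x implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def predictive_dependency_checks(
    current_attempt: int,
    max_attempts: int,
    masters: List[str],
    check_host: Callable[[str], int],
) -> Dict[str, int]:
    """Check all depended-upon ("master") hosts in parallel on the
    second-to-last attempt of the dependent host.

    Hypothetical helper: returns {host_name: state} for the hosts checked,
    or an empty dict on any other attempt (or if there are no masters).
    """
    if current_attempt != max_attempts - 1 or not masters:
        return {}
    # One worker per master host; map() yields results in input order.
    with ThreadPoolExecutor(max_workers=len(masters)) as pool:
        states = list(pool.map(check_host, masters))
    return dict(zip(masters, states))


# Example: the dependent host is on attempt 2 of 3, so both masters are checked.
fake_states = {"router": 1, "switch": 0}  # hypothetical results (1 = DOWN, 0 = UP)
print(predictive_dependency_checks(2, 3, ["router", "switch"], fake_states.get))
# → {'router': 1, 'switch': 0}
```

Triggering on the second-to-last attempt rather than the last gives the master checks a full retry_interval to complete before the notification decision is made.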

Having a retry_interval and parallel host checks sounds very, very nice!
I'm looking forward to testing the new code.

Thanks a lot, Holger

-- 
PGP fingerprint:  F1F0 9071 8084 A426 DD59  9839 59D3 F3A1 B8B5 D3DE

