[PATCH] - 3.0.3: only send out a service recovery escalation if a service is recovering from a non-OK state listed in the escalation or only 'r' is specified as an escalation option

Max perldork at webwizarddesign.com
Tue Aug 4 11:50:04 CEST 2009


Hi Thomas,

We have a partial patch in place at work.  The approach we took was
to copy what is used in the service definitions for notified_on*
fields.  We added

int escalated_on_warning
int escalated_on_critical
int escalated_on_unknown

to the serviceescalation struct.  Our next step is to do what you
mentioned: persist the structures to retention.dat on stop and read
them back from disk on start, as otherwise restarts cause recovery
notices for services to be missed in the following scenarios:
* a service that is actively polled immediately recovers after a nagios restart
* a passive service that is associated with an escalation with no
first_escalation value or a first_escalation value of 1 recovers after
a restart.
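
To illustrate the idea, here is a minimal sketch in C.  It is not the
patch itself: esc_sketch is a cut-down stand-in for the real
serviceescalation struct, and the three function names are made up for
this example.  The escalate_on_* fields stand for the configured
w/u/c/r escalation options, the escalated_on_* fields are the new
flags described above.

```c
#include <assert.h>
#include <stddef.h>

/* Service state codes as used by Nagios. */
#define STATE_OK       0
#define STATE_WARNING  1
#define STATE_CRITICAL 2
#define STATE_UNKNOWN  3

/* Hypothetical, cut-down stand-in for the serviceescalation struct. */
typedef struct {
    /* configured escalation options (w, u, c, r) */
    int escalate_on_warning;
    int escalate_on_unknown;
    int escalate_on_critical;
    int escalate_on_recovery;
    /* new flags: which problem states were actually escalated */
    int escalated_on_warning;
    int escalated_on_unknown;
    int escalated_on_critical;
} esc_sketch;

/* Remember that a problem notification for `state` went out through
 * this escalation.  States not listed in the escalation are ignored. */
void note_escalated_problem(esc_sketch *esc, int state)
{
    assert(esc != NULL);
    switch (state) {
    case STATE_WARNING:
        if (esc->escalate_on_warning) esc->escalated_on_warning = 1;
        break;
    case STATE_UNKNOWN:
        if (esc->escalate_on_unknown) esc->escalated_on_unknown = 1;
        break;
    case STATE_CRITICAL:
        if (esc->escalate_on_critical) esc->escalated_on_critical = 1;
        break;
    default:
        break;
    }
}

/* Return 1 if a recovery should go out through this escalation:
 * either only 'r' is configured, or the service is recovering from a
 * non-OK state that this escalation listed and escalated. */
int recovery_should_escalate(const esc_sketch *esc)
{
    assert(esc != NULL);
    if (!esc->escalate_on_recovery)
        return 0;
    if (!esc->escalate_on_warning && !esc->escalate_on_unknown &&
        !esc->escalate_on_critical)
        return 1; /* only 'r' specified */
    return esc->escalated_on_warning || esc->escalated_on_unknown ||
           esc->escalated_on_critical;
}

/* Reset the flags once the recovery has been handled. */
void clear_escalated_flags(esc_sketch *esc)
{
    assert(esc != NULL);
    esc->escalated_on_warning = 0;
    esc->escalated_on_unknown = 0;
    esc->escalated_on_critical = 0;
}
```

Because the flags live only in memory, they are lost on restart, which
is why the two scenarios above need the retention.dat persistence.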

For us the two situations rarely occur, so I could put the partial
patch in place, and so far, after over a week, it is doing well :).  I
also tested it with service notification policies that are more normal
than ours, e.g. having just recovery and critical on in the service
and then critical and recovery on in the escalation, and that worked as
well.

The current partial patch was also code reviewed by my peers at work
and looks clean to them as well.

So, hoping to do the persistence code this next week and have a full
patch that is stable and tested to release for public review w/in two
weeks.

The issue I am mulling over with this is - should we persist all
service escalations on stop or *just* the ones that have state ... My
thought is to only persist escalation structs that have flags set to 1
to save space ..
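
Roughly what I have in mind for the "only the ones that have state"
option, as a sketch under assumptions rather than working patch code:
the esc_state struct, the save_escalation_state() name, and the
"serviceescalation { ... }" block layout are all invented here for
illustration.  Keying a block by host and service name alone is also an
assumption; a service with several escalations would need something
more to tell them apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-escalation state to be retained across restarts. */
typedef struct {
    const char *host_name;
    const char *service_description;
    int escalated_on_warning;
    int escalated_on_unknown;
    int escalated_on_critical;
} esc_state;

/* An escalation carries state only if at least one flag is set. */
static int has_state(const esc_state *e)
{
    return e->escalated_on_warning || e->escalated_on_unknown ||
           e->escalated_on_critical;
}

/* Write one block per escalation that has a flag set, skipping the
 * rest.  Returns the number of blocks written. */
size_t save_escalation_state(FILE *fp, const esc_state *list, size_t count)
{
    size_t written = 0;
    size_t i;

    assert(fp != NULL);
    for (i = 0; i < count; i++) {
        const esc_state *e = &list[i];
        if (!has_state(e))
            continue; /* nothing to remember, save the space */
        fprintf(fp, "serviceescalation {\n");
        fprintf(fp, "host_name=%s\n", e->host_name);
        fprintf(fp, "service_description=%s\n", e->service_description);
        fprintf(fp, "escalated_on_warning=%d\n", e->escalated_on_warning);
        fprintf(fp, "escalated_on_unknown=%d\n", e->escalated_on_unknown);
        fprintf(fp, "escalated_on_critical=%d\n", e->escalated_on_critical);
        fprintf(fp, "}\n\n");
        written++;
    }
    return written;
}
```

On start, any escalation without a block would simply keep all flags
at 0, so skipping the empty ones loses no information.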

What do you think?

Max

On 8/4/09, Thomas Guyot-Sionnest <dermoth at aei.ca> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 23/07/09 02:09 PM, Max wrote:
>>
>> Will change the code internally, test at our organization using
>> various service notification state combinations (not just our unique
>> setup) and do another code review before resending the patch ...
>> should have a corrected version within a week or so.
>>
>> Sorry again for the code noise and long winded explanations.
>
> Thank you very much for looking into a proper fix!
>
> If you can't figure out a way of getting the information out of the
> current data structure maybe we could add a bitmask for all the states
> encountered during the current/last service event. It means adding this
>  info to the status/retention data files but OTOH we could also export a
> macro for use in notification scripts and eventhandlers.
>
> - --
> Thomas
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFKd9Tp6dZ+Kt5BchYRArvAAJ9gC4o0hubbsNjrMgnJGaE58MCklgCaAvS9
> EY1SWXiR51XllbTwCEOB2F4=
> =sBCy
> -----END PGP SIGNATURE-----
>
> ------------------------------------------------------------------------------
> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
> trial. Simplify your report design, integration and deployment - and focus
> on
> what you do best, core application coding. Discover what's new with
> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
> _______________________________________________
> Nagios-devel mailing list
> Nagios-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-devel
>
