Sequence of Service versus Flapping checks/notifications

Ethan Galstad nagios at nagios.org
Fri Oct 5 17:40:20 CEST 2007


Matthew Richardson wrote:
> I have spotted today a couple of cases where a service notification has
> been received immediately followed by a FLAPPINGSTART notification when
> using 3.0b4.  These struck me as being not quite what one might logically
> expect.
> 
> For example:-
> 
> |[03-10-2007 18:01:24] SERVICE NOTIFICATION: smstest1;walkers_smtp-ky;smtp;FLAPPINGSTART (OK);notify-service-by-email;SMTP OK - 6.036 sec. response time
> |[03-10-2007 18:01:24] SERVICE FLAPPING ALERT: walkers_smtp-ky;smtp;STARTED; Service appears to have started flapping (23.0% change >= 20.0% threshold)
> |[03-10-2007 18:01:23] SERVICE NOTIFICATION: smstest1;walkers_smtp-ky;smtp;OK;notify-service-by-email;SMTP OK - 6.036 sec. response time
> |[03-10-2007 18:01:23] SERVICE ALERT: walkers_smtp-ky;smtp;OK;HARD;3;SMTP OK - 6.036 sec. response time
> 
> |[03-10-2007 19:03:34] SERVICE NOTIFICATION: smstest1;jtc_rich-jtc01;ospf_jsy-qr-jtc01;FLAPPINGSTART (OK);notify-service-by-email;OSPF OK - Full adjacency
> |[03-10-2007 19:03:34] SERVICE FLAPPING ALERT: jtc_rich-jtc01;ospf_jsy-qr-jtc01;STARTED; Service appears to have started flapping (23.9% change >= 20.0% threshold)
> |[03-10-2007 19:03:33] SERVICE NOTIFICATION: smstest1;jtc_rich-jtc01;ospf_jsy-qr-jtc01;OK;notify-service-by-email;OSPF OK - Full adjacency
> |[03-10-2007 19:03:33] SERVICE ALERT: jtc_rich-jtc01;ospf_jsy-qr-jtc01;OK;HARD;3;OSPF OK - Full adjacency
> 
> From what I can see, this seems to occur only when a service moves from a
> HARD non-OK state into an OK state at the same time as the flapping
> threshold is reached.  I have not noticed any when transitioning from an OK
> state to non-OK.
> 
> It occurs to me that it might be preferable to turn the logic around such
> that the flapping checks and notifications are done prior to reporting any
> hard change of service state.  If so, then only the FLAPPINGSTART
> notifications would be issued in each of the examples above.
> 
> Best wishes,
> Matthew
> 

This situation is certainly not optimal.  It looks like this could also 
occur when moving from OK to non-OK states.  I'll post a patch to CVS 
shortly.


Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org





More information about the Developers mailing list