Flap Detection

Scott scott at netspace.net.au
Fri Apr 2 03:17:59 CEST 2004


Hi Guys and girls and any other genders available.

I wouldn't say I am new to Nagios, but I seem to have hit a bit of a 
wall with a slight flapping problem.

I have flap detection set to the default settings in the nagios.cfg file:

<snippet>

low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0

</snippet>

This works fine with the current setup, except when I come across a 
scenario like this:

[2004-04-02 10:21:54] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 10:12:54] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[2004-04-02 10:06:54] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 10:00:44] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[2004-04-02 08:41:35] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 08:23:34] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[2004-04-02 06:32:14] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 06:29:14] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[2004-04-02 06:26:08] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 06:05:14] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[2004-04-02 05:13:24] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 04:55:29] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[2004-04-02 04:25:15] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 04:16:25] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[2004-04-02 03:31:14] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 03:25:24] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[2004-04-02 03:22:24] SERVICE ALERT: mx4;RAID_STATUS;UNKNOWN;HARD;5;CHECK_NRPE: No output returned from NRPE daemon.
[2004-04-02 03:19:24] SERVICE ALERT: mx4;RAID_STATUS;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.

As the log shows, I am getting a state change on every check. This 
continues for some time, so I get a notification for every one of 
these checks until Nagios sees the flapping threshold crossed (in this 
case that would be within the next check or two). The problem is that 
I would like Nagios to tell me when the service/host has entered flap 
detection territory: notify the correct parties once that it has been 
put into a flap state, and then suspend notifications.
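
To make clearer what I mean, here is a rough Python sketch of the 
threshold logic as I understand it from the docs. It is only an 
illustration and not how Nagios implements it: it counts state changes 
unweighted over an assumed history of 21 results, whereas I believe the 
real calculation gives recent changes more weight, so the actual 
percentages will differ. The names FlapTracker, percent_state_change, 
"flap_start" and "flap_stop" are made up for this example.

```python
from collections import deque

# Thresholds copied from the nagios.cfg snippet earlier in this message.
LOW_THRESHOLD = 5.0
HIGH_THRESHOLD = 20.0
# Assumption: 21 stored results, i.e. 20 possible state changes.
HISTORY_SIZE = 21


def percent_state_change(history):
    """Unweighted percentage of adjacent results whose state differs."""
    states = list(history)
    changes = sum(1 for a, b in zip(states, states[1:]) if a != b)
    return 100.0 * changes / (HISTORY_SIZE - 1)


class FlapTracker:
    """Hypothetical helper that tracks one service's flap state."""

    def __init__(self):
        self.history = deque(maxlen=HISTORY_SIZE)
        self.flapping = False

    def add_result(self, state):
        """Record a result; return 'flap_start', 'flap_stop' or None."""
        self.history.append(state)
        pct = percent_state_change(self.history)
        if not self.flapping and pct >= HIGH_THRESHOLD:
            self.flapping = True
            return "flap_start"
        if self.flapping and pct <= LOW_THRESHOLD:
            self.flapping = False
            return "flap_stop"
        # Between the two thresholds the flap state is left unchanged.
        return None


if __name__ == "__main__":
    tracker = FlapTracker()
    # Alternating results like the ones in my log.
    for state in ["CRITICAL", "UNKNOWN", "CRITICAL", "UNKNOWN", "CRITICAL"]:
        event = tracker.add_result(state)
        if event is not None:
            print(state, event)
```

The point where add_result returns "flap_start" is exactly where I 
would like one last notification to go out before things go quiet.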

Not sure if this makes sense, but at present notifications simply go 
silent, and it's not until I look at the web GUI that I actually know 
that the service has gone into a flap state.

I was wondering whether anybody else has run into this problem and 
whether there is a simple solution for it.

I have read the docs, and they do say that in flap states NOBODY GETS 
NOTIFIED. I won't argue with that, but I would still like to know when 
it has occurred.

Looking forward to hearing some feedback.

---
Scott