NAGIOS Notification Anomaly: Missing HOST UP Notification

Traiano Welcome twelcome at tenet.ac.za
Thu Sep 18 13:07:44 CEST 2008


Hi All

We're running nagios2 (Nagios 2.9) on an Ubuntu server, monitoring around 150 servers, and have not had any major problems with our installation. A day ago we found that Nagios had failed to send an email notification when a particular host came back UP from a DOWN state. In the preceding hours the host had been marked DOWN, UP, then DOWN again, and notifications were sent as usual. When the host came UP again, it was marked UP on the Nagios web dashboard and the logs show a "HOST ALERT .... UP" line, but no email notification was sent. There is no indication in the logs as to why Nagios failed to notify this time, and it is the only one out of thousands of notifications that has failed: an anomaly.

It would seem that the notify-by-email plugin may have failed, although only this once, for this site. Alternatively, Nagios may not have called the plugin at all when the site was detected as up. No changes were made to the Nagios config during the day.

Below is a small snippet from the logs showing the sequence of events, with the UP notification missing at the end. Is there any obvious reason why Nagios would occasionally fail to send a notification email?


---
Sep 17 19:23:19 sidewind nagios2: HOST ALERT: SITE-AE;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
Sep 17 19:23:29 sidewind nagios2: HOST ALERT: SITE-AE;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
Sep 17 19:23:39 sidewind nagios2: HOST ALERT: SITE-AE;DOWN;SOFT;3;PING CRITICAL - Packet loss = 100%

Sep 17 19:23:49 sidewind nagios2: HOST ALERT: SITE-AE;DOWN;HARD;4;PING CRITICAL - Packet loss = 100%
Sep 17 19:23:50 sidewind nagios2: HOST NOTIFICATION: NOC-GROUP;SITE-AE;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%

Sep 17 19:23:50 sidewind nagios2: HOST FLAPPING ALERT: SITE-AE;STARTED; Host appears to have started flapping (23.2% change > 20.0% threshold)
Sep 17 19:25:05 sidewind nagios2: SERVICE ALERT: SITE-AE;PING;CRITICAL;HARD;1;PING CRITICAL - Packet loss = 100%

Sep 17 19:38:09 sidewind nagios2: HOST ALERT: SITE-AE;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 25.01 ms

Sep 17 19:39:50 sidewind nagios2: SERVICE ALERT: SITE-AE;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 25.01 ms
Sep 17 19:39:50 sidewind nagios2: SERVICE FLAPPING ALERT: SITE-AE;PING;STARTED; Service appears to have started flapping (23.0% change >= 20.0% threshold)
Sep 18 00:00:00 sidewind nagios2: CURRENT HOST STATE: SITE-AE;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 25.00 ms
Sep 18 00:00:00 sidewind nagios2: CURRENT SERVICE STATE: SITE-AE;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 24.96 ms
Sep 18 00:24:50 sidewind nagios2: SERVICE FLAPPING ALERT: SITE-AE;PING;STOPPED; Service appears to have stopped flapping (3.8% change < 5.0% threshold)
Sep 18 00:38:09 sidewind nagios2: HOST FLAPPING ALERT: SITE-AE;STOPPED; Host appears to have stopped flapping (0.0% change < 5.0% threshold)
---


I wonder if it may have something to do with the "host flapping" and "service flapping" alerts occurring around the time the notification should have been sent out?
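
One way to test the flapping idea against a larger log is to scan for HARD host state changes that were never followed by a matching HOST NOTIFICATION line, and to record whether the host was flagged as flapping at that moment. The following is a minimal Python sketch, not part of the original report; the function name and the regular expression are assumptions based only on the line format in the excerpt above. For context, the Nagios flap detection documentation lists suppression of notifications for a flapping host among its effects, which would be consistent with what the log shows.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   "Sep 17 19:23:49 sidewind nagios2: HOST ALERT: SITE-AE;DOWN;HARD;4;PING ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+\S+:\s+"
    r"(?P<kind>[A-Z ]+?):\s+(?P<body>.*)$"
)


def unnotified_hard_host_changes(lines):
    """Return (timestamp, host, state, was_flapping) for every HARD host
    state change that has no matching HOST NOTIFICATION before the host's
    next HOST ALERT (or before the end of the input)."""
    flapping = set()   # hosts currently between flapping STARTED and STOPPED
    pending = {}       # host -> HARD change still waiting for a notification
    missed = []

    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank lines, separators, anything not in the assumed shape
        kind = m["kind"]
        fields = m["body"].split(";")

        if kind == "HOST FLAPPING ALERT" and len(fields) >= 2:
            host, action = fields[0], fields[1]
            if action == "STARTED":
                flapping.add(host)
            elif action == "STOPPED":
                flapping.discard(host)

        elif kind == "HOST ALERT" and len(fields) >= 3:
            host, state, state_type = fields[0], fields[1], fields[2]
            # A new alert for this host closes the window for the previous one.
            if host in pending:
                missed.append(pending.pop(host))
            if state_type == "HARD":
                pending[host] = (m["ts"], host, state, host in flapping)

        elif kind == "HOST NOTIFICATION" and len(fields) >= 3:
            # Field order in the excerpt: contact;host;state;command;output
            host, state = fields[1], fields[2]
            if host in pending and pending[host][2] == state:
                del pending[host]

    # Anything still pending at the end never saw a notification.
    missed.extend(pending.values())
    return missed
```

Fed the lines of the excerpt above, this returns a single entry: the 19:38:09 UP alert for SITE-AE, with the flapping flag set. It only reports what the log contains and cannot by itself prove why the notification was skipped.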

Thanks in Advance!
Traiano