Notification Timing Problem

MJK Nagios mjknagios at mjk.org
Wed Oct 29 23:11:55 CET 2008


Hello,

I am having a difficult time getting notifications to work the way I
would like.  I'm using a test host and taking it offline so that Nagios
generates alerts.  Nagios detects the host coming back online and sends
the recovery notification very quickly.  To test my config, I then
simulate the host failing again a few minutes later.  The problem is
that Nagios takes 15-20 minutes to send a notification that the host is
down again.  That would be useless to me in a production environment:
if the host drops again, I need to know about it immediately.

I've been through the docs and have checked everything that seemed
relevant to this issue, with no success.  I'm running Nagios 3.0.2.
The output below shows the time lag between when Nagios notices the
host is down again and when it sends the notification.

Thanks!

-Matt

------------------------------------------------------------------------------


Here are my notification entries from testing:
------------------------------------------------------------------------------
Host  Service  Type  Time  Contact  Notification Command  Information
TEST-01  N/A  HOST UP  10-29-2008 17:04:05  NOC  notify-host-by-email  PING OK - Packet loss = 73%, RTA = 0.70 ms
TEST-01  N/A  HOST UP  10-29-2008 17:04:05  NOC  notify-host-by-pager  PING OK - Packet loss = 73%, RTA = 0.70 ms
TEST-01  N/A  HOST UP  10-29-2008 17:04:05  support  notify-host-by-email  PING OK - Packet loss = 73%, RTA = 0.70 ms
TEST-01  N/A  HOST UP  10-29-2008 17:04:05  support  notify-host-by-pager  PING OK - Packet loss = 73%, RTA = 0.70 ms
TEST-01  N/A  HOST DOWN  10-29-2008 17:03:25  NOC  notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A  HOST DOWN  10-29-2008 17:03:25  NOC  notify-host-by-pager  (Host Check Timed Out)
TEST-01  N/A  HOST DOWN  10-29-2008 17:03:25  support  notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A  HOST DOWN  10-29-2008 17:03:25  support  notify-host-by-pager  (Host Check Timed Out)
TEST-01  N/A  HOST DOWN  10-29-2008 16:58:05  NOC  notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A  HOST DOWN  10-29-2008 16:58:05  NOC  notify-host-by-pager  (Host Check Timed Out)
TEST-01  N/A  HOST DOWN  10-29-2008 16:58:05  support  notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A  HOST DOWN  10-29-2008 16:58:05  support  notify-host-by-pager  (Host Check Timed Out)
TEST-01  N/A  HOST UP  10-29-2008 16:40:10  NOC  notify-host-by-email  PING OK - Packet loss = 0%, RTA = 4.47 ms
TEST-01  N/A  HOST UP  10-29-2008 16:40:10  NOC  notify-host-by-pager  PING OK - Packet loss = 0%, RTA = 4.47 ms
TEST-01  N/A  HOST UP  10-29-2008 16:40:10  support  notify-host-by-email  PING OK - Packet loss = 0%, RTA = 4.47 ms
TEST-01  N/A  HOST UP  10-29-2008 16:40:10  support  notify-host-by-pager  PING OK - Packet loss = 0%, RTA = 4.47 ms
TEST-01  N/A  HOST DOWN  10-29-2008 16:27:40  NOC  notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A  HOST DOWN  10-29-2008 16:27:40  NOC  notify-host-by-pager  (Host Check Timed Out)
TEST-01  N/A  HOST DOWN  10-29-2008 16:27:40  support  notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A  HOST DOWN  10-29-2008 16:27:40  support  notify-host-by-pager  (Host Check Timed Out)
TEST-01  N/A  HOST UP  10-29-2008 16:10:20  NOC  notify-host-by-email  PING OK - Packet loss = 0%, RTA = 0.45 ms
TEST-01  N/A  HOST UP  10-29-2008 16:10:20  NOC  notify-host-by-pager  PING OK - Packet loss = 0%, RTA = 0.45 ms
TEST-01  N/A  HOST UP  10-29-2008 16:10:20  support  notify-host-by-email  PING OK - Packet loss = 0%, RTA = 0.45 ms
TEST-01  N/A  HOST UP  10-29-2008 16:10:20  support  notify-host-by-pager  PING OK - Packet loss = 0%, RTA = 0.45 ms
TEST-01  N/A  HOST DOWN  10-29-2008 16:05:48  NOC  notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A  HOST DOWN  10-29-2008 16:05:48  NOC  notify-host-by-pager  (Host Check Timed Out)
TEST-01  N/A  HOST DOWN  10-29-2008 16:05:48  support  notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A  HOST DOWN  10-29-2008 16:05:48  support  notify-host-by-pager  (Host Check Timed Out)
TEST-01  N/A  HOST UP  10-29-2008 15:44:19  NOC  notify-host-by-email  PING OK - Packet loss = 0%, RTA = 0.53 ms
TEST-01  N/A  HOST UP  10-29-2008 15:44:19  support  notify-host-by-email  PING OK - Packet loss = 0%, RTA = 0.53 ms
TEST-01  N/A  HOST DOWN  10-29-2008 15:21:09  NOC  notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A  HOST DOWN  10-29-2008 15:21:09  support  notify-host-by-email  (Host Check Timed Out)
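
To put numbers on the delay, here is a rough Python sketch that measures
the gap between each HOST UP notification and the next HOST DOWN
notification.  The timestamps are copied from the list above, one per
distinct event and reordered oldest first; the names EVENTS and
up_to_down_gaps are made up for this illustration.  Note that each gap
also includes the few minutes the host stayed up before I took it down
again, so it overstates the real lag somewhat.

```python
from datetime import datetime

# (type, timestamp) pairs copied from the notification list above,
# one per distinct event, reordered oldest first.
EVENTS = [
    ("HOST DOWN", "10-29-2008 15:21:09"),
    ("HOST UP", "10-29-2008 15:44:19"),
    ("HOST DOWN", "10-29-2008 16:05:48"),
    ("HOST UP", "10-29-2008 16:10:20"),
    ("HOST DOWN", "10-29-2008 16:27:40"),
    ("HOST UP", "10-29-2008 16:40:10"),
    ("HOST DOWN", "10-29-2008 16:58:05"),
    ("HOST DOWN", "10-29-2008 17:03:25"),
    ("HOST UP", "10-29-2008 17:04:05"),
]


def parse(ts):
    """Parse a timestamp in the MM-DD-YYYY HH:MM:SS form used in the list."""
    return datetime.strptime(ts, "%m-%d-%Y %H:%M:%S")


def up_to_down_gaps(events):
    """Seconds from each HOST UP notification to the next HOST DOWN one."""
    gaps = []
    last_up = None
    for kind, ts in events:
        when = parse(ts)
        if kind == "HOST UP":
            last_up = when
        elif kind == "HOST DOWN" and last_up is not None:
            gaps.append(int((when - last_up).total_seconds()))
            last_up = None  # ignore repeat DOWN notifications for the same outage
    return gaps


if __name__ == "__main__":
    for seconds in up_to_down_gaps(EVENTS):
        print(f"{seconds // 60} min {seconds % 60} s")
```

For the entries above this gives gaps of roughly 17 to 21 minutes.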

And here's the host's history:
------------------------------------------------------------------------------
October 29, 2008 17:00
Program Start [10-29-2008 17:07:34] Nagios 3.0.2 starting... (PID=1917)
Program Restart [10-29-2008 17:07:34] Caught SIGHUP, restarting...
Service Ok [10-29-2008 17:04:25] SERVICE ALERT: TEST-01;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.44 ms
Service Ok [10-29-2008 17:04:15] SERVICE ALERT: TEST-01;TFTP Server;OK;HARD;1;TCP OK - 0.007 second response time on port 8099
Service Ok [10-29-2008 17:04:15] SERVICE ALERT: TEST-01;HTTP;OK;HARD;1;HTTP OK HTTP/1.1 200 OK - 403 bytes in 0.022 seconds
Host Up [10-29-2008 17:04:05] HOST ALERT: TEST-01;UP;HARD;1;PING OK - Packet loss = 73%, RTA = 0.70 ms
October 29, 2008 16:00
Host Down [10-29-2008 16:58:05] HOST ALERT: TEST-01;DOWN;HARD;10;(Host Check Timed Out)
Host Down [10-29-2008 16:56:25] HOST ALERT: TEST-01;DOWN;SOFT;9;(Host Check Timed Out)
Host Down [10-29-2008 16:54:55] HOST ALERT: TEST-01;DOWN;SOFT;8;(Host Check Timed Out)
Host Down [10-29-2008 16:53:15] HOST ALERT: TEST-01;DOWN;SOFT;7;(Host Check Timed Out)
Host Down [10-29-2008 16:51:35] HOST ALERT: TEST-01;DOWN;SOFT;6;(Host Check Timed Out)
Host Down [10-29-2008 16:49:55] HOST ALERT: TEST-01;DOWN;SOFT;5;(Host Check Timed Out)
Host Down [10-29-2008 16:48:15] HOST ALERT: TEST-01;DOWN;SOFT;4;(Host Check Timed Out)
Host Down [10-29-2008 16:46:45] HOST ALERT: TEST-01;DOWN;SOFT;3;(Host Check Timed Out)
Host Down [10-29-2008 16:45:35] HOST ALERT: TEST-01;DOWN;SOFT;2;(Host Check Timed Out)
Host Down [10-29-2008 16:45:05] HOST ALERT: TEST-01;DOWN;SOFT;2;(Host Check Timed Out)
Program Start [10-29-2008 16:44:55] Nagios 3.0.2 starting... (PID=1917)
Program Restart [10-29-2008 16:44:55] Caught SIGHUP, restarting...
Host Down [10-29-2008 16:43:30] HOST ALERT: TEST-01;DOWN;SOFT;2;(Host Check Timed Out)
Service Critical [10-29-2008 16:42:30] SERVICE ALERT: TEST-01;PING;CRITICAL;HARD;1;PING CRITICAL - Packet loss = 100%
Service Critical [10-29-2008 16:42:20] SERVICE ALERT: TEST-01;TFTP Server;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
Service Critical [10-29-2008 16:42:20] SERVICE ALERT: TEST-01;HTTP;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
Host Down [10-29-2008 16:41:50] HOST ALERT: TEST-01;DOWN;SOFT;1;(Host Check Timed Out)
Service Critical [10-29-2008 16:41:30] SERVICE ALERT: TEST-01;PING;CRITICAL;SOFT;1;PING CRITICAL - Packet loss = 100%
Service Critical [10-29-2008 16:41:20] SERVICE ALERT: TEST-01;TFTP Server;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
Service Critical [10-29-2008 16:41:20] SERVICE ALERT: TEST-01;HTTP;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
Service Ok [10-29-2008 16:40:20] SERVICE ALERT: TEST-01;PING;OK;SOFT;1;PING OK - Packet loss = 0%, RTA = 0.56 ms
Service Ok [10-29-2008 16:40:10] SERVICE ALERT: TEST-01;TFTP Server;OK;SOFT;1;TCP OK - 0.047 second response time on port 8099
Service Ok [10-29-2008 16:40:10] SERVICE ALERT: TEST-01;HTTP;OK;SOFT;1;HTTP OK HTTP/1.1 200 OK - 403 bytes in 0.034 seconds
Host Up [10-29-2008 16:40:10] HOST ALERT: TEST-01;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 4.47 ms
Program Start [10-29-2008 16:39:40] Nagios 3.0.2 starting... (PID=1917)
Program Restart [10-29-2008 16:39:40] Caught SIGHUP, restarting...
Host Down [10-29-2008 16:27:40] HOST ALERT: TEST-01;DOWN;HARD;10;(Host Check Timed Out)
Host Down [10-29-2008 16:26:00] HOST ALERT: TEST-01;DOWN;SOFT;9;(Host Check Timed Out)
Host Down [10-29-2008 16:24:20] HOST ALERT: TEST-01;DOWN;SOFT;8;(Host Check Timed Out)
Host Down [10-29-2008 16:22:40] HOST ALERT: TEST-01;DOWN;SOFT;7;(Host Check Timed Out)
Host Down [10-29-2008 16:21:10] HOST ALERT: TEST-01;DOWN;SOFT;6;(Host Check Timed Out)
Host Down [10-29-2008 16:19:30] HOST ALERT: TEST-01;DOWN;SOFT;5;(Host Check Timed Out)
Host Down [10-29-2008 16:18:00] HOST ALERT: TEST-01;DOWN;SOFT;4;(Host Check Timed Out)
Host Down [10-29-2008 16:16:20] HOST ALERT: TEST-01;DOWN;SOFT;3;(Host Check Timed Out)
Host Down [10-29-2008 16:14:40] HOST ALERT: TEST-01;DOWN;SOFT;2;(Host Check Timed Out)
Service Critical [10-29-2008 16:13:30] SERVICE ALERT: TEST-01;PING;CRITICAL;HARD;1;PING CRITICAL - Packet loss = 100%
Service Critical [10-29-2008 16:13:20] SERVICE ALERT: TEST-01;TFTP Server;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
Service Critical [10-29-2008 16:13:20] SERVICE ALERT: TEST-01;HTTP;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
Host Down [10-29-2008 16:13:10] HOST ALERT: TEST-01;DOWN;SOFT;1;(Host Check Timed Out)
Program Start [10-29-2008 16:12:10] Nagios 3.0.2 starting... (PID=1917)
Program Restart [10-29-2008 16:12:10] Caught SIGHUP, restarting...
Service Ok [10-29-2008 16:10:20] SERVICE ALERT: TEST-01;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.43 ms
Host Up [10-29-2008 16:10:20] HOST ALERT: TEST-01;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.45 ms
Service Ok [10-29-2008 16:10:10] SERVICE ALERT: TEST-01;TFTP Server;OK;HARD;1;TCP OK - 0.005 second response time on port 8099
Service Ok [10-29-2008 16:10:10] SERVICE ALERT: TEST-01;HTTP;OK;HARD;1;HTTP OK HTTP/1.1 200 OK - 403 bytes in 0.013 seconds
Program Start [10-29-2008 16:09:10] Nagios 3.0.2 starting... (PID=1917)
Program Restart [10-29-2008 16:09:10] Caught SIGHUP, restarting...
Host Down [10-29-2008 16:05:48] HOST ALERT: TEST-01;DOWN;HARD;10;(Host Check Timed Out)
Host Down [10-29-2008 16:04:18] HOST ALERT: TEST-01;DOWN;SOFT;9;(Host Check Timed Out)
Host Down [10-29-2008 16:02:38] HOST ALERT: TEST-01;DOWN;SOFT;8;(Host Check Timed Out)
Host Down [10-29-2008 16:00:58] HOST ALERT: TEST-01;DOWN;SOFT;7;(Host Check Timed Out)
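
The lag can also be read straight from the history: each outage starts
with a DOWN;SOFT;1 entry and the notification only goes out at the
DOWN;HARD entry, which here always carries attempt number 10.  Below is
a rough Python sketch that measures that span.  The HISTORY sample
holds only the first and last HOST ALERT of the two complete outages
above, reordered oldest first; the regular expression and the function
name soft_to_hard_lags are my own for this illustration and are not
part of Nagios.  I have not confirmed which setting is responsible for
the ten attempts.

```python
import re
from datetime import datetime

# First and last HOST ALERT of two outages, copied from the history above
# (unwrapped and reordered oldest first).
HISTORY = """\
[10-29-2008 16:13:10] HOST ALERT: TEST-01;DOWN;SOFT;1;(Host Check Timed Out)
[10-29-2008 16:27:40] HOST ALERT: TEST-01;DOWN;HARD;10;(Host Check Timed Out)
[10-29-2008 16:41:50] HOST ALERT: TEST-01;DOWN;SOFT;1;(Host Check Timed Out)
[10-29-2008 16:58:05] HOST ALERT: TEST-01;DOWN;HARD;10;(Host Check Timed Out)
"""

# Matches "[timestamp] HOST ALERT: host;state;SOFT|HARD;attempt;..."
ALERT = re.compile(
    r"\[(?P<ts>[^\]]+)\] HOST ALERT: (?P<host>[^;]+);(?P<state>[^;]+);"
    r"(?P<kind>SOFT|HARD);(?P<attempt>\d+);"
)


def soft_to_hard_lags(text):
    """Return (hard_attempt, seconds) from each DOWN;SOFT;1 to the next DOWN;HARD."""
    lags = []
    first_soft = None
    for line in text.splitlines():
        m = ALERT.search(line)
        if not m or m.group("state") != "DOWN":
            continue
        when = datetime.strptime(m.group("ts"), "%m-%d-%Y %H:%M:%S")
        if m.group("kind") == "SOFT" and m.group("attempt") == "1":
            first_soft = when  # first failed check of a new outage
        elif m.group("kind") == "HARD" and first_soft is not None:
            seconds = int((when - first_soft).total_seconds())
            lags.append((int(m.group("attempt")), seconds))
            first_soft = None
    return lags


if __name__ == "__main__":
    for attempt, seconds in soft_to_hard_lags(HISTORY):
        print(f"HARD on attempt {attempt} after {seconds // 60} min {seconds % 60} s")
```

For the two outages sampled this comes to about 14.5 and 16 minutes
between the first failed host check and the notification.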
