Notification Timing Problem

Assaf Flatto assaf.flatto at ssp.uk.com
Thu Oct 30 10:27:45 CET 2008


How did you configure Nagios to send the emails?
Is Nagios relaying via a different mail server, or is a mail server running on the same machine?

Also, the pager: how are you sending those notifications? Through an external service or in-house?




On Wednesday 29 October 2008 22:11:55 MJK Nagios wrote:
> Hello,
>
> I seem to be having a rather difficult time getting notifications to
> work the way that I would like.  I'm using a test host to have Nagios
> generate alerts when I take it offline.  Nagios detects and sends
> notifications for the host coming back on-line very quickly.  What I'm
> doing in order to test my config is to then simulate that the host
> fails again in a few minutes.  The problem I'm seeing is that it takes
> Nagios 15-20 minutes to send a notification that the host is again
> down.  This would be useless to me in a production environment; if the
> host drops again -- I need to know about it immediately.
>
> OK, I've been through the docs and have checked everything that seems
> to make sense in order to figure out this issue -- with no success.
> I'm running Nagios 3.0.2.  Please see some output I've included below
> to see the time lag between when Nagios notices the host is down again
> and when it sends the notification.
>
> Thanks!
>
> -Matt
>
> ---------------------------------------------------------------------------
>
> Here's my notification entries while testing:
> ---------------------------------------------------------------------------
> Host     Service  Type       Time                 Contact  Notification Command  Information
> TEST-01  N/A      HOST UP    10-29-2008 17:04:05  NOC      notify-host-by-email  PING OK - Packet loss = 73%, RTA = 0.70 ms
> TEST-01  N/A      HOST UP    10-29-2008 17:04:05  NOC      notify-host-by-pager  PING OK - Packet loss = 73%, RTA = 0.70 ms
> TEST-01  N/A      HOST UP    10-29-2008 17:04:05  support  notify-host-by-email  PING OK - Packet loss = 73%, RTA = 0.70 ms
> TEST-01  N/A      HOST UP    10-29-2008 17:04:05  support  notify-host-by-pager  PING OK - Packet loss = 73%, RTA = 0.70 ms
> TEST-01  N/A      HOST DOWN  10-29-2008 17:03:25  NOC      notify-host-by-email  (Host Check Timed Out)
> TEST-01  N/A      HOST DOWN  10-29-2008 17:03:25  NOC      notify-host-by-pager  (Host Check Timed Out)
> TEST-01  N/A      HOST DOWN  10-29-2008 17:03:25  support  notify-host-by-email  (Host Check Timed Out)
> TEST-01  N/A      HOST DOWN  10-29-2008 17:03:25  support  notify-host-by-pager  (Host Check Timed Out)
> TEST-01  N/A      HOST DOWN  10-29-2008 16:58:05  NOC      notify-host-by-email  (Host Check Timed Out)
> TEST-01  N/A      HOST DOWN  10-29-2008 16:58:05  NOC      notify-host-by-pager  (Host Check Timed Out)
> TEST-01  N/A      HOST DOWN  10-29-2008 16:58:05  support  notify-host-by-email  (Host Check Timed Out)
> TEST-01  N/A      HOST DOWN  10-29-2008 16:58:05  support  notify-host-by-pager  (Host Check Timed Out)
> TEST-01  N/A      HOST UP    10-29-2008 16:40:10  NOC      notify-host-by-email  PING OK - Packet loss = 0%, RTA = 4.47 ms
> TEST-01  N/A      HOST UP    10-29-2008 16:40:10  NOC      notify-host-by-pager  PING OK - Packet loss = 0%, RTA = 4.47 ms
> TEST-01  N/A      HOST UP    10-29-2008 16:40:10  support  notify-host-by-email  PING OK - Packet loss = 0%, RTA = 4.47 ms
> TEST-01  N/A      HOST UP    10-29-2008 16:40:10  support  notify-host-by-pager  PING OK - Packet loss = 0%, RTA = 4.47 ms
> TEST-01  N/A      HOST DOWN  10-29-2008 16:27:40  NOC      notify-host-by-email  (Host Check Timed Out)
> TEST-01  N/A      HOST DOWN  10-29-2008 16:27:40  NOC      notify-host-by-pager  (Host Check Timed Out)
> TEST-01  N/A      HOST DOWN  10-29-2008 16:27:40  support  notify-host-by-email  (Host Check Timed Out)
> TEST-01  N/A      HOST DOWN  10-29-2008 16:27:40  support  notify-host-by-pager  (Host Check Timed Out)
> TEST-01  N/A      HOST UP    10-29-2008 16:10:20  NOC      notify-host-by-email  PING OK - Packet loss = 0%, RTA = 0.45 ms
> TEST-01  N/A      HOST UP    10-29-2008 16:10:20  NOC      notify-host-by-pager  PING OK - Packet loss = 0%, RTA = 0.45 ms
> TEST-01  N/A      HOST UP    10-29-2008 16:10:20  support  notify-host-by-email  PING OK - Packet loss = 0%, RTA = 0.45 ms
> TEST-01  N/A      HOST UP    10-29-2008 16:10:20  support  notify-host-by-pager  PING OK - Packet loss = 0%, RTA = 0.45 ms
> TEST-01  N/A      HOST DOWN  10-29-2008 16:05:48  NOC      notify-host-by-email  (Host Check Timed Out)
> TEST-01  N/A      HOST DOWN  10-29-2008 16:05:48  NOC      notify-host-by-pager  (Host Check Timed Out)
> TEST-01  N/A      HOST DOWN  10-29-2008 16:05:48  support  notify-host-by-email  (Host Check Timed Out)
> TEST-01  N/A      HOST DOWN  10-29-2008 16:05:48  support  notify-host-by-pager  (Host Check Timed Out)
> TEST-01  N/A      HOST UP    10-29-2008 15:44:19  NOC      notify-host-by-email  PING OK - Packet loss = 0%, RTA = 0.53 ms
> TEST-01  N/A      HOST UP    10-29-2008 15:44:19  support  notify-host-by-email  PING OK - Packet loss = 0%, RTA = 0.53 ms
> TEST-01  N/A      HOST DOWN  10-29-2008 15:21:09  NOC      notify-host-by-email  (Host Check Timed Out)
> TEST-01  N/A      HOST DOWN  10-29-2008 15:21:09  support  notify-host-by-email  (Host Check Timed Out)
>
> And here's the host's history:
> ---------------------------------------------------------------------------
> October 29, 2008 17:00
> Program Start[10-29-2008 17:07:34] Nagios 3.0.2 starting... (PID=1917)
> Program Restart[10-29-2008 17:07:34] Caught SIGHUP, restarting...
> Service Ok[10-29-2008 17:04:25] SERVICE ALERT:
> TEST-01;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.44 ms
> Service Ok[10-29-2008 17:04:15] SERVICE ALERT: TEST-01;TFTP
> Server;OK;HARD;1;TCP OK - 0.007 second response time on port 8099
> Service Ok[10-29-2008 17:04:15] SERVICE ALERT:
> TEST-01;HTTP;OK;HARD;1;HTTP OK HTTP/1.1 200 OK - 403 bytes in 0.022
> seconds
> Host Up[10-29-2008 17:04:05] HOST ALERT: TEST-01;UP;HARD;1;PING OK -
> Packet loss = 73%, RTA = 0.70 ms
>  October 29, 2008 16:00
> Host Down[10-29-2008 16:58:05] HOST ALERT: TEST-01;DOWN;HARD;10;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:56:25] HOST ALERT: TEST-01;DOWN;SOFT;9;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:54:55] HOST ALERT: TEST-01;DOWN;SOFT;8;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:53:15] HOST ALERT: TEST-01;DOWN;SOFT;7;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:51:35] HOST ALERT: TEST-01;DOWN;SOFT;6;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:49:55] HOST ALERT: TEST-01;DOWN;SOFT;5;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:48:15] HOST ALERT: TEST-01;DOWN;SOFT;4;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:46:45] HOST ALERT: TEST-01;DOWN;SOFT;3;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:45:35] HOST ALERT: TEST-01;DOWN;SOFT;2;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:45:05] HOST ALERT: TEST-01;DOWN;SOFT;2;(Host
> Check Timed Out)
> Program Start[10-29-2008 16:44:55] Nagios 3.0.2 starting... (PID=1917)
> Program Restart[10-29-2008 16:44:55] Caught SIGHUP, restarting...
> Host Down[10-29-2008 16:43:30] HOST ALERT: TEST-01;DOWN;SOFT;2;(Host
> Check Timed Out)
> Service Critical[10-29-2008 16:42:30] SERVICE ALERT:
> TEST-01;PING;CRITICAL;HARD;1;PING CRITICAL - Packet loss = 100%
> Service Critical[10-29-2008 16:42:20] SERVICE ALERT: TEST-01;TFTP
> Server;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
> Service Critical[10-29-2008 16:42:20] SERVICE ALERT:
> TEST-01;HTTP;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10
> seconds
> Host Down[10-29-2008 16:41:50] HOST ALERT: TEST-01;DOWN;SOFT;1;(Host
> Check Timed Out)
> Service Critical[10-29-2008 16:41:30] SERVICE ALERT:
> TEST-01;PING;CRITICAL;SOFT;1;PING CRITICAL - Packet loss = 100%
> Service Critical[10-29-2008 16:41:20] SERVICE ALERT: TEST-01;TFTP
> Server;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
> Service Critical[10-29-2008 16:41:20] SERVICE ALERT:
> TEST-01;HTTP;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10
> seconds
> Service Ok[10-29-2008 16:40:20] SERVICE ALERT:
> TEST-01;PING;OK;SOFT;1;PING OK - Packet loss = 0%, RTA = 0.56 ms
> Service Ok[10-29-2008 16:40:10] SERVICE ALERT: TEST-01;TFTP
> Server;OK;SOFT;1;TCP OK - 0.047 second response time on port 8099
> Service Ok[10-29-2008 16:40:10] SERVICE ALERT:
> TEST-01;HTTP;OK;SOFT;1;HTTP OK HTTP/1.1 200 OK - 403 bytes in 0.034
> seconds
> Host Up[10-29-2008 16:40:10] HOST ALERT: TEST-01;UP;HARD;1;PING OK -
> Packet loss = 0%, RTA = 4.47 ms
> Program Start[10-29-2008 16:39:40] Nagios 3.0.2 starting... (PID=1917)
> Program Restart[10-29-2008 16:39:40] Caught SIGHUP, restarting...
> Host Down[10-29-2008 16:27:40] HOST ALERT: TEST-01;DOWN;HARD;10;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:26:00] HOST ALERT: TEST-01;DOWN;SOFT;9;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:24:20] HOST ALERT: TEST-01;DOWN;SOFT;8;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:22:40] HOST ALERT: TEST-01;DOWN;SOFT;7;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:21:10] HOST ALERT: TEST-01;DOWN;SOFT;6;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:19:30] HOST ALERT: TEST-01;DOWN;SOFT;5;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:18:00] HOST ALERT: TEST-01;DOWN;SOFT;4;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:16:20] HOST ALERT: TEST-01;DOWN;SOFT;3;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:14:40] HOST ALERT: TEST-01;DOWN;SOFT;2;(Host
> Check Timed Out)
> Service Critical[10-29-2008 16:13:30] SERVICE ALERT:
> TEST-01;PING;CRITICAL;HARD;1;PING CRITICAL - Packet loss = 100%
> Service Critical[10-29-2008 16:13:20] SERVICE ALERT: TEST-01;TFTP
> Server;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
> Service Critical[10-29-2008 16:13:20] SERVICE ALERT:
> TEST-01;HTTP;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10
> seconds
> Host Down[10-29-2008 16:13:10] HOST ALERT: TEST-01;DOWN;SOFT;1;(Host
> Check Timed Out)
> Program Start[10-29-2008 16:12:10] Nagios 3.0.2 starting... (PID=1917)
> Program Restart[10-29-2008 16:12:10] Caught SIGHUP, restarting...
> Service Ok[10-29-2008 16:10:20] SERVICE ALERT:
> TEST-01;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.43 ms
> Host Up[10-29-2008 16:10:20] HOST ALERT: TEST-01;UP;HARD;1;PING OK -
> Packet loss = 0%, RTA = 0.45 ms
> Service Ok[10-29-2008 16:10:10] SERVICE ALERT: TEST-01;TFTP
> Server;OK;HARD;1;TCP OK - 0.005 second response time on port 8099
> Service Ok[10-29-2008 16:10:10] SERVICE ALERT:
> TEST-01;HTTP;OK;HARD;1;HTTP OK HTTP/1.1 200 OK - 403 bytes in 0.013
> seconds
> Program Start[10-29-2008 16:09:10] Nagios 3.0.2 starting... (PID=1917)
> Program Restart[10-29-2008 16:09:10] Caught SIGHUP, restarting...
> Host Down[10-29-2008 16:05:48] HOST ALERT: TEST-01;DOWN;HARD;10;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:04:18] HOST ALERT: TEST-01;DOWN;SOFT;9;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:02:38] HOST ALERT: TEST-01;DOWN;SOFT;8;(Host
> Check Timed Out)
> Host Down[10-29-2008 16:00:58] HOST ALERT: TEST-01;DOWN;SOFT;7;(Host
> Check Timed Out)
>
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's
> challenge Build the coolest Linux based applications with Moblin SDK & win
> great prizes Grand prize is a trip for two to an Open Source event anywhere
> in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
>
> ::: Please include Nagios version, plugin version (-v) and OS when
> ::: reporting any issue. Messages without supporting info will risk being
> ::: sent to /dev/null
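
As a rough cross-check of the quoted history, the sketch below measures how long one outage took to go from the first SOFT DOWN alert to the HARD DOWN alert. The log lines are copied from the 16:13:10 to 16:27:40 outage above and rejoined onto single lines; the function names are made up for illustration. The attempt counter reaching 10 would be consistent with a max_check_attempts of 10 on the host, but the configuration is not shown in the thread, so treat that as an assumption.

```python
from datetime import datetime

# Host alerts copied from the quoted history (16:13:10 to 16:27:40 outage),
# rejoined onto single lines and listed oldest first.
HISTORY = [
    "[10-29-2008 16:13:10] HOST ALERT: TEST-01;DOWN;SOFT;1;(Host Check Timed Out)",
    "[10-29-2008 16:14:40] HOST ALERT: TEST-01;DOWN;SOFT;2;(Host Check Timed Out)",
    "[10-29-2008 16:16:20] HOST ALERT: TEST-01;DOWN;SOFT;3;(Host Check Timed Out)",
    "[10-29-2008 16:18:00] HOST ALERT: TEST-01;DOWN;SOFT;4;(Host Check Timed Out)",
    "[10-29-2008 16:19:30] HOST ALERT: TEST-01;DOWN;SOFT;5;(Host Check Timed Out)",
    "[10-29-2008 16:21:10] HOST ALERT: TEST-01;DOWN;SOFT;6;(Host Check Timed Out)",
    "[10-29-2008 16:22:40] HOST ALERT: TEST-01;DOWN;SOFT;7;(Host Check Timed Out)",
    "[10-29-2008 16:24:20] HOST ALERT: TEST-01;DOWN;SOFT;8;(Host Check Timed Out)",
    "[10-29-2008 16:26:00] HOST ALERT: TEST-01;DOWN;SOFT;9;(Host Check Timed Out)",
    "[10-29-2008 16:27:40] HOST ALERT: TEST-01;DOWN;HARD;10;(Host Check Timed Out)",
]


def parse_alert(line):
    """Return (timestamp, state_type, attempt) for one HOST ALERT line."""
    stamp, rest = line[1:].split("] HOST ALERT: ", 1)
    _host, _state, state_type, attempt, _output = rest.split(";", 4)
    return datetime.strptime(stamp, "%m-%d-%Y %H:%M:%S"), state_type, int(attempt)


def seconds_to_hard(lines):
    """Seconds from the first alert to the first HARD alert, plus its attempt number."""
    alerts = [parse_alert(line) for line in lines]
    first_time = alerts[0][0]
    hard_time, _state_type, attempt = next(a for a in alerts if a[1] == "HARD")
    return int((hard_time - first_time).total_seconds()), attempt


delay, attempts = seconds_to_hard(HISTORY)
print(f"HARD state after {attempts} attempts, {delay} seconds")
```

For this outage the script reports 870 seconds (14.5 minutes), and the HARD alert at 16:27:40 carries the same timestamp as the HOST DOWN notifications in the table. That may suggest most of the lag accrues while the host check is being retried rather than in mail or pager delivery, though the answers to the questions above would still help rule delivery out.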



-- 
Assaf Flatto
SSP Ops Team
Linux System Administrator
169 Euston Road, London, NW1 2AE





 
 





