Hosts report 'DOWN, HARD' after first attempt.

Jonathan Call jcall at verio.net
Fri Jan 16 17:42:39 CET 2009


I am running a distributed monitoring system using Nagios 2.11 on
FreeBSD 6.3. I use NSCA to send host and service events from the slave
servers to the central server, and I have always had the following
problem:

A distributed server notices that a host's service is "non-OK" and fires
off check-host-alive. I have it set up to run check_icmp, so it sends
five ICMP packets. Since the network isn't always perfect, those five
packets sometimes get dropped. However, I have my max_retry_interval set
to 3, so it fires off another check_icmp, which completes just fine. As
a result I see the following events take place on the slave server:

[01-16-2009 15:18:46] HOST ALERT: s3200.blah.net;UP;SOFT;2;OK -
10.XX.XX.XX: rta 100.294ms, lost 0%
[01-16-2009 15:18:46] HOST ALERT: s3200.blah.net;DOWN;SOFT;1;CRITICAL -
10.XX.XX.XX: rta nan, lost 100%

However on the central server I see the following:

[01-16-2009 15:19:02] HOST NOTIFICATION:
NOC-email;s3200.blah.net;UP;host-notify-by-email;OK - 10.XX.XX.XX: rta
100.294ms, lost 0%
[01-16-2009 15:19:01] HOST ALERT: s3200.blah.net;UP;HARD;1;OK -
10.XX.XX.XX: rta 100.294ms, lost 0%
[01-16-2009 15:19:01] HOST NOTIFICATION:
NOC-email;s3200.blah.net;DOWN;host-notify-by-email;CRITICAL -
10.XX.XX.XX: rta nan, lost 100%
[01-16-2009 15:19:01] HOST ALERT: s3200.blah.net;DOWN;HARD;1;CRITICAL -
10.XX.XX.XX: rta nan, lost 100%

The central server immediately flags the host as DOWN, HARD in spite of
having the same max_retry_interval = 3 setting. On some hosts this
generates a ton of false "HOST DOWN" notifications. Is there any way to
fix it?
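
To make the difference concrete, here is a minimal Python sketch that
reproduces the state fields seen in the two log excerpts above. It is
only a simplified model, not Nagios code: the function name classify and
the parameters max_attempts and passive are hypothetical, max_attempts
stands in for the retry limit of 3 mentioned above, and the sketch
assumes (based on what the central log shows) that every result arriving
passively via NSCA is recorded as HARD on attempt 1.

```python
def classify(results, max_attempts=3, passive=False):
    """Return (state, state_type, attempt) for each host check result.

    Simplified model of the behavior described in this post. The host is
    assumed to start out UP/HARD. With passive=True, every result is
    treated as HARD on attempt 1 (an assumption taken from the central
    server's log, not from the Nagios source).
    """
    events = []
    attempt = 1
    last_state, last_type = "UP", "HARD"
    for state in results:
        if passive:
            # Passively received result: no retries are applied.
            state_type, attempt = "HARD", 1
        elif state == "DOWN":
            # First failure starts at attempt 1; later ones count up.
            attempt = 1 if last_state == "UP" else min(attempt + 1, max_attempts)
            state_type = "HARD" if attempt >= max_attempts else "SOFT"
        elif last_state == "DOWN" and last_type == "SOFT":
            # Recovery before the retry limit was reached: soft recovery.
            attempt += 1
            state_type = "SOFT"
        else:
            # Ordinary UP result, or recovery from a HARD DOWN.
            attempt, state_type = 1, "HARD"
        events.append((state, state_type, attempt))
        last_state, last_type = state, state_type
    return events


# Slave server: actively checks and retries before going HARD.
print(classify(["DOWN", "UP"], max_attempts=3))
# Central server: receives the same two results passively.
print(classify(["DOWN", "UP"], max_attempts=3, passive=True))
```

Under these assumptions the first call yields DOWN;SOFT;1 followed by
UP;SOFT;2, as on the slave, while the second yields DOWN;HARD;1 followed
by UP;HARD;1, as on the central server.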

Jonathan Call



