false host down alerts

Andreas Koch a.koch at eurodata.de
Wed Jun 16 15:06:20 CEST 2004


How do you check whether the host is down? Please paste the
'check-host-alive' command definition from your checkcommands.cfg (the
block under the comment "# 'check-host-alive' command definition").

Andreas
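
Background to the question above: the quoted log lines end in "Socket
timeout after 10 seconds". That wording appears to come from the
socket-timeout handler shared by the TCP-based plugins (check_tcp,
check_http, check_smtp and similar), not from check_ping, which reports
packet loss and round-trip times. If the host check really is a TCP
connect, a successful ICMP ping from the shell does not show that the
host check itself can succeed. The minimal Python sketch below only
illustrates that difference. It is not the code of any Nagios plugin,
and the function name `tcp_host_check`, its defaults and its "TCP OK" /
"Connection failed" messages are hypothetical.

```python
import socket
from typing import Tuple

# Nagios plugin exit codes (0 = OK, 2 = CRITICAL).
OK, CRITICAL = 0, 2


def tcp_host_check(host: str, port: int, timeout: float = 10.0) -> Tuple[int, str]:
    """Hypothetical TCP-connect host check.

    Succeeds only if a TCP handshake with host:port completes within
    `timeout` seconds. An ICMP ping can succeed while this fails (for
    example, a firewall dropping that port or an overloaded service).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - connected to {host}:{port}"
    except socket.timeout:
        # Mirrors the wording seen in the quoted nagios.log lines.
        return CRITICAL, f"Socket timeout after {timeout:g} seconds"
    except OSError as exc:
        # Connection refused, network unreachable, DNS failure, ...
        return CRITICAL, f"Connection failed: {exc}"
```

For example, tcp_host_check("scrubber.gsi-kc.com", 25) would probe the
SMTP port (port 25 is an assumption here), which a ping never touches.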


On Wed, 16.06.2004 at 13:34, Martin, Jeremy wrote:
> Hi,
>
> We have several Nagios servers running a total of about 1300 service
> checks and 300 host checks with Nagios 1.2 and Nagios plugins 1.3.1.
>
> Unfortunately, something a little annoying, not to mention strange,
> keeps happening:
> 
> Nagios keeps sending HOST DOWN alerts for hosts that are not down. For
> example, we run a ping check and an HTTP-QA check for a website. Nagios
> sends a HOST DOWN alert, yet at the same time both the ping check and
> the HTTP check are fine. Nagios considers the host down for quite some
> time, but it keeps running the ping and HTTP-QA checks anyway. The only
> way I can make it consider the host up again is to restart Nagios
> completely; then it forgets that it thought the host was down (even
> with retain_state_information=1).
> 
> At first this happened to a couple of load-balanced websites and mail
> servers. Now it is happening to several other sites and mail servers
> that are not load balanced. Every time it reports a host down like
> this, I can SSH into the Nagios server and ping the exact hostname
> Nagios is using (either the FQDN or the IP, depending on what
> hosts.cfg specifies for the given site), and the ping has no problems
> at all.
> 
> Just to give an example: we often get HOST DOWN alerts for
> "mail.ikea-usa.net" even though our SMTP and ping checks stay OK long
> after the HOST DOWN alert. We also have this problem with
> https://www.verepay.cc, but I think that's because ping is currently
> blocked in our firewall for that site. Our load-balanced
> anti-spam/anti-virus mail servers at scrubber.gsi-kc.com also suffer
> from this problem, but I've never had any trouble pinging them. I'm
> just throwing out these examples in case anyone notices anything
> particularly wrong with them, since Nagios seems to favor those hosts
> for this odd HOST DOWN behavior.
> 
> Here's what I see in the nagios.log file:
> 
> [1087381552] HOST ALERT: scrubber.gsi-kc.com;DOWN;SOFT;1;Socket timeout after 10 seconds
> [1087381562] HOST ALERT: scrubber.gsi-kc.com;DOWN;SOFT;2;Socket timeout after 10 seconds
> [1087381572] HOST ALERT: scrubber.gsi-kc.com;DOWN;SOFT;3;Socket timeout after 10 seconds
> [1087381582] HOST ALERT: scrubber.gsi-kc.com;DOWN;SOFT;4;Socket timeout after 10 seconds
> [1087381592] HOST ALERT: scrubber.gsi-kc.com;DOWN;SOFT;5;Socket timeout after 10 seconds
> [1087381602] HOST ALERT: scrubber.gsi-kc.com;DOWN;SOFT;6;Socket timeout after 10 seconds
> [1087381612] HOST ALERT: scrubber.gsi-kc.com;DOWN;SOFT;7;Socket timeout after 10 seconds
> [1087381622] HOST ALERT: scrubber.gsi-kc.com;DOWN;SOFT;8;Socket timeout after 10 seconds
> [1087381632] HOST ALERT: scrubber.gsi-kc.com;DOWN;SOFT;9;Socket timeout after 10 seconds
> [1087381642] HOST ALERT: scrubber.gsi-kc.com;DOWN;HARD;10;Socket timeout after 10 seconds
> 
> How can that be when I can do this at the same time?
> 
> [root@kgsinm05 var]# ping scrubber.gsi-kc.com
> PING scrubber.gsi-kc.com (205.247.222.244) 56(84) bytes of data.
> 64 bytes from scrubber.gsi-kc.com (205.247.222.244): icmp_seq=1 ttl=240 time=28.3 ms
> 64 bytes from scrubber.gsi-kc.com (205.247.222.244): icmp_seq=2 ttl=240 time=26.8 ms
> 
> Thanks!! 
> 
> Jeremy
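
The HOST ALERT lines above follow the layout
[epoch seconds] HOST ALERT: host;state;state type;attempt;plugin output.
A short Python sketch, assuming exactly that layout (the regular
expression and the `parse_host_alert` helper are illustrative, not part
of Nagios), decodes them. Read this way, the quoted log shows ten host
checks exactly 10 seconds apart, SOFT for attempts 1 to 9 and HARD on
attempt 10, which presumably means max_check_attempts is 10 for that
host.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Matches lines such as:
# [1087381552] HOST ALERT: scrubber.gsi-kc.com;DOWN;SOFT;1;Socket timeout after 10 seconds
HOST_ALERT = re.compile(
    r"^\[(?P<ts>\d+)\] HOST ALERT: "
    r"(?P<host>[^;]+);(?P<state>[^;]+);(?P<type>SOFT|HARD);"
    r"(?P<attempt>\d+);(?P<output>.*)$"
)


def parse_host_alert(line: str) -> Optional[dict]:
    """Split one nagios.log HOST ALERT line into its fields (None if no match)."""
    m = HOST_ALERT.match(line.strip())
    if m is None:
        return None
    return {
        "time": datetime.fromtimestamp(int(m["ts"]), tz=timezone.utc),
        "host": m["host"],
        "state": m["state"],
        "type": m["type"],
        "attempt": int(m["attempt"]),
        "output": m["output"],
    }


if __name__ == "__main__":
    # The first and last lines of the quoted log.
    sample = [
        "[1087381552] HOST ALERT: scrubber.gsi-kc.com;DOWN;SOFT;1;Socket timeout after 10 seconds",
        "[1087381642] HOST ALERT: scrubber.gsi-kc.com;DOWN;HARD;10;Socket timeout after 10 seconds",
    ]
    first, last = (parse_host_alert(line) for line in sample)
    print(first["time"].isoformat(), first["type"], first["attempt"])
    print(last["time"].isoformat(), last["type"], last["attempt"])
    print("elapsed:", (last["time"] - first["time"]).total_seconds(), "s")
```

Run as a script, it reports the first alert at 2004-06-16T10:25:52+00:00
(SOFT, attempt 1) and the HARD alert 90 seconds later.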
-- 



_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




