strange behavior with multiple failing hosts and nagios 1.3 / 2.1

Ludwig Pummer Ludwig.Pummer at Copart.Com
Mon Apr 10 18:24:20 CEST 2006


________________________________

	From: nagios-users-admin at lists.sourceforge.net [mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of Christian Lyra
	Sent: Friday, April 07, 2006 5:11 PM
	To: nagios-users at lists.sourceforge.net
	Subject: [Nagios-users] strange behavior with multiple failing hosts and nagios 1.3 / 2.1
	
	
	Hi there,
	
	I was evaluating nagios and found some strange behavior in my test setup. After a fresh install, I did a minimal setup: just one contactgroup with one contact, and a hostgroup with 4 hosts (no parent relationships). Since I'm only interested in knowing whether a host is up or down, I configured just a check_ping service for each host. As I said, a pretty simple setup. The services are scheduled to run every minute, with only one try. 
	
	To simulate a network problem, I just ran "iptables -A INPUT -p icmp -j DROP". I expected to see all hosts/services down within a minute, since nagios normally "spreads" the checks across the one-minute interval (default configuration). To my surprise, I saw just one host go down in the first minute, with each subsequent host going down a minute after the previous one. That is, host 1 went down at, say, 8:40:13, host 2 at 8:41:05, host 3 at 8:42:05 and host 4 at 8:43:05. The last host went down almost 4 minutes after the "network problem" began. 
	
	My first try was with nagios 1.3, but then I reproduced the same problem with nagios 2.1. When I asked a friend to run the same test, he got the same results. His were a little worse: since he does not check his hosts/services every minute, he saw one host go down every 3 minutes, and after 10 minutes he still couldn't see all the hosts down. 
	
	To my surprise, all the hosts came back up at about the same time after I removed the iptables rule. I could not find an explanation for this behavior, and couldn't find anything wrong with the configuration. I'm not sure if this is a feature or if I hit a bug, and a serious one, to be honest. 
	
	I only did a minimal search of the mailing list archives and forums, so excuse me if this is a known issue, and please point me to where I can find more about it.
	
	
	Christian Lyra

This is unfortunately a long-standing deficiency in Nagios. It suspends all parallel checking while it performs a host check, so the more downed hosts you have, the farther behind it falls on the rest of your service checks.
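To make that mechanism concrete, here is a toy Python model. It is a sketch under assumptions, not Nagios's actual scheduler. It assumes, hypothetically, that service checks are spread evenly over one interval, that a failed service check triggers a host check lasting a fixed HOST_CHECK_SECONDS during which no other check runs, and that passing checks take no time. The 60-second host-check duration is an illustrative value, not a measured one, and `detection_times` and its parameters are made-up names.

```python
"""Toy model of serial (blocking) host checks versus parallel ones.

Hypothetical assumptions, not taken from the Nagios source:
  * service checks are spread evenly across one check interval,
  * a failing service check triggers a host check that lasts
    HOST_CHECK_SECONDS; in blocking mode no other check runs meanwhile,
  * passing checks take no time.
"""

INTERVAL = 60            # service check interval, in seconds
HOST_CHECK_SECONDS = 60  # hypothetical duration of one host check


def detection_times(n_hosts, fail_at, blocking=True):
    """Return the time (in seconds) at which each host is marked DOWN.

    All hosts become unreachable at `fail_at`.
    """
    # Initial schedule: checks spread evenly across one interval.
    next_run = [i * INTERVAL / n_hosts for i in range(n_hosts)]
    down_at = [None] * n_hosts
    clock = 0.0  # earliest time the (serial) scheduler is free again
    while None in down_at:
        # Run the pending check that is due soonest.
        pending = [h for h in range(n_hosts) if down_at[h] is None]
        h = min(pending, key=lambda i: next_run[i])
        # In blocking mode a check cannot start while a host check runs.
        start = max(clock, next_run[h]) if blocking else next_run[h]
        if blocking:
            clock = start
        if start < fail_at:
            next_run[h] += INTERVAL  # check passed; reschedule it
            continue
        # Service check failed: a host check confirms the host is DOWN.
        down_at[h] = start + HOST_CHECK_SECONDS
        if blocking:
            clock = down_at[h]  # nothing else ran during the host check
    return down_at


if __name__ == "__main__":
    print("serial host checks:  ", detection_times(4, fail_at=0))
    print("parallel host checks:", detection_times(4, fail_at=0, blocking=False))
```

With four hosts failing at once, the serial variant marks them down one host-check duration apart (60, 120, 180 and 240 seconds), while the parallel variant finishes within one interval plus one host check (60, 75, 90 and 105 seconds). This resembles the one-host-per-minute pattern in the original report, though only under the assumed timings above.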
 

--
Ludwig Pummer
System Administrator
Copart Auto Auctions



 