Problem with time between soft down checks

Marc Powell marc at ena.com
Thu Mar 18 15:40:25 CET 2004


Josh Van As <mailto:JVanas at finncorp.com> wrote:
> We just installed Nagios 1.2 as an upgrade to 1.1. We had the same
> problem I am about to describe in 1.1, and were hoping that 1.2 fixed
> it. It did not.
> 
> Our desired behavior is that when a service or host soft fails, we
> want Nagios to wait 1 minute then re-check.  Repeat this a total of 5
> failed checks (5th one being HARD) before sending out notification.  
> 
> The problem we are having, as you can see from the sample below, is
> that Nagios is only waiting 3 seconds in between soft fail checks.
> Instead of a host / service taking 4 minutes to fail 4 additional
> times (before notification) it only takes about 12 seconds.
> 
> We are getting a lot of false pages because just about any network
> glitch can last 12 seconds. 
> 
> Has anyone seen this before?  Can you please help!  We love this
> product, but this is driving us crazy with pages!  Is this a problem
> with our perl installation?  Are we missing a module or something?
> Or do we have the config files set up wrong?

The time between service check retries is configurable, so you should be
able to get the behavior you want for services. The same cannot be done
for host check retries, however. Nagios is designed so that if the status
of a host is at all in question, it must be definitively determined
before any other checks can proceed. In fact, _everything_ else stops
until the host is determined to be in a HARD state, and Nagios rapidly
reissues the host check_command up to the maximum number of retries.
This is necessary so that the network reachability and dependency tests
work correctly (and others, I'm sure). I'm not sure whether this
behavior has been modified in 2.0a. Someone out there may have a
suggestion for a workaround, but I think the standard assumption is that
the network transport is good, and if it weren't you'd want to know
about it. Personally, I can only suggest two options: modify your host
check command to artificially introduce a delay, which will throw a lot
of other checks out of whack, or don't do host checks at all for those
specific problem hosts.
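
For the service side, a rough sketch of the relevant settings might look
like the following. It assumes interval_length in nagios.cfg is left at
its default of 60 seconds, so one time unit is one minute. The host,
service, command, time period, and contact group names are placeholders,
not taken from your setup, and the normal_check_interval and
notification values are only illustrative.

```
# Sketch of a Nagios 1.x service definition. All names below are
# hypothetical placeholders; adjust them to match your own objects.
define service{
        host_name               web01       ; hypothetical host
        service_description     HTTP        ; hypothetical service
        check_command           check_http  ; assumes this command is defined
        max_check_attempts      5           ; 5th failed check becomes HARD
        normal_check_interval   5           ; illustrative: 5 units between normal checks
        retry_check_interval    1           ; 1 unit (1 minute) between soft-fail retries
        check_period            24x7        ; assumes a 24x7 timeperiod exists
        notification_interval   60          ; illustrative value
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          admins      ; hypothetical contact group
        }
```

With values like these, a failing service would be rechecked about once
a minute and would only notify after the fifth consecutive failure. None
of this changes the host check retry behavior described above.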

--
Marc

