Host retry interval

Kyle Tucker kylet at panix.com
Mon May 22 21:45:11 CEST 2006


Thanks Eli,

According to the docs, Nagios checks the status of a host "when a service
check results in a non-OK status." Is that when the service reaches a HARD
state, after all max_check_attempts retries are exhausted, or as soon as it
goes to a non-OK (SOFT) state? If the latter, which seems to be when it's
kicking off host checks, I don't see how increasing the service's
max_check_attempts will help.
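
To make sure I'm reading the suggestion right, here is a rough sketch of the
service-level change as I understand it. The host name is the one from my log
below; the service description and check command are placeholder names, the
numbers are only illustrative, and the other directives a real service
definition needs are left out. With the default interval_length of 60 seconds,
each interval unit is one minute.

```
# Sketch only: SNMP and check_snmp_placeholder are made-up names,
# and the remaining required service directives are omitted.
define service{
        host_name               badhost
        service_description     SNMP
        check_command           check_snmp_placeholder
        max_check_attempts      5       ; attempts before the service goes HARD
        normal_check_interval   5       ; interval units between regular checks
        retry_check_interval    1       ; interval units between retries while SOFT
        }
```

If the host check is triggered on the first SOFT failure, raising
max_check_attempts here would only delay the service's HARD state, not the
host check itself, which is what I'm trying to confirm.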

> I'd suggest NOT taking this action at the host level; reason being that all
> service checks are halted for the duration of this non-parallelized
> action... You want to avoid doing a host check until absolutely necessary.
> Suggest increasing the max_check_attempts on the SERVICE to a larger number,
> this will avoid impacting the monitoring system as a whole.
> 
> /eli
> 
> 
> On 5/22/06 11:45 AM, "Kyle Tucker" <kylet at panix.com> wrote:
> 
> > Hi,
> > I have many hosts that are constantly giving me DOWN/UP state as they
> > are unreachable for certain periods. In an attempt to give the system more
> > time to become available, I increased the max_check_attempts from 2 to 5. At
> > 2 the interval between retry attempts was 10 seconds. Now at 5, the interval
> > is 7 seconds. I'd like to have this interval higher for some hosts, but
> > there's a real scary note on the hosts check_interval option to not use it
> > if you can help it. Is this my only option or is there a better way? I am
> > also following the thread titled "Workaround for 'Host DOWN' false-positives"
> > but I don't clearly see how to set that up. Here's some output for a failed
> > host (I tail and parse the date field in a script so it's readable).
> > 
> > Mon May 22 14:25:52 2006  HOST ALERT: badhost;DOWN;SOFT;1;* system down * - snmpd not responding
> > Mon May 22 14:25:59 2006  HOST ALERT: badhost;DOWN;SOFT;2;* system down * - snmpd not responding
> > Mon May 22 14:26:06 2006  HOST ALERT: badhost;DOWN;SOFT;3;* system down * - snmpd not responding
> > Mon May 22 14:26:16 2006  HOST ALERT: badhost;DOWN;SOFT;4;* system down * - snmpd not responding
> > Mon May 22 14:26:23 2006  HOST ALERT: badhost;DOWN;HARD;5;* system down * - snmpd not responding
> > 
> 


-- 
- Kyle 
---------------------------------------------
kylet at panix.com   http://www.panix.com/~kylet    
---------------------------------------------

