Service checks and retry check interval

Marc Powell marc at ena.com
Wed Jun 16 23:42:55 CEST 2004


 


>From: Tom Valdes [mailto:Tom.Valdes at flamenconetworks.com] 
>Sent: Wednesday, June 16, 2004 2:55 PM
>To: nagios-users at lists.sourceforge.net
>Subject: [Nagios-users] Service checks and retry check interval

> I currently have my normal_check_interval set to 5 minutes

> If a service check is missed, I'd like it to retry 5 
> times before sending a notification and I'd like the 
> retry interval to be 1 minute.  (can it be less?  
> Like 10 seconds?)

>I've tried adding the following to services.cfg

>        max_check_attempts         5
>        normal_check_interval        5
>        retry_check_interval           1

I presume this is for the service definition. Can we see the complete
definition?

> Shouldn't this retry a failed check every minute 
> for 5 tries before sending a notification?

For the service above, under normal circumstances, yes. I use 5, 5, 3
(max_check_attempts, normal_check_interval, retry_check_interval) to
delay notifications by ~15 minutes.
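
As a rough sketch of that setup, a service definition along these lines
would reach a hard state on the fifth failed attempt. The host name,
service description and check_command name are placeholders I made up,
not taken from your config, and the other directives a service
definition requires (check_period, contact_groups, the notification
settings) are left out for brevity:

```
define service{
        host_name               testserver      ; hypothetical host name
        service_description     PING            ; hypothetical description
        check_command           check_fping     ; assumed command name
        max_check_attempts      5               ; notify after the 5th failed attempt
        normal_check_interval   5               ; time units between checks while OK
        retry_check_interval    3               ; time units between retries while SOFT
        ; remaining required directives omitted
        }
```

Assuming the default interval_length of 60 seconds, the four retries at
3 minutes each put the fifth attempt about 12 minutes after the first
failure, on top of however long the problem went unnoticed within the
5-minute normal interval.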

> Using a test server, I pull the plug and Nagios 
> catches the 100% ping loss but if I plug it back 
> in as soon as it notices, Nagios emails me right 
> away and doesn't return an Up state for another 
> 5 minutes?

For the service or the host? See below.

> The following is what I receive on the status 
> screen.. It shows a State Type: HARD.. Shouldn't 
> it be in a SOFT state until it completes the 
> max_check_attempts?

> Current Status:   CRITICAL    
> Status Information:FPING CRITICAL - 192.168.100.21 (loss=100.000000% )
> Current Attempt:1/10 

Why is max attempts showing 10 here if it's defined as 5 above? Did you
restart Nagios after making the change? Do you have multiple Nagios
processes running?

There is a special situation that arises when you just 'pull the plug'
on a machine you're monitoring. The service check will of course fail on
the first attempt. Nagios will then check the status of the host using
the host's check_command. It does this exclusively, until the
max_check_attempts defined for the host is reached, and it will not
recheck the service if the host is determined to be down or
unreachable. At that point Nagios will attempt to send a HOST down
notification, which may be what you are seeing. Because of this special
situation, your retry_check_interval for the service has no effect.
AFAIK, Nagios just falls back to normal_check_interval until one or
more services on the host recover (and, by inference, the host).
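
To make that concrete: once the plug is pulled, the attempt count that
matters is the one on the host definition, not the service. A minimal
sketch, assuming the check-host-alive command from the sample configs
and a made-up host name and alias (the address is the one from your
status output); the max_check_attempts value is only an example and the
required notification directives are omitted:

```
define host{
        host_name               testserver          ; hypothetical host name
        alias                   Test Server         ; hypothetical alias
        address                 192.168.100.21      ; address from the status output
        check_command           check-host-alive    ; assumed sample-config command
        max_check_attempts      3                   ; example value: host checks before DOWN
        ; remaining required directives omitted
        }
```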

--
Marc

