RE host checks configuration question

Marc Powell marc at ena.com
Thu Jan 24 14:55:28 CET 2008


On Jan 24, 2008, at 4:59 AM, Cyrille Bollu wrote:

> OK, I have found the explanation of what a "check_interval" set to 0
> means in the documentation of Nagios 3 (we are using 2.9 here): 0
> means *regular* host checks are not performed. Which is good.

Host check behavior can differ under Nagios 3. Since you say you're
using 2.9, I'll answer for 2.x.

> I have another question now: Are on-demand host checks scheduled
> when a service transitions to a *soft* not-OK state or when it
> transitions to a *hard* not-OK state?

Soft. The first non-OK service check results in a host check.

http://nagios.sourceforge.net/docs/2_0/networkreachability.html

>
> This is quite an important question, since the answer will influence
> the value I set for the host "max_check_attempts" parameter: if
> on-demand host checks are scheduled when a service transitions to a
> *hard* not-OK state, I will set the host "max_check_attempts" to a
> lower value than if they are scheduled when a service transitions to
> a *soft* not-OK state (since I'm more confident in the service check
> result).

There are other factors that you need to consider. Under 2.x, all host
checks are performed serially. While a host check is being performed,
for up to max_check_attempts attempts, all other Nagios tasks stop
completely. This can lead to unexpectedly high latencies on problem
networks unless you optimize your host check commands to complete as
quickly as possible while still leaving you confident the host is
down. For example, a single ping with max_check_attempts set to 3:
just enough to be sure the host is really down.
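
As a rough illustration of why this matters, the sketch below estimates
an upper bound on how long serial host checks could stall everything
else. It is only back-of-the-envelope arithmetic, not Nagios code: the
function name, the number of down hosts, and the per-attempt timings are
all assumptions chosen for the example, and it assumes every attempt
runs until it times out.

```python
def worst_case_block_seconds(hosts_down, max_check_attempts, seconds_per_attempt):
    """Upper bound on time spent in serial host checks (hypothetical model).

    Assumes checks never overlap and every attempt runs to its timeout.
    """
    return hosts_down * max_check_attempts * seconds_per_attempt


# Assumed scenario: 20 unreachable hosts.
# Fast check: one ping that gives up after about 1 second, 3 attempts.
fast = worst_case_block_seconds(20, 3, 1.0)
# Slow check: a ping that takes about 10 seconds to give up, 10 attempts.
slow = worst_case_block_seconds(20, 10, 10.0)

print(f"fast check: up to {fast:.0f} s blocked")
print(f"slow check: up to {slow:.0f} s blocked")
```

With these assumed numbers the fast check blocks for at most a minute,
while the slow one could block for over half an hour.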

> Also, there is something else that is not very clear in the Nagios 3
> documentation:
>
> (from http://nagios.sourceforge.net/docs/3_0/hostchecks.html)
> "Hosts which have their max_check_attempts value set to 1 can cause  
> serious performance problems. The reason? If Nagios needs to  
> determine their true state using the network reachability logic (to  
> see if they're DOWN or UNREACHABLE), it will have to launch serial  
> checks of all of the host's immediate parents. Just to reiterate,  
> those checks are run serially, rather than in parallel, so it can  
> cause a big performance hit. For this reason, I would recommend that  
> you always use a value greater than 1 for the max_check_attempts  
> directives in your host definitions."
>
> Well, I believe the writer of this documentation, but I don't
> understand why setting this parameter to 1 will serialize the checks
> of the host's parents. Can someone explain this point to me?
>

I don't believe I can answer this any more specifically at this point.  
I haven't examined the code for 3.x much yet.

--
Marc
