Nagios retries checks too soon.

Paul M Dubuc work at paul.dubuc.org
Fri Jun 10 19:48:03 CEST 2011


Jochen Bern wrote:
> On 06/09/2011 08:14 PM, Paul M. Dubuc wrote:
>> Andreas Ericsson wrote:
>>> I'm not sure. I'm also not sure which behaviour is intended. Arguably, either
>>> is correct and Nagios is doing one of two right things.
>> I'm not sure.  If a test times out and Nagios tries again 10 seconds later
>> instead of the 60 seconds specified, that could cause problems; load related
>> problems when you have many of these tests running and timing out and problems
>> for the system under test not having sufficient time to recover before the
>> next check is done.
>
> True, but *if* someone has the latter kind of problem, I'd expect him to
> keep it in mind while writing the configuration, too. IIRC, the actual
> code adds check_interval/retry_interval to the variable that holds the
> (previous) scheduled check time - i.e., the time when the previous check
> supposedly was *started* (assuming negligible check latency).
> Configuring a retry_interval of one minute for a service whose sustained
> request rate may be *less* than one per minute sounds dubitable to me.
>
> (And I'm a firm nonbeliever in Unix-ish "load" figures, as opposed to
> actual CPU usage etc., but that's a different rant.)
>
> Kind regards,
> 								J. Bern

Thanks for this explanation.  It helps quite a bit.  The checks we run
normally take 5 to 15 seconds to complete, but we allow a much longer
timeout.  I was under the impression that the retry interval was
counted only from the time the previous check completes and its status
(which is needed to determine whether a retry is necessary) is known.
Why is the retry time determined before it's known that one is needed?
It looks like checks with longer timeouts need longer retry intervals
to compensate for the worst case.  That's not intuitive to me, but I
can live with it.
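
To make the arithmetic concrete, here is a rough Python sketch of the
behaviour as Jochen describes it from memory: the retry is scheduled at
the previous check's scheduled start time plus retry_interval. It is not
taken from the Nagios source. The function names, the 50-second timeout,
and the use of plain seconds for every value are assumptions for
illustration only.

```python
def retry_start(scheduled_start: int, retry_interval: int) -> int:
    """Start time of the retry when the interval is counted from the
    previous check's scheduled start (the behaviour described above)."""
    return scheduled_start + retry_interval


def gap_after_check(scheduled_start: int, check_duration: int,
                    retry_interval: int) -> int:
    """Seconds the system under test gets between the end of one check
    and the start of the retry. Clamped at 0, which is an assumption
    about what happens when a check outlasts the retry interval."""
    finished = scheduled_start + check_duration
    return max(0, retry_start(scheduled_start, retry_interval) - finished)


def retry_interval_for_gap(timeout: int, desired_gap: int) -> int:
    """Retry interval needed so that even a check that runs to its full
    timeout is followed by at least desired_gap seconds of rest."""
    return timeout + desired_gap


if __name__ == "__main__":
    # Normal checks (5 to 15 s) leave most of a 60 s retry interval free.
    # A check that times out after a hypothetical 50 s leaves only 10 s.
    for duration in (5, 15, 50):
        print(duration, gap_after_check(0, duration, 60))
    # Interval needed to keep a full 60 s gap with a 50 s timeout.
    print(retry_interval_for_gap(50, 60))
```

With these assumed numbers, a 60-second retry interval and a check that
times out at 50 seconds would give the 10-second gap I was seeing, and
keeping a full 60 seconds of recovery time would take a retry interval
of 110 seconds.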

Paul Dubuc
