Nagios retries checks too soon.

Paul M. Dubuc work at paul.dubuc.org
Thu Jun 9 20:14:00 CEST 2011


Andreas Ericsson wrote:
> On 06/09/2011 03:43 PM, Paul M. Dubuc wrote:
>> Running Nagios 3.2.3, here is an example from the log that shows Nagios
>> retrying a failed check after only 10 seconds.  The normal check interval is
>> 7.5 minutes, retry interval is 1 minute, max. check attempts is 3.
>>
>> Note that this test has a timeout of 130 seconds, so it's been running for
>> over 2 minutes when it times out.  Does Nagios do retries sooner when the
>> timeout for a check is longer than the retry interval?  Is the retry interval
>> measured from the time the previous check starts, or from the time it ends?
>>
>
> I'm not sure. I'm also not sure which behaviour is intended. Arguably, either
> is correct and Nagios is doing one of two right things.
>

I'm not sure.  If a test times out and Nagios tries again 10 seconds later
instead of the 60 seconds specified, that could cause two kinds of problems:
load-related problems when many of these tests are running and timing out, and
problems for the system under test, which does not get sufficient time to
recover before the next check is done.
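
To make the two interpretations of the retry interval concrete, here is a
rough sketch of the arithmetic using the numbers above (130 second timeout,
60 second retry interval).  This is a hypothetical model, not Nagios source
code; the function name and the "start"/"end" policy labels are made up for
illustration, and the actual scheduler may do something different.

```python
# Hypothetical model, NOT taken from Nagios: compare two ways a scheduler
# could measure the retry interval for a check that ran until its timeout.

RETRY_INTERVAL = 60   # seconds (retry interval of 1 minute)
CHECK_TIMEOUT = 130   # seconds (timeout of the check in question)


def retry_gap(start, duration, retry_interval, measure_from):
    """Return the seconds between the end of a check and the start of its retry.

    measure_from="start": the retry is due retry_interval after the check
    started, but cannot begin before the check has finished.
    measure_from="end": the retry is due retry_interval after the check ended.
    """
    end = start + duration
    if measure_from == "start":
        next_start = max(start + retry_interval, end)
    elif measure_from == "end":
        next_start = end + retry_interval
    else:
        raise ValueError("measure_from must be 'start' or 'end'")
    return next_start - end


if __name__ == "__main__":
    for policy in ("start", "end"):
        gap = retry_gap(0, CHECK_TIMEOUT, RETRY_INTERVAL, policy)
        print(f"measured from {policy}: retry begins {gap}s after the timeout")
```

Under these assumptions, measuring from the start leaves no pause at all once
the check has run longer than the retry interval, while measuring from the end
always leaves the full 60 seconds.  Neither matches the 10 seconds in the log
exactly, but only the start-based reading would allow a gap shorter than the
retry interval.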

Paul Dubuc
