managing latency-induced host down alerts

Andreas Ericsson ae at op5.se
Wed Sep 12 17:19:27 CEST 2007


Michael W. Lucas wrote:
> On Wed, Sep 12, 2007 at 10:02:51AM -0500, Marc Powell wrote:
>>
>>> -----Original Message-----
>>> From: nagios-users-bounces at lists.sourceforge.net [mailto:nagios-users-
>>> bounces at lists.sourceforge.net] On Behalf Of Michael W. Lucas
>>> Sent: Wednesday, September 12, 2007 9:46 AM
>>> To: nagios-users at lists.sourceforge.net
>>> Subject: [Nagios-users] managing latency-induced host down alerts
>>>
>>> Hi,
>>>
>>> I'm using Nagios 2.9 on FreeBSD, on a wide area network that has
>>> remote networks scattered across the USA and Mexico.
>>>
>>> We have a problem where latency on some remote circuits rises due to
>>> congestion.  This means that various service checks time out, as they
>>> take more than 10 seconds to complete.  (Yes, this is a real problem,
>>> and we're addressing it.  I'm using smokeping to track latency at
>>> these sites now, analyzing traffic, etc.)
>>
>>> I'd like to separate the latency problem from a site down problem.  I
>>> can think of a couple ways to do this:
>>>
>>> 1) increase the 10-second maximum timeout for a service check to
>>> complete.  Can this be done in Nagios?
>> Yes, and is the route I would take since it's the simplest. All standard
>> plugins support a timeout parameter, usually -t. You can run ./plugin
>> --help to verify if it's supported. Just add an appropriate timeout for
>> the test you're trying to complete in the command{} definition. You'll
>> also need to increase the master service_check_timeout parameter in
>> nagios.cfg. That's a fallback timeout in case the plugin doesn't
>> terminate itself properly. I have my plugin timeouts generally set at 45
>> seconds and the master at 60.
> 
> Hi,
> 
> My understanding was that Nagios terminated service checks after 10
> seconds, no matter how long the plugin took to complete?
> 

By default the limit is 60 seconds. It exists only to prevent plugins
stuck in infinite loops from sinking the system entirely (although
2 or 3 such rogue plugins would surely make it unusable long before
60 seconds have passed).

> I have my plugins set to 10 seconds, but when I increase them beyond
> 10 seconds Nagios still reports the maximum time for any check is 10
> seconds.
> 

If those numbers come from the "Performance Info" page in Nagios,
they are most likely old. You need to let Nagios run for quite some
time after the change before it has any effect whatsoever on that page.

If they do not, you still need to take one or both of the steps suggested
by Marc Powell earlier:
* Add '-t 45' (without the single quotes) to practically all of your check
commands (first check which plugins support the flag).
* Increase service_check_timeout in nagios.cfg to something more than
10.
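
As a rough sketch of those two steps, something like the following should
work. The command name check_http_slow and the choice of check_http are
only examples I made up for illustration; the 45 and 60 second values are
the ones Marc mentioned, so adjust them to your own circuits. Run the
plugin with --help first to confirm it accepts -t.

```
# nagios.cfg
# Fallback limit enforced by Nagios itself; keep it above the plugin's -t.
service_check_timeout=60

# Command definition (hypothetical name; check_http is just an example)
define command{
        command_name    check_http_slow
        # -t 45 makes the plugin give up on its own after 45 seconds
        command_line    $USER1$/check_http -H $HOSTADDRESS$ -t 45
        }
```

Nagios needs a restart (or reload) to pick up both changes.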

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231





