How to reduce a very high latency number

Tedman Eng teng at dataway.com
Thu May 18 20:25:06 CEST 2006


Try tuning the inter_check_delay_method setting (called
service_inter_check_delay_method in Nagios 2.x).  It determines how the
checks are initially spread out in the queue after a fresh start.  Nagios
tries to do a good job of this, but if some of your checks run at vastly
different intervals, they skew the "flat average" formula used to calculate
the "smart" setting.

Simple example:
Check_A - every 1 minute
Check_B - every 5 minutes
Total Checks: 2

Nagios would pick an inter-check delay of 1.5 minutes.  It averages the
check intervals, then divides by the total number of checks.
(average check interval) / (total checks)
       ((1+5)/2)         /      2         = 1.5
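
The arithmetic above can be sketched in a few lines of Python.  This only
illustrates the "flat average" formula as described in this message; it is
not Nagios's actual scheduling code.  The function name smart_delay is made
up for the example, and intervals are given in minutes.

```python
def smart_delay(intervals):
    """Flat-average inter-check delay, as described in this message.

    intervals: check intervals in minutes, one entry per check.
    Hypothetical helper for illustration; not part of Nagios.
    """
    total_checks = len(intervals)
    average_interval = sum(intervals) / total_checks
    return average_interval / total_checks


# Check_A every 1 minute, Check_B every 5 minutes
print(smart_delay([1, 5]))  # 1.5
```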

However, once every 5 minutes both Check_A and Check_B need to run during
the same minute.  Nagios would still wait 1.5 minutes between the two
checks, resulting in 0.5 minutes of latency for Check_A at best and 2
minutes at worst.

To solve this, calculate the inter-check delay manually: take the shortest
check interval instead of the average, and divide it by the total number of
checks.
(shortest check interval) / (total checks)
            1             /       2        = 0.5
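
As a rough sketch, the manual calculation and the resulting config line
could be produced like this in Python.  The helper names are invented for
the example.  It assumes the default interval_length of 60 seconds per time
unit, and that a numeric value for the delay method is read as seconds.
The directive is inter_check_delay_method in Nagios 1.x and
service_inter_check_delay_method in 2.x, so check the documentation for
your version before using the output.

```python
def manual_delay_seconds(intervals, interval_length=60):
    """Shortest check interval divided by the total number of checks.

    intervals: check intervals in time units (minutes by default).
    interval_length: seconds per time unit (assumed default of 60).
    """
    return min(intervals) * interval_length / len(intervals)


def config_line(intervals, directive="inter_check_delay_method"):
    # The directive name is an assumption; it differs between versions.
    return f"{directive}={manual_delay_seconds(intervals):.2f}"


# Check_A every 1 minute, Check_B every 5 minutes
print(config_line([1, 5]))  # inter_check_delay_method=30.00
```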

Think of the inter-check delay as the "gap" that Nagios leaves between
checks as they are added to the queue.  It won't schedule anything before
it is due, so Check_B will still wait 5 minutes before being put into the
check queue.  The only difference is that there will be just a 0.5-minute
"gap" before Check_A is executed afterwards.


NOTE: If you have some extremely short-interval checks, they can skew the
result in the other direction, so if you use this technique, be aware of
the CPU load it can cause on your monitoring server.
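
To get a feel for the load implication, the sketch below compares how
quickly checks would be started during the initial spread under each
formula.  The mix of checks is hypothetical (2999 checks every 5 minutes
plus a single 1-minute check), and the function names are made up for
illustration.  It only models the gap arithmetic described in this message,
not how Nagios actually executes checks.

```python
def average_based_delay(intervals):
    """Average interval divided by total checks (minutes)."""
    return (sum(intervals) / len(intervals)) / len(intervals)


def shortest_based_delay(intervals):
    """Shortest interval divided by total checks (minutes)."""
    return min(intervals) / len(intervals)


def starts_per_minute(delay_minutes):
    """Checks started per minute if one check starts every delay_minutes."""
    return 1.0 / delay_minutes


# Hypothetical mix: 2999 five-minute checks and one 1-minute check
intervals = [5] * 2999 + [1]
print(round(starts_per_minute(average_based_delay(intervals))))   # about 600
print(round(starts_per_minute(shortest_based_delay(intervals))))  # about 3000
```

In this made-up case a single short-interval check shrinks the gap to
roughly a fifth, so about five times as many checks start per minute.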



> -----Original Message-----
> From: Trask [mailto:trasko at gmail.com]
> Sent: Wednesday, May 17, 2006 5:26 PM
> To: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] How to reduce a very high latency number
> 
> 
> > I've noticed we get this problem when there are more than one or two
> > hosts down.  Because Nagios (we use 1.2) does host checks first, and
> > sequentially, a host check timing out can hold up everything else (we
> > have >3000 checks to run every 5 minutes).
> >
> 
> I have no hosts down 95% of the time, including now.  I could see how
> that would be an issue, though.
> 
> I have turned off all logging, state retention, performance data
> handling and backed off all timing parameters to their defaults (or
> even less aggressive timings).  In a separate test, I changed only the
> command_check_interval from -1 (check as often as possible) to 10
> seconds.  Both have had seemingly no effect.  At this point, the 2
> main servers I am looking at have been running for 30 minutes and
> latencies are up to 540 seconds for the "bad" one and 48 sec for the
> other one.
> 
> 
> My next step will be to recompile with the latest nagios and try that.
>  If that doesn't show an improvement, I'll try w/o perlcache.  Lastly,
> I'll try without the embedded perl interpreter at all.
> 
> 
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

