Slipping Schedule Queue

Tedman Eng teng at dataway.com
Thu Jul 24 07:35:42 CEST 2003


I had this problem too, and finally solved it by fiddling with the
inter-check delay.

Read this doc and calculate your own inter-check delay using a smaller
'average' instead of using the 's' method.
http://nagios.sourceforge.net/docs/1_0/checkscheduling.html#service_interleaving

------- snip ----------
inter_check_delay_method=s

n = Don't use any delay - schedule all service checks to run immediately
(i.e. at the same time!)
d = Use a "dumb" delay of 1 second between service checks
s = Use a "smart" delay calculation to spread service checks out evenly
(default)
x.xx = Use a user-supplied inter-check delay of x.xx seconds

------------------------

The 's'mart method uses a formula that takes into account the total time
needed for the checks and spreads it out evenly amongst all the services.
This works well in an ideal situation, where all the checks run at the same
frequency.  However, when the intervals are mixed, you want to weight the
check interval more heavily towards the most frequent check (in this case,
1 minute).
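
To see why the 's' method falls short here, this is a rough sketch of the
calculation as the linked doc describes it (average check interval divided by
the number of services). It assumes the 65 non-APAN services all run every
5 minutes, which the original post implies but does not state, and the
function name is made up for illustration.

```python
def smart_inter_check_delay(intervals_min):
    """Approximate the 's' method: average check interval / number of services.

    intervals_min: one check interval (in minutes) per service.
    Returns the delay between consecutive initial checks, in seconds.
    """
    total_services = len(intervals_min)
    average_interval_min = sum(intervals_min) / total_services
    return average_interval_min / total_services * 60


# Assumption: 31 APAN checks at 1 minute, the other 65 services at 5 minutes.
intervals = [1] * 31 + [5] * 65
delay = smart_inter_check_delay(intervals)
print(f"smart delay: {delay:.2f} s, about {60 / delay:.0f} checks per minute")
```

Under these assumptions the smart delay comes out at roughly 2.3 seconds,
or about 26 checks per minute, well short of the 96 that can fall due in the
same minute.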

So for an easy solution, use your shortest check interval as the 'normal'.
Calculate the inter_check_delay using one minute per check for all services.
This will yield

(96 services * 1 minute) / 96^2 = .0104... minutes * 60 seconds per minute
= .625 seconds

inter-check delay = .625, i.e. inter_check_delay_method=0.625
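
The same arithmetic as a small sketch, so the number can be recomputed when
the service count changes. The function name is hypothetical; the only inputs
are the shortest check interval and the number of services. The second call
uses the 139 services from the larger test mentioned in the quoted message.

```python
def shortest_interval_delay(num_services, shortest_interval_s=60.0):
    """Delay (seconds) that lets every service start once within the shortest interval."""
    if num_services <= 0:
        raise ValueError("num_services must be positive")
    return shortest_interval_s / num_services


print(shortest_interval_delay(96))             # 0.625
print(round(shortest_interval_delay(139), 3))  # 0.432
```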

With this delay, one check starts every .625 seconds, so all 96 services can
be started within 60 seconds.  That spacing only applies to the initial run.
After the first run, the checks spread out according to their own check
intervals.
But now your system can accommodate the need to run them all within the same
minute.  Every five minutes, the 5-minute checks and the APAN checks overlap,
so you'll need to run 96 checks during that minute.
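
A quick way to picture the overlap is to count how many checks fall due in
each minute. This is only a sketch of the worst case, assuming 31 one-minute
APAN checks and 65 five-minute checks that all line up at minute 0; it does
not model how Nagios actually reschedules checks.

```python
def checks_due(minute, intervals_min):
    """Count services whose check interval evenly divides the given minute."""
    return sum(1 for interval in intervals_min if minute % interval == 0)


# Assumption: 31 APAN checks every minute, 65 other checks every 5 minutes,
# all aligned at minute 0 (a worst case, not Nagios's real scheduler).
intervals = [1] * 31 + [5] * 65
for minute in range(1, 11):
    print(f"minute {minute:2d}: {checks_due(minute, intervals)} checks due")
```

Minutes 5 and 10 show all 96 checks due at once, while the other minutes
only carry the 31 one-minute checks.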

If you add more services, be sure to recalculate and apply the new number.



"Chris Gill" <CGill at NewWorldApps.com> wrote in message
news:48F26311AF9C9943926CA0CD204BC214BB09A8 at nwa-srv-01.newworldapps.com...
> Hi All,
> I've got a problem that seems to resemble some other issues people
> have had here, but I'm not sure exactly what the 'correct' resolution is.
> I've got a Nagios system (P3 500, 128Mb mem) checking 96 services on 43
> hosts. This is a test bed system to make sure Nagios can replace our
> existing system (a custom-job built around What's Up Gold). So far, Nagios
> has been great, but we've got a problem with the scheduling queue slipping.
> What I mean by this is that items in the scheduling queue wind up having a
> 'next check' date in the past, but have never been checked.
> I think this problem may be related to our use of APAN to do
> graphing (31 of the 96 services). The APAN service checks have a
> "normal_check_interval" of a minute. I have a feeling this is making Nagios
> run those checks more frequently, and push other checks off. I find this
> strange, since I've set no limit to the number of simultaneous checks that
> can run, and so I'd expect the scheduling queue to be followed. To test
> this, I had been running even *more* APAN checks (74 of 139), and the queue
> would wind up getting delayed by an hour or more. Strangely, a few of the
> services that wind up behind schedule are the APAN checks, although it's
> largely checks of network services that get delayed.
> This doesn't appear to be a hardware or network load problem, and
> from what I've read in the list archives, this sort of thing can happen
> independent of hardware load.
> What then, is the way to get around this? We'd really like to keep
> running APAN checks on most of our hosts to get longer term trend graphs. Is
> there a different package that doesn't have this queue-hosing effect?
> Thanks.
>
>  -----------------------------------------
> Christopher P. Gill, Systems Engineer, New World Apps
> cgill at newworldapps.com
> 703-856-7268
>
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>




