[PATCH] Re: alternative scheduler

Jochen Bern Jochen.Bern at LINworks.de
Thu Dec 2 10:03:09 CET 2010


On 12/01/2010 08:55 PM, Adam Augustine wrote:
> While DNX and mod_gearman do implement that specific functionality,
> they are still subject to the scheduler/reaper bottlenecks. We (the
> institution that started the DNX project) have played around with the
> check scheduling parameters quite a bit over the years and even with
> our best scheduling parameters and DNX actually executing the plugins,
> we still see checks scheduled such that we have a large number of
> checks scheduled to execute in a single second with several seconds
> (3-5) of nothing scheduled to execute between.

Agreed. That's also the reason why I don't use either so far; I don't
have a problem (yet ...) with the short-term scheduling (scheduling "due
now" checks onto executors), but I see unnecessary churn in the mid-term
scheduling (schedule next due time of checks just completed).

Unless I *really* need new glasses, there are only three different kinds
of such rescheduling code in the 3.2.x Nagios core:

1. Reschedule *exactly* check_interval / retry_interval from last due
time (iff check_period allows this) - e.g., base/checks.c::1301ff :

   if(reschedule_check==TRUE)
      next_service_check=(time_t)(temp_service->last_check
         +(temp_service->check_interval*interval_length));

2. Reschedule to the *very first second* permitted by check_period -
e.g., base/checks.c::278ff :

   /* make sure we rescheduled the next service check at a valid time */
   get_next_valid_time(preferred_time,
      &next_valid_time,svc->check_period_ptr);
   [...]
      svc->next_check=next_valid_time;

3. Special (error) cases falling back to some hardcoded "check interval"
(five minutes, one week, ...).

None of these cases even *looks* at the list of already-scheduled check
executions around the target time, much less does any smoothing.
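
To illustrate what "looking at the schedule around the target time" could
mean, here is a minimal sketch. It is not Nagios code: the function name
checks_scheduled_near() and the flat array of next_check timestamps are
assumptions made for the example, whereas the real core keeps its schedule
in its own event structures.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper, NOT part of the Nagios core.
 * Counts how many already-scheduled checks fall within
 * [target - radius, target + radius] seconds.
 * `scheduled` is assumed to be a plain array of next_check timestamps. */
static size_t checks_scheduled_near(const time_t *scheduled, size_t n,
                                    time_t target, long radius)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        long delta = (long)(scheduled[i] - target);
        if (delta >= -radius && delta <= radius)
            count++;
    }
    return count;
}
```

With radius 0 this gives the load of a single second, which is the number
one would compare between neighbouring seconds to spot the "burst, then
3-5 idle seconds" pattern described above.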

(For the sake of completeness: A smoothing algorithm IMHO should:
Case 1: *Decrease* next_check by at most a certain percentage of
check_interval/retry_interval, so as to avoid consecutive faults in
freshness checks and performance data processing (in the case of RRDs,
violation of xff);
Case 2: *Increase* next_check while staying within the check_period.
Determining a max increment that both smooths out the (potentially
MANY) affected checks and avoids pushing the chain of subsequent
processing (retry_interval / max_check_attempts if found non-OK,
running event handlers, ...) *beyond* the valid timeframe is definitely
nontrivial.)
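
A rough sketch of what Case 1 might look like, under stated assumptions:
the names load_at() and smooth_next_check(), the flat array of
already-scheduled timestamps, and the max_decrease_pct parameter are all
invented for illustration and do not exist in the Nagios core. The idea is
only to pick the least-loaded second no earlier than the allowed decrease
and never later than the preferred time.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: number of checks already scheduled at second t. */
static size_t load_at(const time_t *scheduled, size_t n, time_t t)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (scheduled[i] == t)
            count++;
    }
    return count;
}

/* Hypothetical Case 1 smoothing, NOT Nagios code.
 * Searches [preferred - max_shift, preferred] for the least-loaded second,
 * where max_shift is max_decrease_pct percent of interval_secs.
 * Ties go to the second closest to `preferred` (smallest shift). */
static time_t smooth_next_check(const time_t *scheduled, size_t n,
                                time_t preferred, long interval_secs,
                                int max_decrease_pct)
{
    long max_shift = interval_secs * max_decrease_pct / 100;
    time_t best = preferred;
    size_t best_load = load_at(scheduled, n, preferred);

    for (long shift = 1; shift <= max_shift; shift++) {
        time_t candidate = preferred - (time_t)shift;
        size_t load = load_at(scheduled, n, candidate);

        if (load < best_load) {   /* strict: keep the smaller shift on ties */
            best = candidate;
            best_load = load;
        }
    }
    return best;
}
```

The linear scan is deliberately naive; a real implementation would want a
per-second counter or a sorted event list rather than rescanning every
timestamp for every candidate second. Case 2 is not attempted here.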

Kind regards,
								J. Bern
-- 
Jochen Bern, Systemingenieur --- LINworks GmbH <http://www.LINworks.de/>
Postfach 100121, 64201 Darmstadt | Robert-Koch-Str. 9, 64331 Weiterstadt
PGP (1024D/4096g) FP = D18B 41B1 16C0 11BA 7F8C DCF7 E1D5 FAF4 444E 1C27
Tel. +49 6151 9067-231, Zentr. -0, Fax -299 - Amtsg. Darmstadt HRB 85202
Unternehmenssitz Weiterstadt, Geschäftsführer Metin Dogan, Oliver Michel
