[PATCH] Re: alternative scheduler

Fredrik Thulin ft at it.su.se
Fri Dec 3 12:20:36 CET 2010


On Fri, 2010-12-03 at 11:40 +0100, Andreas Ericsson wrote:
> Sorry for the long delay. It seems I was half asleep when I scrolled by
> this mail earlier.

No problem.

> ...
> > What is sched_yield? I can't find that function anywhere in the source
> > code. Feel free to improve the patch; as I've previously said, C isn't
> > my game.
> > 
> 
> sched_yield() causes the kernel to check through its scheduling queue and
> see if there are other processes waiting to run. If there are, those other
> processes will run. If not, the current process will continue running.

As I see it, the Nagios scheduler can't afford to miss the opportunity
to start another check, but I'm not going to protest if you prefer:

if (...) {
  sched_yield();
  continue;
}
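Filled out, that loop might look roughly like this. It is a sketch assuming POSIX; the `struct check` layout, `now_ms()`, `start_check()` and `run_checks()` are hypothetical stand-ins, not Nagios's actual scheduler code. The point is the shape: when the next check is not yet due, hand the CPU to any other runnable process with `sched_yield()` and re-test immediately, rather than sleeping and risking a late wake-up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical check record: only what this sketch needs. */
struct check {
    int64_t due_ms;  /* monotonic time (ms) at which the check should start */
    int started;     /* set once the check has been launched */
};

/* Current CLOCK_MONOTONIC time in milliseconds. */
int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical stand-in for forking and exec'ing the check command. */
static void start_check(struct check *c)
{
    c->started = 1;
}

/* Start checks (sorted by due_ms) in order, yielding while the next one
 * is not yet due. Returns how many times sched_yield() was called. */
long run_checks(struct check *checks, size_t n)
{
    long yields = 0;
    size_t next = 0;

    while (next < n) {
        if (now_ms() < checks[next].due_ms) {
            /* Not due yet: let other runnable processes have the CPU,
             * then re-test right away instead of sleeping a whole tick. */
            sched_yield();
            yields++;
            continue;
        }
        assert(next == 0 || checks[next].due_ms >= checks[next - 1].due_ms);
        start_check(&checks[next]);
        next++;
    }
    return yields;
}
```

This is effectively a busy-wait: on an otherwise idle machine `sched_yield()` returns at once and the loop spins at full CPU, which is the price paid for never missing a start slot.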

...
> > and with the tiniest C program that appends results to a file as
> > ocsp_command.
> > 
> 
> Use Nagios' own native perfdata writing instead and use a same-partition
> "mv" command to move the perfdata file to the reaper spool directory.

Thanks for the tip, I'll have a look at that.
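Why the move should stay on one partition: within a single filesystem `mv` comes down to `rename(2)`, which replaces the directory entry atomically, so a reaper scanning its spool directory only ever sees complete files. Across filesystems `rename()` fails with `EXDEV` and `mv` has to fall back to copying and deleting, which is not atomic. A minimal C sketch of the same idea, assuming POSIX; `spool_file()` and its `<prefix>.<unix time>` naming are hypothetical. In Nagios itself the move would normally be done by the command set as `service_perfdata_file_processing_command`, not by custom code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Move src into spool_dir as "<prefix>.<unix time>", writing the new path
 * into dest. Returns 0 on success, -1 on failure with errno set.
 * rename() is atomic only within one filesystem; across partitions it
 * fails with EXDEV rather than silently copying. */
int spool_file(const char *src, const char *spool_dir, const char *prefix,
               char *dest, size_t dest_len)
{
    assert(src && spool_dir && prefix && dest && dest_len > 0);

    int n = snprintf(dest, dest_len, "%s/%s.%ld", spool_dir, prefix,
                     (long)time(NULL));
    if (n < 0 || (size_t)n >= dest_len) {
        errno = ENAMETOOLONG;
        return -1;
    }
    if (rename(src, dest) != 0) {
        int saved = errno;
        if (saved == EXDEV)
            fprintf(stderr, "%s and %s are on different filesystems\n",
                    src, spool_dir);
        errno = saved;  /* fprintf() may clobber errno */
        return -1;
    }
    return 0;
}
```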

> > We should have a beer and talk about scheduling sometime, since we're
> > both in Stockholm (?).
> > 
> 
> I'm in Gothenburg. We frequently do developer beer things at our office
> here though, so if you happen to come by, we'll crack open a few :)

Thanks for the invite =).

> > My first scheduler ticked once per second and *BAM* started 30+ checks.
> > 
> > A lot of the time, a significant number of these checks were exactly
> > the same check (but against different target hosts), so my theory is they all
> > requested the very same resources around the same millisecond. When I
> > changed the scheduler to start one check every 50 ms instead, I saw that
> > I could start around 25% more checks every second. Other theories are
> > welcome, but that was my observation.
> > 
> 
> The problem is the tick-time. I'm guessing you fired the checks and then
> did sleep(1) (or whatever the erlang equivalent is), but that means you
> lose a couple of milliseconds each second (the time it takes to fire up
> the checks), which will inevitably cause you to drift in the scheduler.
> All such sleep()-alike calls are implemented in the kernel with a TICK
> precision that varies from system to system. Most systems have a 10 msec
> tick-rate, so if you start sleeping at 1.94 seconds and sleep for one
> second you'll end up at 2.94 instead of, as a scheduler would wish, at
> 2.0 when checks are actually scheduled.
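The arithmetic behind that example: a scheduler avoids the drift by sleeping until the next scheduled boundary rather than for a fixed interval. A minimal sketch in integer milliseconds (to avoid floating-point rounding); `ms_until_next_tick()` is a hypothetical helper. At 1940 ms on a 1000 ms schedule it asks for a 60 ms sleep, waking at 2.0 s instead of 2.94 s.

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds to sleep so that we wake on the next multiple of
 * interval_ms, instead of sleeping a fixed interval_ms from "now".
 * If now_ms is exactly on a boundary, the next one is a full interval away. */
int64_t ms_until_next_tick(int64_t now_ms, int64_t interval_ms)
{
    assert(now_ms >= 0 && interval_ms > 0);
    return interval_ms - now_ms % interval_ms;
}
```

A fixed `sleep(1)` from 1940 ms would instead wake at 2940 ms, 940 ms after the checks were due. Kernel tick granularity still makes each individual wake-up a little late, but because the target is recomputed from the clock every time, that lateness does not accumulate.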

No, actually not. Erlang is a soft real time system. My approach was to
ask the Erlang VM to send me a tick every N ms (N = 300s * 1000 / number
of checks). So if N is 50, the VM will signal me once every 50 ms, very
precisely and without any drift.
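For comparison, a rough POSIX C analogue of that drift-free tick uses `clock_nanosleep()` with `TIMER_ABSTIME`: every wake-up time is `origin + k * N`, so the time spent starting a check never pushes later ticks back. This is a sketch under that assumption, not a description of how the Erlang VM implements its timers; `tick_interval_ms()`, `run_ticks()` and the `on_tick` callback (standing in for launching one check) are hypothetical names. With 300 s and 6000 checks, N comes out at 50 ms.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* N = check interval (s) * 1000 / number of checks, in milliseconds. */
long tick_interval_ms(long check_interval_s, long n_checks)
{
    assert(check_interval_s > 0 && n_checks > 0);
    return check_interval_s * 1000 / n_checks;
}

/* Add ms (>= 0) milliseconds to *ts, keeping tv_nsec below one second. */
static void timespec_add_ms(struct timespec *ts, long ms)
{
    ts->tv_sec += ms / 1000;
    ts->tv_nsec += (ms % 1000) * 1000000L;
    if (ts->tv_nsec >= 1000000000L) {
        ts->tv_sec += 1;
        ts->tv_nsec -= 1000000000L;
    }
}

/* Wake every interval_ms, n_ticks times, calling on_tick(k) if non-NULL.
 * Deadlines are absolute (origin + k * interval_ms), so a late wake-up
 * or a slow on_tick() does not shift later ticks. Returns 0 or -1. */
int run_ticks(long interval_ms, int n_ticks, void (*on_tick)(int k))
{
    struct timespec deadline;
    if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0)
        return -1;

    for (int k = 0; k < n_ticks; k++) {
        timespec_add_ms(&deadline, interval_ms);
        int rc;
        /* clock_nanosleep() returns an error number; retry on signals. */
        while ((rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                                     &deadline, NULL)) == EINTR)
            ;
        if (rc != 0)
            return -1;
        if (on_tick != NULL)
            on_tick(k);  /* should finish well within interval_ms */
    }
    return 0;
}
```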

I then just had to finish starting another check command in =< 49 ms,
and go back to sleep. All handling of check results is done completely
asynchronously to this starting of new checks.

This is all in src/npers_spawner.erl if anyone is interested in the
details.
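On the C side, keeping result handling asynchronous to check starts typically means never blocking in `wait()`: finished children are collected with `waitpid(-1, &status, WNOHANG)` whenever the scheduler has a spare moment, independently of when new checks are forked. A minimal POSIX sketch; `spawn_check()` and `reap_finished()` are hypothetical helpers, and a real spawner would exec the check command and read its output rather than just its exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child and return its pid immediately, without waiting for it
 * (-1 if fork() fails). The child just exits with exit_status; a real
 * spawner would exec the check command here instead. */
pid_t spawn_check(int exit_status)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(exit_status);
    return pid;
}

/* Collect every child that has already finished, without blocking.
 * Returns how many were reaped and adds the number that exited with
 * status 0 to *ok_count. */
int reap_finished(int *ok_count)
{
    int reaped = 0, status;
    pid_t pid;

    assert(ok_count != NULL);
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        reaped++;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            (*ok_count)++;
    }
    return reaped;
}
```

Because `WNOHANG` makes `waitpid()` return 0 when children exist but none have exited yet, `reap_finished()` can be called from the same loop that starts checks without ever stalling it.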

...
> I'll see about adding something similar to your patch to the scheduler.
> It's a good one in spirit, but the implementation left a little to be
> desired.

Thanks! It would really make my life easier if the patch was in the next
Nagios release Ubuntu ships =).

/Fredrik

More information about the Developers mailing list