[PATCH] Re: alternative scheduler

Andreas Ericsson ae at op5.se
Fri Dec 3 17:39:56 CET 2010


On 12/03/2010 03:24 PM, Fredrik Thulin wrote:
> On Fri, 2010-12-03 at 14:28 +0100, Andreas Ericsson wrote:
>> ...
>>> I meant to say that N is calculated when the list of checks is
>>> (re)loaded. As I don't even try to have retry_intervals and such, a
>>> steady tick interval works great as long as I can finish initiating
>>> another service check in between ticks.
>>>
>>
>> Ah, right. And initiating a check is quite cheap until they start
>> piling up when the network goes bad, which you sort of avoid by using
>> a constant stream of executing checks, so you always know there'll be
>> constant load on the system you're monitoring from.
> 
> Right, but initiating checks doesn't get more expensive just because the
> checks require more CPU cycles to complete (because of retries). Other
> resources might suffer though - I guess the first one to be depleted
> would be file descriptors.
> 

If you produce more ticks the more checks you run, each check becomes more
expensive to run. The number of ticks should be constant and the number of
checks started at each tick should be variable. Producing a tick has
overhead too. So does looping over a list of checks to run each tick, but
I guarantee you that that overhead is smaller than producing a tick.
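
To make the constant-tick idea concrete, here's a minimal sketch (Python,
purely illustrative; the function names and intervals are my own, not
anything from either implementation):

```python
import time

def run_cycle(checks, start_check, tick_interval=1.0, total_interval=60.0):
    """One scheduling cycle: fire a tick at a constant rate and start a
    variable-sized batch of checks on each tick, so the whole list is
    initiated once per total_interval.  The tick rate never changes, no
    matter how many checks are loaded."""
    if not checks:
        return
    ticks = max(1, round(total_interval / tick_interval))
    # batch size per tick: computed once, when the check list is (re)loaded
    per_tick = -(-len(checks) // ticks)  # ceiling division
    next_tick = time.monotonic()
    for offset in range(0, len(checks), per_tick):
        for check in checks[offset:offset + per_tick]:
            start_check(check)  # cheap: just initiate, don't wait for results
        next_tick += tick_interval
        time.sleep(max(0.0, next_tick - time.monotonic()))
```

Doubling the number of checks here doubles the batch size, not the tick
rate, so the tick-production overhead stays constant.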

>> I'm wondering if
>> that doesn't sort of solve the problem in the wrong direction though,
>> since the monitoring system is supposed to serve the other systems and
>> endure the inconveniences it suffers itself as best it can. Sort of.
> 
> Hmm. The goal here is to scale sideways as you put it. To evolve to more
> cores and more schedulers thus reaching higher number of checks possible
> per time unit, per server.
> 
> If a given server can only take on 1000 checks per time unit and you
> typically run it around 900, nothing good will come out of
> retry_interval suddenly trying to get the server to do 1100 checks per
> minute. That is over-subscription and the result is undefined at best.
> 
> I would rather dynamically figure out that I'm very probable to be able
> to run 1000 checks per time unit, and then either
> 
>    * use my current approach of always doing 950 and not having
>      retry_interval and similar, or
>    * do 800 per time unit, and allow retry_interval etc. to push it up to
>      900-1000 but never more
> 

Skipping the retry_interval is retarded at best and moronic at worst. Or
possibly the other way around. If you do that you might as well just make
sure you've always got X checks running and let them complete when they
complete. That's an even simpler way of dropping monitoring precision in
favour of imaginary scalability.
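
For completeness, the "always X checks running" approach amounts to a
fixed-size worker pool. A sketch, assuming checks are plain callables
(the names here are hypothetical, not from either implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def run_fixed_concurrency(checks, execute_check, max_running=100):
    """Keep at most max_running checks in flight; a new check starts only
    when a running one completes.  Simple, but a batch of slow checks
    delays everything queued behind it -- scheduling precision is gone."""
    with ThreadPoolExecutor(max_workers=max_running) as pool:
        futures = [pool.submit(execute_check, check) for check in checks]
    return [future.result() for future in futures]
```

Note there is no notion of *when* a check should run, which is exactly
the precision loss described above.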

Just so we understand each other here: it's quite cool that you wrote a
scheduler in Erlang. I don't speak Erlang myself, but I find it inspiring
when people get off their arses and solve a problem rather than moping
about it. However, the precision regression in your scheduler makes it
clearly unsuitable for real-world monitoring. Its merits for the server
doing the monitoring leave food for thought when implementing a new
scheduler, but IMNSHO you've aimed for the secondary goal of not
overloading the monitoring server rather than the primary one of checking
things with the original precision or better. The fact that you have
apparently succeeded doesn't change the fact that what you've created is
somewhat akin to an airplane that can't fly, but has very comfortable
chairs.

>>>> That's still "doing more than you did before", on a system level, so the
>>>> previous implementation must have been buggy somehow. Perhaps erlang
>>>> blocked a few signals when the signal handler was already running, or
>>>> perhaps you didn't start enough checks per tick?
>>>
>>> I agree it is more work for the scheduler, but that is better than
>>> having under-utilized additional CPUs/cores, right?
>>>
>>
>> So long as the net effect is that you can run more checks with it, yes, but
>> an exponential algorithm will always beat a non-exponential one, so with a
>> large enough number of checks you'll run into the reverse situation, where
>> the scheduler eats so much overhead that you no longer have cpu power left
>> to actually run any checks.
> 
> That would be absolutely true for a single scheduler, but think about
> many schedulers!
> 

Many schedulers have even more overhead. If the number of processes trying
to run at the same time is larger than the number of available cpus to run
them on, you will have contention and context-switching, all of which adds
to the load. Sorry, but your reasoning is flawed. Multiple schedulers are
one way to make sure the available cpus are saturated, but they will still
cause *more* work to be done per check. The difference with multiple
schedulers is that their number is either constant or grows very, very
slowly (I'm thinking one scheduler per 10000 checks or something), so the
"converges on infinity" issue ends up not being a problem, for the simple
reason that in order to get anywhere near the trouble zone one would have
to run more checks than there are IP addresses in the entire IPv4 span.
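
The partitioning this implies is trivial to sketch. Assuming a flat check
list and the one-scheduler-per-10000-checks figure above (the helper name
is made up):

```python
def partition_checks(checks, checks_per_scheduler=10000):
    """Split the full check list into contiguous slices, one slice per
    scheduler process.  Scheduler count grows linearly but very slowly
    with the number of checks, so per-check overhead stays near-constant."""
    if not checks:
        return []
    schedulers = -(-len(checks) // checks_per_scheduler)  # ceiling division
    size = -(-len(checks) // schedulers)  # balance the slices evenly
    return [checks[i:i + size] for i in range(0, len(checks), size)]
```

With 25000 checks this yields three schedulers of roughly 8300 checks
each; you'd need billions of checks before the scheduler count itself
became a source of contention.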

> My current PoC has a single scheduler too, with a single list of all
> checks. That worked good enough for the PoC, but I could very easily
> turn it into as many as one scheduler process per core, each having a
> subset of the total set of checks.
> 

Yes, and you could quite easily rewrite Nagios. The real question isn't
"Can I do it?", but "Should I? Do I want to?". So let's get back to the
real problem at hand here, which is that Nagios can't always saturate
the cpus of an SMP system very efficiently. You've proposed a solution
for it, which I've sort of almost accepted, although it needs some work
before it can go official. If that fixes the problem here and now, I'd
be a little happier, as would a slew of faceless Nagios users neither
of us will ever meet.

> I'm not sure the ideal is one scheduler per core, but do think the
> architecture should be along those lines. The future looks like it will
> give us many many MANY cores. To be able to utilize them, you must
> ultimately leave the single scheduler situation.
> 
> Even if you offload the scheduler thread of everything else, a single
> CPU won't have time to even loop through a huge enough list of services.
> This last paragraph is a bit academic, I agree ;).
> 

True, in a way, but it doesn't really have to. For super-huge networks
(think Google, the CIA, or the entire internet if you will), checking is
done differently.

Nagios isn't suitable for monitoring an entire network of that size
(although it could probably do a good job of checking parts of it). But we
don't live in that world yet. We live in the here and now, where there are
usually 2-32 cores in one server, and we have to do our best to make them
work as much as possible so we can lounge in the sun and drink fizzy
drinks with little umbrellas in them instead of worrying about how the
network left in our care is feeling at the moment.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
