Possible bug in Nagios 2.12?

Andreas Ericsson ae at op5.se
Thu Apr 9 00:17:05 CEST 2009


eponymous alias wrote:
>>> I'm not seeing anywhere that
>>> (event_list_low = event_list_low->next)
>>> unless the event actually runs.
>> That's correct, although the loop is broken out of if:
>> * The check shouldn't be run right now due to global
>> options
>> * The check shouldn't be run right now due to temporary
>> setting
>>
>> However, if the check can't be run immediately
>> due to too many checks running at that moment,
>> or due to the check not being parallelizable
>> and *any* other check is running, the only
>> sensible thing to do is to sleep 1 second
>> and then try again. This is what Nagios does.
> 
> Ah, no.  The really sensible thing to do would
> be to wait only until all the blocking checks
> are done (either just one of "too many", or
> all other checks in the parallelization case).

How? Retrying with no delay at all between attempts would be
rather devastating, since busy-waiting like a spinlock eats CPU
like mad. Nagios doesn't catch SIGCHLD in that thread (nor can
it, or the reaper process wouldn't know when it should reap
child results).
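For readers following the loop logic, here is a minimal C sketch of the behaviour described in this thread. It is not Nagios source: the struct, the enum, the functions `decide`, `step` and `run_queue`, and the `reap_fn` callback are hypothetical names made up for illustration. It models only the points made above: the list head advances only when a check actually runs, disabled or held checks break out of the loop, and a check blocked by the concurrency limit (or by being non-parallelizable while something else runs) leads to a blocking one-second sleep rather than a busy-wait.

```c
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>   /* sleep() */

/* Hypothetical, simplified model of a queued service check. */
struct check_event {
    bool checks_enabled;      /* global option: checks enabled at all? */
    bool temporarily_held;    /* temporary setting: don't run right now */
    bool parallelizable;      /* may run alongside other checks */
    struct check_event *next;
};

enum decision { RUN_NOW, STOP_FOR_NOW, SLEEP_AND_RETRY };

/* max_concurrent == 0 means "no limit" in this sketch. */
static enum decision decide(const struct check_event *ev,
                            int running, int max_concurrent)
{
    if (!ev->checks_enabled || ev->temporarily_held)
        return STOP_FOR_NOW;                 /* break out of the loop */
    if (max_concurrent > 0 && running >= max_concurrent)
        return SLEEP_AND_RETRY;              /* too many checks running */
    if (!ev->parallelizable && running > 0)
        return SLEEP_AND_RETRY;              /* must run on its own */
    return RUN_NOW;
}

/* One scheduling step: the head only advances if the check runs. */
static struct check_event *step(struct check_event *head, int *running,
                                int max_concurrent, enum decision *out)
{
    *out = decide(head, *running, max_concurrent);
    if (*out == RUN_NOW) {
        (*running)++;          /* pretend the plugin was started */
        return head->next;     /* i.e. event_list_low = event_list_low->next */
    }
    return head;
}

/* Hypothetical hook: returns how many finished checks were reaped. */
typedef int (*reap_fn)(void);

static struct check_event *run_queue(struct check_event *head, int *running,
                                     int max_concurrent, reap_fn reap)
{
    while (head != NULL) {
        enum decision d;
        head = step(head, running, max_concurrent, &d);
        if (d == STOP_FOR_NOW)
            break;
        if (d == SLEEP_AND_RETRY) {
            sleep(1);              /* block for a second instead of spinning */
            *running -= reap();    /* results are collected elsewhere */
        }
    }
    return head;
}
```

The fixed `sleep(1)` is the design point under debate: it costs no CPU while waiting, at the price of up to one second of idle time after the blocking check has already finished.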

> Sleeping for a full second regardless of when
> the blocking checks complete can waste time
> between when the next plugin could run and
> when it actually does.  And with enough checks
> introducing these extra arbitrary delays, the
> overall latency for the full set of checks can
> easily creep up.
> 

So sleep a little less, then, but I'm not sure what you're hoping
to achieve by doing that, since you'd be reducing the maximum
latency of a single check by slightly less than one second, and
by less than 0.5 seconds on average.

Hardly worth bothering with imo, unless none of your checks are
parallelizable or you've managed to horribly misconfigure your
max_concurrent_checks setting.
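The arithmetic behind those numbers can be checked with a small C sketch. It assumes (an assumption for illustration, not a measurement of Nagios) that the blocking check finishes at a uniformly distributed moment within a one-second window and that the scheduler retries every `interval` seconds; `retry_delay` is a hypothetical helper name.

```c
/* Average and worst-case idle time between a blocking check finishing
 * at time t (uniform over [0, 1) seconds) and the first scheduler retry
 * strictly after t, with retries every `interval` seconds.
 * Evaluated numerically at `samples` evenly spaced midpoints. */
static void retry_delay(double interval, int samples,
                        double *avg, double *worst)
{
    double sum = 0.0, max = 0.0;
    for (int i = 0; i < samples; i++) {
        double t = (i + 0.5) / samples;          /* finish time */
        long k = (long)(t / interval);           /* retries already passed */
        double wait = (k + 1) * interval - t;    /* idle until next retry */
        sum += wait;
        if (wait > max)
            max = wait;
    }
    *avg = sum / samples;
    *worst = max;
}
```

With `interval` = 1.0 this gives an average idle time of about 0.5 s and a worst case just under 1 s; with an interval of s seconds the average drops to roughly s/2, so shortening the sleep saves at most about half a second per blocked check on average.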

> Whether it would be simple to make that happen
> in a particular software architecture is a
> separate discussion; I'm just pointing out
> the design issue here.
> 

There's no real design issue. Nagios could sleep a little less,
but it would provide such a microscopic correction for the average
service that it really isn't worth it.

And I thought I was the tired one right now...

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
