Logging API revamp
Andreas Ericsson
ae at op5.se
Mon Oct 15 02:17:53 CEST 2007
Ethan Galstad wrote:
> Andreas Ericsson wrote:
>> So, I started looking into revamping the event queue logic, but ended up
>> with a migraine from the cumbersome way logging is done, so I decided to
>> try doing something about it, and the attached 3-patch series is the
>> result from it.
>>
>> It compiles alright, both for Nagios and the CGIs. I haven't done much
>> in the way of checking past that though, so testing would be welcome.
>>
>> Given that the patches don't change much in the way of logic, they
>> shouldn't really affect anything in any significant way.
>>
> [snip]
>
> Thanks for the patches - they are excellent ideas. I'll get them
> implemented when I get back to the US later this week.
>
Anytime. I guess the conference spurred some Nagios-hackativity into
me ;-)

> For the event queue, I was thinking that a skip list structure might be
> best for efficiency (http://en.wikipedia.org/wiki/Skip_list). The event
> queue is used in primarily two situations:
>
> 1. Popping events from the head of the list to be executed
> 2. Inserting events into the list (mid- or endpoint).
>
> #1 is very efficient with a linked list, but performance with #2 can be
> quite bad in large lists. Since a check event usually appears for each
> host/service that is defined, this can lead to bad performance - O(n^2)
> I believe - with large installations. A skip list would bring the
> performance closer to O(log n).
>
> Anyone have comments/experiences they care to share about the
> performance of skip lists and/or better alternatives?
>
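For reference, the two operations listed in the quoted text can be sketched on a time-sorted singly linked list. This is only an illustration: `struct event`, `queue_pop` and `queue_insert` are hypothetical names, not the actual Nagios structures or functions. It shows why popping is cheap while each insert has to walk the list.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical event node, not the real Nagios struct. */
struct event {
    time_t run_time;        /* when the event should execute */
    struct event *next;
};

/* Pop the head: O(1), since the earliest event is always first. */
static struct event *queue_pop(struct event **head)
{
    struct event *ev = *head;

    if (ev != NULL) {
        *head = ev->next;
        ev->next = NULL;
    }
    return ev;
}

/* Insert in time order: O(n), walks until it finds a later event. */
static void queue_insert(struct event **head, struct event *ev)
{
    struct event **link = head;

    while (*link != NULL && (*link)->run_time <= ev->run_time)
        link = &(*link)->next;
    ev->next = *link;
    *link = ev;
}
```

Scheduling n events one at a time this way costs on the order of n^2 comparisons in the worst case, which is the behaviour a skip list is meant to avoid.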
A skiplist would probably do wonders. I've been experimenting with one
now, using the timestamp of when to next execute <action> as the key of
the basic element. Using max_normal_check_interval as num_buckets seems
to be the best bet so far, since that gives a decent dispersion while
keeping the buckets nearly saturated.
It's probably best to keep the bucket count within reasonable limits,
such as between 256 and 1024 buckets (about 17 minutes at the upper
end), and possibly to keep num_buckets a power of 2 to avoid modulo
operations, which are quite slow on some CPUs.
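
The bucket sizing above might look roughly like the sketch below. It assumes one bucket per second of the check interval, which is my reading of the "17 minutes" figure; `pick_num_buckets` and `bucket_index` are made-up names for illustration only.

```c
#include <stddef.h>
#include <time.h>

/*
 * Pick a power-of-two bucket count that covers max_interval_seconds,
 * clamped to the range [256, 1024]. (Assumption: one bucket per second.)
 */
static size_t pick_num_buckets(size_t max_interval_seconds)
{
    size_t n = 256;

    while (n < max_interval_seconds && n < 1024)
        n <<= 1;
    return n;
}

/*
 * With a power-of-two count, (t % n) equals (t & (n - 1)) for
 * non-negative t, so the modulo can be replaced by a bit mask.
 */
static size_t bucket_index(time_t run_time, size_t num_buckets)
{
    return (size_t)run_time & (num_buckets - 1);
}
```

The mask only gives the same result as the modulo when num_buckets really is a power of 2, so the clamping and the rounding need to stay together.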
For performance reasons, though, I think it'd be better to add the
scheduled event slot to the host/service structs. That way you can
always remove it from the list with some simple pointer-fiddling.
The memory impact might hit large networks fairly badly, but those
should be running on pretty beefy hardware anyways, so 500KiB more
or less won't matter all that much.
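
A minimal sketch of that idea, assuming a circular doubly linked list with a sentinel head. `struct event_slot` and `struct service` here are hypothetical stand-ins, not the real Nagios structs. Because the link lives inside the object, removal needs no search.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical list link embedded in each scheduled object. */
struct event_slot {
    struct event_slot *prev, *next;
    time_t run_time;
};

/* Hypothetical stand-in for the host/service structs. */
struct service {
    const char *description;
    struct event_slot slot;     /* the scheduled event lives in the object */
};

/* An empty list is a sentinel head that points at itself. */
static void slot_list_init(struct event_slot *head)
{
    head->prev = head->next = head;
}

/* Link slot into the list directly after pos. */
static void slot_insert_after(struct event_slot *pos, struct event_slot *slot)
{
    slot->prev = pos;
    slot->next = pos->next;
    pos->next->prev = slot;
    pos->next = slot;
}

/* Unlink in O(1): only the two neighbours are touched. */
static void slot_remove(struct event_slot *slot)
{
    slot->prev->next = slot->next;
    slot->next->prev = slot->prev;
    slot->prev = slot->next = slot;   /* leave the slot self-linked */
}
```

The cost is two pointers plus a timestamp per host or service, which is where the extra memory mentioned above would come from.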
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231