Logging API revamp

Andreas Ericsson ae at op5.se
Mon Oct 15 02:17:53 CEST 2007


Ethan Galstad wrote:
> Andreas Ericsson wrote:
>> So, I started looking into revamping the event queue logic, but ended up
>> with a migraine from the cumbersome way logging is done, so I decided to
>> try doing something about it, and the attached 3-patch series is the
>> result from it.
>>
>> It compiles fine, both for Nagios and the CGIs. I haven't done much
>> in the way of checking past that though, so testing would be welcome.
>>
>> Given that the patches don't change much in the way of logic, they
>> shouldn't really affect anything in any significant way.
>>
> [snip]
> 
> Thanks for the patches - they are excellent ideas.  I'll get them 
> implemented when I get back to the US later this week.
> 

Anytime. I guess the conference spurred some Nagios-hackativity into
me ;-)

> For the event queue, I was thinking that a skip list structure might be 
> best for efficiency (http://en.wikipedia.org/wiki/Skip_list).  The event 
> queue is used in primarily two situations:
> 
> 1. Popping events from the head of the list to be executed
> 2. Inserting events into the list (mid- or endpoint).
> 
> #1 is very efficient with a linked list, but performance with #2 can be 
> quite bad in large lists.  Since a check event usually appears for each 
> host/service that is defined, this can lead to bad performance - O(n^2) 
> I believe - with large installations.  A skip list would bring the 
> performance closer to O(log n).
> 
> Anyone have comments/experiences they care to share about the 
> performance of skip lists and/or better alternatives?
> 
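
For reference, here is a minimal sketch of what a skip-list-backed event
queue could look like. It is not taken from the patches or from the
Nagios source; every name in it (skip_list, skip_insert, skip_pop,
SKIP_MAX_LEVEL) is made up for illustration. Events are keyed on their
run time, insertion costs O(log n) on average, and popping the head
stays cheap.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define SKIP_MAX_LEVEL 16   /* hypothetical cap on tower height */

struct skip_node {
    time_t run_time;                        /* sort key: when to run */
    void *event;                            /* opaque payload */
    struct skip_node *next[SKIP_MAX_LEVEL]; /* forward links per level */
};

struct skip_list {
    struct skip_node head;                  /* sentinel, never popped */
};

void skip_init(struct skip_list *sl)
{
    int i;

    for (i = 0; i < SKIP_MAX_LEVEL; i++)
        sl->head.next[i] = NULL;
}

/* Pick a tower height: each extra level has a 50% chance. */
static int random_level(void)
{
    int lvl = 1;

    while (lvl < SKIP_MAX_LEVEL && (rand() & 1))
        lvl++;
    return lvl;
}

/* Insert an event; equal run times keep insertion order. 0 on success. */
int skip_insert(struct skip_list *sl, time_t run_time, void *event)
{
    struct skip_node *update[SKIP_MAX_LEVEL];
    struct skip_node *cur, *node;
    int i, lvl = random_level();

    assert(sl != NULL);
    node = malloc(sizeof(*node));
    if (node == NULL)
        return -1;

    /* Walk from the top level down, remembering the last node
     * before the insertion point on every level. */
    cur = &sl->head;
    for (i = SKIP_MAX_LEVEL - 1; i >= 0; i--) {
        while (cur->next[i] && cur->next[i]->run_time <= run_time)
            cur = cur->next[i];
        update[i] = cur;
    }

    node->run_time = run_time;
    node->event = event;
    for (i = 0; i < SKIP_MAX_LEVEL; i++) {
        if (i < lvl) {
            /* Splice the node in on the levels it takes part in. */
            node->next[i] = update[i]->next[i];
            update[i]->next[i] = node;
        } else {
            node->next[i] = NULL;
        }
    }
    return 0;
}

/* Remove and return the earliest event, or NULL if the list is empty. */
void *skip_pop(struct skip_list *sl)
{
    struct skip_node *node = sl->head.next[0];
    void *event;
    int i;

    if (node == NULL)
        return NULL;
    /* The first node is also first on every level it takes part in. */
    for (i = 0; i < SKIP_MAX_LEVEL; i++)
        if (sl->head.next[i] == node)
            sl->head.next[i] = node->next[i];
    event = node->event;
    free(node);
    return event;
}
```

Inserting events with run times 30, 10 and 20 and then calling
skip_pop() three times hands them back in the order 10, 20, 30.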

A skiplist would probably do wonders. I've been experimenting with one
now, using the timestamp of when <action> is next due to execute as the
key of the basic element. Using max_normal_check_interval as
num_buckets seems to be the best bet so far, since that gives a decent
dispersion while keeping the buckets nearly saturated.

It's probably best to keep the bucket count within reasonable limits,
say between 256 and 1024 buckets (roughly 17 minutes at one bucket per
second), and possibly to keep num_buckets a power of 2 to avoid modulo
operations, which are quite slow on some CPUs.
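
To make the power-of-2 point concrete, here is a small sketch. The
function names are hypothetical, and the 256/1024 limits are just the
ones suggested above. With num_buckets a power of 2, the bucket index
can be taken with a bitwise AND instead of a modulo.

```c
#include <assert.h>
#include <time.h>

/* Smallest power of two >= wanted, clamped to the range [256, 1024].
 * Hypothetical helper; the limits are the ones suggested in the text. */
unsigned int clamp_bucket_count(unsigned int wanted)
{
    unsigned int n = 256;

    while (n < wanted && n < 1024)
        n <<= 1;
    return n;
}

/* Map a run time to a bucket. For a power-of-two num_buckets,
 * (x & (num_buckets - 1)) gives the same result as (x % num_buckets). */
unsigned int bucket_for(time_t run_time, unsigned int num_buckets)
{
    assert(num_buckets != 0 && (num_buckets & (num_buckets - 1)) == 0);
    return (unsigned int)((unsigned long)run_time & (num_buckets - 1));
}
```

With 1024 buckets, a run time of 1029 lands in bucket 5, since
1029 = 1024 + 5.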

That said, for performance reasons I think it'd be better to add the
scheduled event slot to the host/service structs themselves. That way
you can always remove an event from the list with some simple
pointer-fiddling, without searching for it first. The memory impact
might hit large networks fairly badly, but those should be running on
pretty beefy hardware anyway, so 500KiB more or less won't matter all
that much.
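
A rough sketch of the embedded-slot idea follows. The struct and
function names are invented for the example and are not the real
Nagios host/service definitions. Because the list links live inside the
object itself, unlinking its event is a constant-time pointer update.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Doubly linked event slot, meant to be embedded in a larger struct. */
struct timed_event {
    time_t run_time;
    struct timed_event *prev, *next;
};

struct event_list {
    struct timed_event *head, *tail;
};

/* Hypothetical stand-in for a service object with its own event slot. */
struct service_sketch {
    const char *description;
    struct timed_event check_event;
};

/* Append an event at the tail of the list. */
void event_append(struct event_list *list, struct timed_event *ev)
{
    assert(list != NULL && ev != NULL);
    ev->next = NULL;
    ev->prev = list->tail;
    if (list->tail)
        list->tail->next = ev;
    else
        list->head = ev;
    list->tail = ev;
}

/* Unlink an event in O(1); no search through the list is needed. */
void event_unlink(struct event_list *list, struct timed_event *ev)
{
    assert(list != NULL && ev != NULL);
    if (ev->prev)
        ev->prev->next = ev->next;
    else
        list->head = ev->next;
    if (ev->next)
        ev->next->prev = ev->prev;
    else
        list->tail = ev->prev;
    ev->prev = ev->next = NULL;
}
```

Rescheduling a check then becomes event_unlink() on the service's own
check_event followed by a fresh insert, whatever structure the queue
itself ends up using.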

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
