Skiplists

John Rouillard rouilj+nagiosdev at cs.umb.edu
Tue Oct 16 15:40:11 CEST 2007


As an aside:

  http://www.codeproject.com/csharp/SkipList1.asp

has a very nice description and code snippets.

In message <4713E1D8.4040804 at op5.se>,
Andreas Ericsson writes:
>John P. Rouillard wrote:
>> In message <4713932E.1020904 at op5.se>,
>> Andreas Ericsson writes:
>> 
>>> John P. Rouillard wrote:
>>>> In message <4712B60C.1080602 at op5.se>,
>>>> Andreas Ericsson writes:
>>>>
>>>>> Andreas Ericsson wrote:
>>>>>> Ethan Galstad wrote:

>>>>>>> For the event queue, I was thinking that a skip
>>>>>>> list structure might be best for efficiency
>>>>>>> (http://en.wikipedia.org/wiki/Skip_list).  The
>>>>>>> event queue is used in primarily two
>>>>>>> situations:
>>>>>>> 1. Popping events from the head of the list to
>>>>>>>    be executed
>>>>>>> 2. Inserting events into the list (mid- or
>>>>>>>    endpoint).
>>>>>>>
>>>>>>> #1 is very efficient with a linked list, but
>>>>>>> performance with #2 can be quite bad in large
>>>>>>> lists.  Since a check event usually appears for
>>>>>>> each host/service that is defined, this can
>>>>>>> lead to bad performance - O(n^2) I believe -
>>>>>>> with large installations.  A skip list would
>>>>>>> bring the performance closer to O(log n).
>>>>>>>
>>>>>>> Anyone have comments/experiences they care to
>>>>>>> share about the performance of skip lists
>>>>>>> and/or better alternatives?
>>>>>>>
>>>>>> A skiplist would probably do wonders. I've been
>>>>>> experimenting with one now, actually using the
>>>>>> timestamp for when next to execute <action>
>>>>> Scratch that. I only glossed the skiplist page
>>>>> you linked, and misunderstood completely. I've
>>>>> been experimenting with a hash, except the time
>>>>> used for input is the key. Implementation is
>>>>> trivial and performance improvement should be
>>>>> fairly close to or better than a skip-list.
>>>> Hmm, maybe I am missing something, but if I have
>>>> an element with a known key, I see how a hash
>>>> table pointing into a conventional linked list
>>>> will speed up access. But doesn't case 2 take the
>>>> form:
>>>>
>>>>   insert node X that should run at time t into the
>>>>   linked list
>>>>
>>>> mean a sequential scan of the linked list? How do
>>>> you know what time to hash to find the node before
>>>> time t?
>>>>
>>> It's a sequential scan of a small part of the
>>> linked list, just like any non-perfect
>>> hash. Since perfect hashes aren't available
>>> for this, we'll have to use a non-perfect one.
>>>>   E.G. I want node X inserted at 10:24:33, how do I
>>>>        know if the prior node is at 10:24:32, or 10:14:32?
>>>>
>>> With 60 buckets in the hash-array, you
>>> wouldn't be able to tell the difference
>>> without comparing the timestamp of the other
>>> scheduled items in the list. With more than
>>> 600 buckets, they'd end up in separate lists.
>>>
>>> However, we won't ever have to walk through
>>> *all* the lists, we just have to find the
>>> right "end" of things.
>>>
>>> Assume for a moment that the highest
>>> normal_check_interval you have for your
>>> services is 15 minutes. 1024 buckets will hold
>>> your 15 minutes, with 2.067 minutes to
>>> spare. Now let's say you want to schedule a
>>> service-check x seconds into the future.
>>>
>>> slot = (time(NULL) + x) % num_buckets;
>>> if (x > num_buckets) {
>>> 	walk_from_tail(event_array[slot]);
>>> }
>>> else {
>>> 	walk_from_head(event_array[slot]);
>>> }
>> 
>> Ah ok your hash function is the timestamp. So
>> all items scheduled for time X are in the same
>> bucket if the number of buckets exceeds the
>> number of seconds from now till the maximum
>> future time you have in the list.
>> 
>> This works because the hash function preserves
>> the order of the keys rather than randomly
>> distributing them like a normal hash table, i.e.
>> 
>> key1 = timestamp1
>> key2 = timestamp2
>> 
>> and
>> 
>>  hash(key1) > hash(key2) if key1 > key2
>> 
>> so walking the buckets is the same as skipping
>> across the linked list in order.
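
To make this concrete, here is a minimal sketch in C of such an
order-preserving bucket table. All the names (`event`,
`enqueue_event`, `NUM_BUCKETS`) are illustrative, not taken from the
Nagios source. The bucket index is just the scheduled timestamp modulo
the table size, so events scheduled less than `NUM_BUCKETS` seconds
apart land in distinct one-second buckets, already in timestamp order:

```c
#include <stddef.h>
#include <time.h>

#define NUM_BUCKETS 1024   /* one-second slots; covers ~17 minutes */

typedef struct event {
    time_t run_time;       /* absolute timestamp the event fires at */
    struct event *next;
} event;

static event *buckets[NUM_BUCKETS];

/* Order-preserving "hash": one bucket per second of scheduled time. */
static size_t slot_for(time_t when)
{
    return (size_t)(when % NUM_BUCKETS);
}

/* Insert keeping each bucket sorted.  With one second per bucket,
 * everything in a bucket normally shares a timestamp, so this is
 * O(1) in the common case (new head, or tail of a short list). */
static void enqueue_event(event *ev)
{
    event **pp = &buckets[slot_for(ev->run_time)];
    while (*pp && (*pp)->run_time <= ev->run_time)
        pp = &(*pp)->next;
    ev->next = *pp;
    *pp = ev;
}
```

The modulo makes the array circular, which is why events further than
`NUM_BUCKETS` seconds out collide with near-term buckets and need the
walk-from-tail case.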
>> 
>>> Since we're not likely to schedule checks several hours away,
>> 
>> I have a few checks that run every couple of
>> hours, so would I just have a larger hash
>> table, or would you need a separate bin
>> that gets the entries that aren't schedulable
>> in the normal hash table and filters them in
>> on a regular basis?
>> 
>
>1 out of 3 scenarios, basically.
>
>1. You'd have a normal-sized hash-table and take
>the performance-penalty of sometimes having to
>walk those checks to reach the case where you
>want to schedule another.
>
>2. You'd have a large enough hash-table so the
>solution when adding something is always
>tail->next or a new head.
>
>3. One implements a new event-type which examines
>the last_check_time and check_interval and
>decides if it's time to check now or if it's time
>to re-schedule.
>
>Any of these is easy to implement, with nr 2
>having the highest memory-for-speed tradeoff,
>although that one could be implemented
>with a one-way linkage of the list (just like nr
>3), so perhaps it'd even out for everything but
>the freakish corner cases, where
>max_check_interval is several days.
>
>Now that I think of it, nr 3 might be the best
>way to go, with a relatively huge list to avoid
>the expensive corner-cases.
>
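
Scenario nr 3 could be sketched like this; `handle_check_event` is a
hypothetical helper, not Nagios code. When an event pops, it compares
`last_check + interval` to the current time and either runs now or
tells the caller when to re-queue it:

```c
#include <time.h>

/* Hypothetical handler for scenario nr 3: rather than keeping the
 * queue perfectly sorted at insertion time, a popped event checks
 * whether it is actually due.  Returns 0 if the check should run
 * now, otherwise the timestamp the caller should re-schedule at. */
time_t handle_check_event(time_t now, time_t last_check, time_t interval)
{
    time_t due = last_check + interval;
    if (now >= due)
        return 0;      /* due (or overdue): run the check now */
    return due;        /* popped early: re-insert at its real due time */
}
```
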
>>> and we don't have to walk *past* all items
>>> with the same scheduled time as our own event
>>> (since the list is going to be sorted based on
>>> a non-unique index), the most common case will
>>> be that we do only two operations to find our
>>> slot.
>> 
>> Since we can add the new item into the
>> head/tail of the same bucket.
>> 
>
>Precisely.
>
>> If the node you are looking for isn't on that
>> list, you go to the node with the closest key
>> preceding the one you want and look to see if
>> any of its N-1...1 pointers are assigned. If
>> so, start searching along that linked list. At
>> some point you end up searching along the 0'th
>> linked list to find the node.
>> 
>
>Correct me if I'm wrong, but to me that suggests
>that the initial search is made shorter by a high
>value of N, while there will be fewer levels to
>drop with a low value of N.

Well yes. The higher the level N, the shorter
the search should be on the N=0 linked
list. However, you have more vertical searching to
do. The mapping of nodes to higher-N linked lists
is probabilistic, not deterministic (as in a hash
function). However, you will have more nodes
(higher density) in lower-N linked lists than in
higher-N linked lists. Because of this, you may
only have to walk one or two nodes to get to the
middle of a large linked list (as in a b-tree),
but because of the probabilistic nature, that second
node walk may not drop you precisely in the middle
of the linked list.

It seems there should be some optimal depth N and
some optimal number of nodes n, but because the
nature of a skiplist is probabilistic, there is no
skiplist equivalent to a perfect hash function.
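
For illustration, the probabilistic promotion could be sketched like
this in C (the names and the xorshift generator are my own, not from
any particular implementation). Each node reaches level i+1 with
probability 1/2, so level i holds roughly n/2^(i-1) nodes and a search
costs O(log n) on average:

```c
#include <stdint.h>

#define MAX_LEVEL 16   /* enough headroom for ~2^16 nodes with p = 1/2 */

/* Small xorshift PRNG so the sketch is self-contained and deterministic. */
static uint32_t rng_state = 0x12345678u;
static uint32_t xorshift32(void)
{
    uint32_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return rng_state = x;
}

/* Flip coins: promote to the next level while the coin comes up heads,
 * so roughly n/2, n/4, n/8, ... nodes populate levels 2, 3, 4, ...
 * This is the probabilistic promotion discussed above; no global
 * rebalancing pass is ever needed on insert. */
static int random_level(void)
{
    int level = 1;
    while (level < MAX_LEVEL && (xorshift32() & 1))
        level++;
    return level;
}
```
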

>> I'm not sure how duplicates are handled, but I
>> would require only the first node of a
>> duplicate set of nodes to appear in any linked
>> list except the 0 list. So you always have a
>> sublist search if you have duplicate keys. Do
>> we have duplicate keys in the current linked
>> list code?
>> 
>
> Nope. Or perhaps yes. The current linked list is
> key-less, so it's kinda hard to say ;-)

Ahh.

>>>> On all lists except 0, the nodes that get promoted are randomly determined,
>>>> and are recalculated by anything that walks the 0 linked list.
>>>>
>>> Will the performance deteriorate if nothing ever walks that list?
>> 
>> I claim yes, but how badly, I don't know. It's similar to having an
>> imperfect hash and getting more data in a single bucket that you have
>> to search.
>> 
>
>Ah, right. I think we'd want to avoid doing that at pretty much all
>costs. Any known ways around it? Google was remarkably silent on the
>subject.

Not sure. As a fallback you could always do a
periodic rescan every N insertions.
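
A sketch of that fallback (the `REBUILD_EVERY` constant and the
rebuild hook are hypothetical, purely for illustration): count
insertions and re-derive the upper-level links in one pass on every
N-th insert, amortizing the maintenance instead of paying it on every
read:

```c
#define REBUILD_EVERY 1024  /* hypothetical tuning knob */

static unsigned insertions_since_rebuild;

/* After every REBUILD_EVERY-th insertion, report that the caller
 * should walk the level-0 list once and re-draw the promotion
 * levels.  Returns 1 when a rebuild is due, 0 otherwise. */
int rebuild_due_after_insert(void)
{
    if (++insertions_since_rebuild < REBUILD_EVERY)
        return 0;
    insertions_since_rebuild = 0;
    return 1;   /* time to rebuild the upper levels */
}
```
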

>>> I guess I just don't understand Skiplists
>>> properly.
>> 
>> IIRC skiplists are meant as a replacement
>> for a balanced b-tree implementation, but
>> without having to do all the rebalancing every
>> time a node is inserted. Since inserting a new
>> node only affects the nodes on either side of
>> it (unlike in a balanced tree, where the whole
>> tree may have to be restructured to maintain a
>> constant search time), you can have multiple
>> threads updating the list, unlike with some
>> other data structures.
>
>Ah, there's that of course, but that'd break down
>in the odd case when one thread removes a link
>after it's been read by another thread, wouldn't
>it? Since there's more than one link to and from
>the object, I imagine removing it isn't an atomic
>operation, while reading it is.

Yup. I guess I should have said inserting new
nodes. So you can have the normal scheduler and
the scheduler for events requested by the web
interface both adding at the same time.

				-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.
