[PATCH] circular dependency check speedup

Andreas Ericsson ae at op5.se
Thu Dec 16 12:23:36 CET 2010


I'd appreciate if you could send new threads as new messages
instead of as replies to current threads. Especially when you're
sending patches. If I hadn't been interested in the topic listed
in this thread, I'd have missed this patch and it had been dropped
on the floor (unless Ton had applied it, in which case I would
have reverted parts of it quite promptly).

I've done some organoleptic analysis of the patch (looked at it).
Analysis included below.

On 12/15/2010 06:38 PM, Glenn Herteg wrote:
> Nagios has historically used a very inefficient algorithm to check
> for circular host and service dependencies, with respect to both
> execution and notification dependencies. This area has been the
> subject of previous patches to improve performance. For example,
> in early 2008, a depth-first-search algorithm was proposed for
> part of this checking, and it was eventually incorporated into
> the standard code base. However, that only affected part of the
> circular-dependency-check algorithm, and performance against tens
> of thousands of dependencies has still been extremely slow (taking
> tens of minutes for a very large configuration).
> 

Neat. So far so good :)

> GroundWork Open Source has looked at this problem with an eye
> toward full optimization, and hereby provides a patch to fix
> the remaining sources of poor performance. The changes here are
> not just theoretical; they have been tested against a very large
> customer configuration, both with and without seeding a number of
> actual circular dependencies, to demonstrate that the same errors
> are caught as with the original code.

If you could make the test configs available online, that'd be
great. I'd like to test this myself.

> The replacement algorithm in
> this patch provides a speedup of around 20,000x for that portion of
> a pre-flight or reload operation, for a configuration with a very
> large number of host or service dependencies. This translates to
> an overall speedup of about 275x with our large test dataset, for
> a complete preflight operation. In simpler terms, many minutes of
> computation dissolve away into just a few seconds.
> 

I'd like numbers, along with architectures this was tested on. Also
the test-configs that were used to get these numbers. Take a look
at a6e06d8de24ffcb4a8c341a60098042dc3284756 to get an idea of how I
like to see performance numbers. The repo where a6e06d8de24ff is
available can be cloned from git://git.op5.org/nagios.git

> Given the timing improvements just noted, it is GroundWork's belief
> that application of this patch will completely obsolete the nagios
> -x and --dont-verify-paths ("Don't check for circular object paths")
> options.
> 

If what you're claiming is true, they'll be deprecated and will do
nothing, but will remain valid options until Nagios 4.x hits the
market.

> The replacement algorithm works in three ways: by using substitute
> data structures which can be operated on much more efficiently, by
> sidestepping huge amounts of pointless work, and by optimizing the
> placement of fields within certain existing data structures so that
> more efficient assembly code is generated for the most commonly used
> field accesses. All of the modifications have been carefully tested
> using actual measurements to show that they do improve execution
> time, so the field-placement changes (for instance) are not based
> on capricious guesswork.
> 

Hmm. Having taken a look at the field restructures, I'll have to say
that your claim is completely bogus for any architecture where

   sizeof(int) == sizeof(void *)

which holds true for most architectures not running in compatibility
mode. This is ofcourse only true if the compiler is clever enough to
calculate the real address of the desired variable rather than doing
the memory fetch in two steps, which has been the case for gcc since
1.x sometime. If sizeof(int) != sizeof(void *) and the architecture
brings penalties when doing unaligned memory access, the structure
reordering makes sense, but it would make *more* sense to utilize
a padding scheme for the int things to make sure they're always the
same size as a pointer, and to use a sensible compiler that always
does direct offset access.

Either way, I've declared anathema on structure reordering patches
until 3.3 since they break the NEB ABI, so you'll have to reroll
the patch anyways.

> To avoid impacting the rest of the Nagios code outside of the
> circular dependency checks, the original dependency list structures
> are restored after the replacement algorithm has done its checking.
> We have not investigated whether keeping the replacement data
> structures around for later use instead of the original dependency
> lists might afford additional operating speedup, though that is
> perhaps a possibility.
> 
> As provided, the attached patch file is suitable for direct
> application to the Nagios 3.2.3 source code.
> 

Perhaps, but since some changes are not directly related and of
dubious value, I'd like this patch split into several parts which
can be applied and tested separately.

First the patch that alters the algorithm for searching through
dependencies, but using a generic binary search algorithm instead
of the "specialty" ones which, afaiu after a quick glance understand
it are special only because they lack checking for NULL being passed
as an argument. That's a microscopic gain in the larger scheme of
things, but adds enormously to the maintenance burden, so it needs
to be tested separately. If this is what provides the bulk of the
speedup, your algorithm is poorly implemented.

Then the patch that re-orders the object structures. I suspect this
patch has microscopic impact as well and, like I said, I can't take
it now anyways. This should be testable for performance both with
and without patch nr 1.

The third patch should be the one creating all the specialty binary
search algorithm functions (code duplication galore).

With all the patches you're serious about getting into the Nagios
core, I want performance numbers measuring hot and cold cache cases
before and after. Do

   echo 3 > /proc/sys/vm/drop_caches

as root and then run

    /usr/bin/time nagios -p /path/to/nagios.cfg

twice to measure the real difference in startup time. The first one
will be with cold cache, the second with hot cache.

I expect the algorithm change to provide a vast improvement (as it
did for host->parent relationships), with the other patches to be
relatively minor, so the only patch I expect to apply is actually
the first one.

Thanks :)

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

------------------------------------------------------------------------------
Lotusphere 2011
Register now for Lotusphere 2011 and learn how
to connect the dots, take your collaborative environment
to the next level, and enter the era of Social Business.
http://p.sf.net/sfu/lotusphere-d2d




More information about the Developers mailing list