Nagios 30 seconds initial delay

Andreas Ericsson ae at op5.se
Thu Jan 28 15:08:24 CET 2010


On 01/28/2010 10:01 AM, Ton Voon wrote:
> 
> On 17 Dec 2009, at 08:12, Brandino Andreas wrote:
> 
>> I have far fewer hosts and services (for the moment).
>> After deleting "retention.dat" I still face the same delay...
>>
> 
> When did this slow down occur? I can see how Jonathan's patches speed
> up the system but these look like long-term bottlenecks, not a 3.0 ->
> 3.2 migration. I'm interested to know when the slowdown occurred and
> if it is the side effect of something else.
> 
> There was a patch applied sometime in 3.1 which improved the circular
> parents lookup, but I don't know much else (Andreas applied it but
> didn't update the Changelog).
>

I don't do changelogs :p

The info is in the commit history though. I think you're referring to
the following patch, sent in by Jean Gabès:

commit a6e06d8de24ffcb4a8c341a60098042dc3284756
Author: Andreas Ericsson <ae at op5.se>
Date:   Sun May 17 12:54:28 2009 +0000

    Revamp hosts' circular parent/child detection
    
    With this patch, all hosts are checked at most twice when
    determining circular host/parent relationship chains.
    
    Previously, all hosts and all their parents were always checked
    once for each host, which caused the non-linear scaling
    represented as O((n*n)+depth_of_n).
    
    For some timing comparisons:
    A configuration with 151109 nodes is used. The 150300 interesting
    nodes are in chains 300 levels deep, with 501 hosts in each level.
    
    Patched:
    6.28user 0.19system 0:06.58elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+65723minor)pagefaults 0swaps
    
    Unpatched:
    (Ctrl-C'd out as it wasn't done after nearly an hour)
    3221.77user 0.31system 53:56.51elapsed 99%CPU (0text+0data 0max)k
    0inputs+0outputs (0major+67480minor)pagefaults 0swaps
    
    Without even completing the timing comparison, the patch provides
    a speedup of more than 51300%.
    
    On a smaller and more real-worldy configuration of roughly 15000
    hosts, we get these timings:
    
    Patched:
    0.71user 0.02system 0:00.75elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+8219minor)pagefaults 0swaps
    
    Unpatched:
    213.77user 0.02system 3:34.34elapsed 99%CPU (0text+0data 0max)k
    0inputs+0outputs (0major+8280minor)pagefaults 0swaps
    
    A huge improvement indeed, and one that's necessary for Nagios to
    be usable in truly huge networks.
    
    Signed-off-by: Andreas Ericsson <ae at op5.se>
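
For anyone who wants the gist without reading the C: below is a minimal
sketch in Python of one way to get the "each host checked at most twice"
behaviour. It is not the code from the commit. It assumes the configuration
has been reduced to a dict mapping each host name to a list of its parents;
the function name and the GREY/BLACK markers are made up for illustration.
Each host is marked once when the walk enters it and once when the walk
leaves it, so the work grows with hosts plus parent links rather than with
hosts squared.

```python
GREY, BLACK = 1, 2  # hypothetical markers: GREY = on current chain, BLACK = done


def find_circular_parents(parents):
    """Return a set of hosts found on circular parent chains.

    `parents` maps host name -> list of parent host names (assumed layout).
    The result is empty exactly when no loop exists. It is not guaranteed
    to list every member of every loop.
    """
    state = {}      # host -> GREY or BLACK; missing means not visited yet
    looped = set()
    for start in parents:
        if start in state:
            continue  # already handled while walking from an earlier host
        state[start] = GREY
        path = [start]                            # current parent chain
        stack = [iter(parents.get(start, ()))]    # one iterator per chain entry
        while stack:
            for parent in stack[-1]:
                seen = state.get(parent)
                if seen is None:
                    # First visit: descend into this parent.
                    state[parent] = GREY
                    path.append(parent)
                    stack.append(iter(parents.get(parent, ())))
                    break
                if seen == GREY:
                    # Parent is already on the current chain: that is a loop.
                    looped.update(path[path.index(parent):])
            else:
                # All parents of the chain's last host are done: leave it.
                state[path.pop()] = BLACK
                stack.pop()
    return looped
```

The walk is iterative rather than recursive so that chains hundreds of
levels deep, like the ones in the timing test, do not depend on the
interpreter's recursion limit.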
-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
