Patch RFC - Nagios 3.2 - permanently remove sleep on run_event == FALSE in main loop (events.c) or conditionally remove using nagios.cfg configuration parameter?

Christoph Maser cmr at financial.com
Sun Nov 1 23:03:18 CET 2009


On Friday, 2009-10-30 at 16:32 +0100, Max wrote:
> Hi,
>
> We have been working on reducing the scheduling skew for Nagios
> service checks through a number of different techniques. Yesterday we
> were looking through the main event loop in events.c and saw that when
> the next event is *NOT* yet scheduled to run, Nagios sleeps for the
> sleep_time amount configured in nagios.cfg, with a comment about not
> hogging the CPU (see the sketch below).
>
> While this can certainly be useful for environments with less powerful
> hardware, or where performance data intervals matter less than
> 'playing nice', it adds a lot of scheduling skew for environments
> (like ours) that need to get performance data into other systems at
> very regular intervals. In addition, when nanosleep is used, it
> actually drives the load up on the system over time (on RHEL 5.1,
> 5.2, and 5.4 at least).
>
> We commented out that code in our environment yesterday and noticed:
> * The rate at which our latency grows over time dropped significantly
> * System load decreased noticeably, since nanosleep is no longer being
> called thousands of times per polling cycle (the test env has 9000
> active services on ~1400 hosts, with ~800 not runnable due to service
> dependency rules)
>
> To give real numbers: pre-patch, our latency grew from 0 to 12 seconds
> within about 10 hours; post-patch, latency has only grown to about 1
> second after 14 hours of running on this build.  We consider latency
> too high when our SNMP counter-based check intervals exceed the
> configured interval by more than 10% (e.g. 330 seconds for a
> 300-second interval), since that causes gaps in the time-series data
> warehouse we send our performance data to.
>
> Pre-patch, system load was climbing to 7 after 12-14 hours; post-patch
> it has levelled off around 3-4 after 14 hours. This is on a dual
> quad-core Intel system with 8 GB RAM.  Service check throughput is
> around 2k checks per minute.
>
> So while this was a trivial thing to change, it makes a very
> noticeable performance difference in a larger environment, and we
> would like to contribute it as a performance patch.
>
> My thinking is that, rather than just removing the code, we could
> perform that additional sleep only when use_large_installation_tweaks
> in nagios.cfg is set to 0, and submit that as our patch (rough sketch
> below).
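[A sketch of what the conditional version could look like -- not the
final patch. maybe_idle_wait() is a made-up helper name, and the extern
types shown here are illustrative; only use_large_installation_tweaks
and sleep_time correspond to existing Nagios globals/options:]

    #include <time.h>

    extern int    use_large_installation_tweaks;  /* 0/1, set from nagios.cfg */
    extern double sleep_time;                     /* seconds, set from nagios.cfg */

    /* Keep the "play nice" sleep only when the large-installation
     * tweaks are NOT enabled, so big setups skip the pause entirely. */
    static void maybe_idle_wait(void)
    {
            struct timespec delay;

            if(use_large_installation_tweaks)
                    return;

            delay.tv_sec  = (time_t)sleep_time;
            delay.tv_nsec = (long)((sleep_time - (double)delay.tv_sec) * 1000000000L);
            nanosleep(&delay, NULL);
    }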
>
> Thoughts / opinions?
>
> - Max

Isn't that the whole point of the sleep_time config value? You could set
it to 0.01, or maybe even to 0. But zero has the problem that on smaller
systems you basically end up running a nearly empty busy loop (example
setting below).

About the nanosleep issue on RHEL, do you have some more information on
that? Why does it drive the load up over time?
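[For reference, the knob in question -- the value shown is purely
illustrative, check your own nagios.cfg for the shipped default, and
the sub-second behaviour depends on nanosleep support being compiled
in, if memory serves:]

    # nagios.cfg -- seconds the daemon sleeps between iterations of the
    # event loop when nothing is due to run; fractional values are
    # honoured at sub-second resolution when nanosleep is available.
    sleep_time=0.25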

Chris

