Nagios active check performance limits

Gaspar, Carson Carson.Gaspar at gs.com
Mon Nov 12 23:13:32 CET 2007


As per my original mail: 13.5 checks/second, with each check being a
dumb C program that sleeps 0.5 secs and returns OK, on an HP DL385 G1
with 2 x dual-core Opteron 2.4 GHz (so 4 cores total) running RHEL4U4.
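
For readers who want to reproduce a similar load, here is a minimal
sketch of a check that behaves like the one described above. It is not
the original program (whose source is not shown in this thread), and it
is written in Python rather than C purely for brevity, so its startup
cost will differ from the C version. The names run_check and STATE_OK
are illustrative. It relies only on the standard plugin convention that
exit status 0 means OK and that the plugin prints one status line.

```python
import sys
import time

# Nagios plugin convention: exit status 0 means OK.
STATE_OK = 0


def run_check(delay_seconds=0.5):
    """Sleep for delay_seconds, then report OK as (exit_status, status_line)."""
    time.sleep(delay_seconds)
    return STATE_OK, "OK - dummy check slept %.1f s" % delay_seconds


if __name__ == "__main__":
    status, line = run_check()
    print(line)          # one line of plugin output for Nagios to record
    sys.exit(status)     # the exit status carries the check result
```

Because the check does nothing but sleep, any throughput ceiling measured
with it reflects scheduling and process-handling overhead rather than the
cost of the check itself.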

We don't use _any_ active checks on our prod system. I was just
generating numbers for my talk so I could do more than wave my hands and
say active checks were "slow", based on my decision to dump them back in
the 1.x days...
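
To put the rates quoted in this thread side by side, the following
back-of-the-envelope sketch computes how long one full pass over the
50000-service test config would take at each rate. All figures come from
the messages quoted below; the helper names are illustrative, and the
"3 hours" elapsed time is the approximate figure given in the original
post, so treat the results as rough.

```python
SERVICES = 5000 * 10  # 5000 hosts with 10 services each

# Rates reported in this thread, in checks per second.
RATES = {
    "active, before the 3.x tweaks": 3.0,
    "active, use_large_installation_tweaks": 13.5,
    "passive": 1052.0,
}


def sweep_seconds(services, checks_per_second):
    """Time needed to run every service check once at a steady rate."""
    return services / checks_per_second


def observed_rate(services_checked, elapsed_seconds):
    """Average check rate implied by a count and an elapsed time."""
    return services_checked / elapsed_seconds


for label, rate in RATES.items():
    minutes = sweep_seconds(SERVICES, rate) / 60
    print(f"{label:40s} {rate:7.1f}/s  {minutes:7.1f} min per full pass")

# 32708 services checked roughly 3 hours after launch.
print(f"implied rate: {observed_rate(32708, 3 * 3600):.2f} checks/s")
```

The implied rate from the 32708-services-in-3-hours figure works out to
roughly 3 checks per second, which agrees with the rate reported in the
original post.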

Once the USENIX folks get the talk video up, I'll post a link to it
here. And with any luck, I'll get approval to release my source by the
end of the year (I have approval in principle, but I still need to pass
code review).

Now I just need to finish my slides... ;-)

>-----Original Message-----
>From: nagios-devel-bounces at lists.sourceforge.net 
>[mailto:nagios-devel-bounces at lists.sourceforge.net] On Behalf 
>Of Ethan Galstad
>Sent: Saturday, November 10, 2007 3:37 PM
>To: Nagios Developers List
>Subject: Re: [Nagios-devel] Nagios active check performance limits
>
>Gaspar, Carson wrote:
>> After sending this I discovered the joy of the new 3.x parameter 
>> use_large_installation_tweaks...
>> 
>> Nagios is now eating all 4 of my processors, as requested. I'm not 
>> sure _why_, as without this setting my server was practically idle...
>
>How many checks are you executing per second to keep 4 
>processors warm?  Any details on the hardware you can provide us 
>as well?  I'm interested to see how performance is on 
>different systems.
>
>BTW, you can also (as of 3.0b6) disable a new 
>"enable_environment_macros" option if you don't reference env. 
>vars to get macro values in your scripts.  It's a huge waste of 
>time for most people.
>
>> 
>> I'm now seeing about 13.5 checks/sec, much closer to what I expected.
>> 
>> Of course passive checks are processed at 1052/sec, at <25% CPU
>> (which is why I use them!)
>> 
>>> -----Original Message-----
>>> From: nagios-devel-bounces at lists.sourceforge.net
>>> [mailto:nagios-devel-bounces at lists.sourceforge.net] On Behalf Of 
>>> Gaspar, Carson
>>> Sent: Friday, November 09, 2007 11:47 AM
>>> To: 'nagios-devel at lists.sourceforge.net'
>>> Subject: SPAM WARNING!: [Nagios-devel] Nagios active check 
>>> performance limits
>>>
>>> I sent this to -users earlier, but it's probably more appropriate
>>> for -devel.
>>>
>>> FYI I just re-ran my tests with 3.0b6 and am seeing about the same 
>>> results.
>>>
>>> (Disclosure - I'm giving an invited talk at LISA next week on Nagios)
>>>
>>> I'm trying to determine the maximum active check rate that Nagios
>>> can sustain on a given system. I'm rather surprised that I can't
>>> seem to get more than about 3 checks per second.
>>> Am I just being dense? (I haven't used anything but passive checks
>>> for years now, so my active check foo is stale...)
>>>
>>> Nagios config settings (testing with 2.9, which is what we're using
>>> in production):
>>>
>>> service_inter_check_delay_method=0.01 (tried s as well...)
>>> service_interleave_factor=s
>>> host_inter_check_delay_method=0.01 (tried s as well)
>>> max_concurrent_checks=0
>>> service_reaper_frequency=1
>>> sleep_time=0.01
>>>
>>> performance stats show svc execution at 0 / .8 / .459, svc latency
>>> at 0 / 10467.99 / 4475.5599 (3 hours after launch, with only
>>> 32708 / 50000 services checked)
>>>
>>> I never see more than 9 nagios procs at once. The config is large
>>> (5000 hosts with 10 services each). The service checks are a simple
>>> C program that sleeps for half a second then returns OK.
>>>
>>> Is this really the best Nagios can do, or am I missing something?
>>>
>>> --
>>> Carson
>
>
>Ethan Galstad
>Nagios Developer
>___
>Email: nagios at nagios.org
>Web:   www.nagios.org
>
