Increasing latency - over the top - a cry for help

Ståle Askerød Johansen s.a.johansen at usit.uio.no
Tue May 20 11:50:36 CEST 2008



Hello, oh thou sweet fountain of problem-solving knowledge :-)

Here at the University of Oslo we are running Nagios to monitor
roughly 10k services out of which ~9500 are active. We also monitor
~700 hosts. We are running Nagios 3.0.1 on a Dell 2850 with 4 GB of RAM
and 4 cores. We upgraded from 2.9 roughly a month ago.

We have the following problem, and are turning here for help after
fumbling in darkness for some time: The latency of both host checks
and service checks increases over time.

After a stop/start of nagios, we see the following pattern:

1) The service latency starts off at ~2.8 seconds, which we are happy
with. It increases by about 1 ms per minute, a rough estimate.
2) The nagios process starts off at about 17m resident, shown in "top".
3) The "system" part of cpu usage starts off at ~30%

However.

4) The "system" part of cpu usage increases over a period of approx.
six hours, till it reaches a threshold of some kind at ~290%. At the
same time, the system load increases till four or five.
5) At this point, the latency of both host and service checks will
start increasing much faster, until another stop/start of nagios.
The service latency will reach 160 seconds (!) after ~9 hours.
6) At
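
To put the numbers above in proportion, here is a minimal Python sketch that
extrapolates only the slow drift from point 1 (2.8 s at start, about 1 ms
per minute). The linear model and the names in the code are assumptions made
for illustration; only the figures come from the observations above.

```python
# Figures quoted in the post; the linear model itself is an assumption.
START_LATENCY_S = 2.8      # service latency right after a restart
DRIFT_S_PER_MIN = 0.001    # "about 1 ms per minute", a rough estimate


def linear_latency(minutes_since_restart):
    """Latency in seconds predicted by the slow linear drift alone."""
    return START_LATENCY_S + DRIFT_S_PER_MIN * minutes_since_restart


six_hours = linear_latency(6 * 60)
nine_hours = linear_latency(9 * 60)
print(f"after 6 h: {six_hours:.2f} s")
print(f"after 9 h: {nine_hours:.2f} s (observed: ~160 s)")
```

Under that assumption the drift alone would give a bit over 3 seconds after
nine hours, so nearly all of the observed ~160 seconds has to come from the
faster phase described in point 5.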

The question is what causes this. We started using mrtg for graphing
some time after we noticed a problem, so we are not quite sure when
this started.

Our setup is actually quite simple.

o no flap-detection
o no environment macros
o no dependencies


We have tried the following, with no real effect:

use_large_installation_tweaks=1 (with various sub-tweaks)
playing with the max_concurrent_checks
checkresults on a tmpfs filesystem
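
For anyone wanting to confirm which of these settings are actually present
in the main config file, a small Python sketch follows. The sample text, the
values and the path in it are hypothetical and are not taken from our real
nagios.cfg; check_result_path is assumed to be the directive behind the
checkresults location. The sketch only reads name=value lines and does not
verify that the path is really on tmpfs.

```python
def read_directives(cfg_text):
    """Return {name: [values]} for name=value lines, skipping comments and blanks."""
    found = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Keep every occurrence so duplicated directives become visible.
        found.setdefault(name.strip(), []).append(value.strip())
    return found


# Hypothetical excerpt for illustration only, not the poster's real file.
SAMPLE_CFG = """\
# main config excerpt
use_large_installation_tweaks=1
max_concurrent_checks=0
check_result_path=/var/spool/nagios/checkresults
"""

WANTED = (
    "use_large_installation_tweaks",
    "max_concurrent_checks",
    "check_result_path",
)

directives = read_directives(SAMPLE_CFG)
for name in WANTED:
    print(name, "=", directives.get(name, ["<not set>"]))
```

A directive that shows up with more than one value has been set twice in the
file, which is worth ruling out before concluding that a tweak had no effect.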

So.

We are very grateful for any ideas.

I have gathered some useful data at http://folk.uio.no/staalej/nagios/

-- 
Ståle Johansen, soon in despair.
