How to reduce a very high latency number

Trask trasko at gmail.com
Wed May 17 20:09:16 CEST 2006


I am still butting up against very high latency issues with my Nagios
setup.  I feel like I must be missing something obvious because it
doesn't seem like I have so many services that the servers cannot keep
up.

As can be seen from the data below, the server with the most service
checks has the highest latency (usually in the neighborhood of 700
seconds! -- this is pre-production).  Is my problem really this
simple?  I have a feeling that it isn't just the number of checks, but
I cannot figure out why my latency values are so terrible.

Overview of my setup:

There are 4 servers.  3 distributed servers (nag1, nag2, nag3) at 3
distinct geographic locations send all their check information via
NSCA to a 4th, central server (nag4).  The connections between all of
these servers are very high-bandwidth and are nowhere near saturated.
The only unknown, as far as I can tell, is the effect that our hardware
VPN/tunnels might have; however, the worst-performing server (nag2) is
on the same LAN as the central server (nag4).

Nagios v2.2, latest plugins and NRPE/NSCA as of today.  I am running
embedded perl with perlcache enabled.


Number of hosts/services:
nag1: 43/130
nag2: 193/1743
nag3: 78/780
nag4: central server (active host checks, passive service checks)

Performance Info:

nag1:
Metric                  Min        Max           Average
Check Execution Time:   0.00 sec   20.04 sec     0.024 sec
Check Latency:          0.00 sec   1.01 sec      0.011 sec
Percent State Change:   0.00%      17.17%        0.01%

nag2:
Metric                  Min        Max           Average
Check Execution Time:   0.00 sec   929.13 sec    1.246 sec
Check Latency:          0.00 sec   1180.67 sec   560.462 sec
Percent State Change:   0.00%      55.59%        0.07%

nag3:
Metric                  Min        Max           Average
Check Execution Time:   0.00 sec   101.70 sec    0.310 sec
Check Latency:          0.00 sec   602.57 sec    46.023 sec
Percent State Change:   0.00%      0.00%         0.00%
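A rough way to test whether the raw number of checks alone could explain the latency: the rate at which checks must start (services divided by check interval) multiplied by the average execution time gives the average number of checks running at once (Little's law). This is only a minimal sketch. The check interval is not stated in this post, so the 5-minute value below is an assumption, and the names in the script are hypothetical. The service counts and average execution times are the ones listed above.

```python
# Back-of-envelope load estimate per distributed server.
# ASSUMPTION: every service is checked once every 5 minutes; the real
# check interval is not given in the post.
ASSUMED_CHECK_INTERVAL_SEC = 300

# (number of services, average check execution time in seconds), from the post
servers = {
    "nag1": (130, 0.024),
    "nag2": (1743, 1.246),
    "nag3": (780, 0.310),
}


def estimate(services, avg_exec, interval=ASSUMED_CHECK_INTERVAL_SEC):
    """Return (checks that must start per second, average checks in flight)."""
    rate = services / interval   # arrival rate, checks per second
    in_flight = rate * avg_exec  # Little's law: L = lambda * W
    return rate, in_flight


for name, (services, avg_exec) in servers.items():
    rate, in_flight = estimate(services, avg_exec)
    print(f"{name}: {rate:.2f} checks/s, ~{in_flight:.2f} checks running on average")
```

Under the assumed interval this prints about 5.81 checks/s and roughly 7.24 concurrent checks for nag2, versus well under one for nag1 and nag3. If the real intervals are shorter, both figures scale up proportionally.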


Machine load numbers:
nag1: load average: 0.05, 0.08, 0.02  / mem: 470 / 512MB physical ; not swapping
nag2: load average: 0.50, 0.61, 0.59  / mem: 330 / 512MB physical ; not swapping
nag3: load average: 0.25, 0.52, 0.56  / mem: 330 / 512MB physical ; not swapping

Machine hardware:
1U servers running Fedora Core 4 / P4 2.4GHz / 512MB RAM / 40GB
7200rpm ATA drives with 8MB cache



Ok, that is all I can think of off the top of my head.  I have
reviewed the performance tuning doc (from here:
http://nagios.sourceforge.net/docs/2_0/tuning.html), but I am open to
retrying things or trying them a different way.  I can list what I've
done in response to that doc, point by point, if anyone is
interested.

Thanks for any help -- this latency issue is the last big hurdle
before getting this thing going.

~trask


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null