Re: How to reduce a very high latency number

srunschke at abit.de srunschke at abit.de
Mon May 22 12:06:45 CEST 2006


nagios-users-admin at lists.sourceforge.net wrote on 17.05.2006 20:09:16:

> I am still butting up against very high latency issues with my Nagios
> setup.  I feel like I must be missing something obvious because it
> doesn't seem like I have so many services that the servers cannot keep
> up.
>
> nag2: 193/1743
>
> Machine hardware:
> 1Us running Fedora Core 4 / P4 2.4GHz / 512MB RAM / 40GB ATA 8MB cache
> 7200rpm drives

To me this is clearly a hardware-related performance issue.
Your machines have far too little RAM. It is simply not possible to
run 1800 checks on a 512MB machine in a timely manner.

Think about this:
Every time Nagios starts a check, it forks a child, which forks the
check. Nagios usually uses around 26MB total memory per process, the check
maybe another 5MB. When running 1800 checks, we are talking about spreading
roughly 55 gigabytes of needed RAM over 512MB of real RAM. Imagine how often
that works without the machines doing a shitload of swapping and
io-wait. I really cannot imagine how such a machine can NOT swap when
running Nagios. Are you totally sure that you did not make a mistake
when checking the machine?
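
To make the arithmetic above explicit, here is a small Python sketch. It
assumes the worst case in which all 1800 checks are in flight at the same
moment, and it uses the rough per-process figures quoted above (26MB for the
forked Nagios child, 5MB for the check). The function name is purely
illustrative.

```python
def worst_case_memory_mb(checks, nagios_child_mb=26, plugin_mb=5):
    """Memory needed if every check were running at the same moment.

    The per-process sizes are the rough estimates from this mail,
    not measured values.
    """
    return checks * (nagios_child_mb + plugin_mb)


needed_mb = worst_case_memory_mb(1800)
print(f"worst case: {needed_mb} MB (~{needed_mb / 1000:.1f} GB) vs. 512 MB of real RAM")
```

Only the checks actually running at one moment count against RAM, so the
real figure depends on how many checks overlap; pass that number instead of
1800 to get a less pessimistic estimate.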

Here's a snapshot of our dedicated Nagios server, which is a minimal install
of RHES4 with only Nagios/Apache running on it (and the HP Insightmanager
tools and TSM backup client, but that should not really matter that much ;)):

top - 11:48:52 up 69 days, 19:10,  1 user,  load average: 0.75, 0.70, 0.67
Tasks:  53 total,   2 running,  51 sleeping,   0 stopped,   0 zombie
Cpu(s):  9.3% us,  4.3% sy,  0.0% ni, 62.5% id, 23.9% wa,  0.0% hi,  0.0% si
Mem:   3116384k total,  2341696k used,   774688k free,    55188k buffers
Swap:  6291448k total,      144k used,  6291304k free,  2148772k cached

This is an HP DL380, 3.6GHz Xeon with 3GB of RAM and a RAID5. It is currently
running "only" 120 hosts with around 500 checks, but those are on a high
frequency schedule - ~400 checks per minute - as those are the company-critical
services. Therefore it is under real pressure, as you can see from the 2.3GB
Mem usage and the 0.75 load with only 500 checks. But I think it is roughly
comparable to your setup with more than three times as many checks.

You should really, really upgrade the RAM in the machines. In my opinion that
would solve most of your problems, as I imagine you have a lot of io-wait on
this machine (which you can check with an up-to-date top, by the way ;))
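
If no recent top is at hand, the io-wait share can also be read from the
kernel counters directly. The following is a minimal Python sketch, assuming
a Linux 2.6 or later kernel where the aggregate "cpu" line in /proc/stat
lists user, nice, system, idle and iowait as its first five counters. The
function names are my own, not part of any tool.

```python
def read_cpu_times(path="/proc/stat"):
    """Return the counters from the aggregate 'cpu' line as a list of ints."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(x) for x in line.split()[1:]]
    raise RuntimeError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples.

    Only the first eight counters (user .. steal) are summed, because the
    guest counters that newer kernels append are already included in user.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0
```

Call read_cpu_times() twice, a few seconds apart, and pass both samples to
iowait_percent(). The result corresponds to the "wa" column in the top
output shown earlier.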

regards
        Sascha

--
Sascha Runschke
Netzwerk Administration
IT-Services

ABIT AG
Robert-Bosch-Str. 1
40668 Meerbusch

Tel.:+49 (0) 2150.9153.226
Mobil:+49 (0) 173.5419665
mailto:SRunschke at abit.de

http://www.abit.net
http://www.abit-epos.net
---------------------------------
Sicherheitshinweis zur E-Mail Kommunikation /
  Security note regarding email communication:
http://www.abit.net/sicherheitshinweis.html

