URGENT: Help with Nagios latency and too many running Nagios processes

Jasmine Chua jasmine.chua at securecirt.com
Thu Jan 16 09:12:03 CET 2003



But that still does not solve the problem. I have implemented freshness
checking, and my central server is getting a lot of results saying that it
has not received an update from the distributed monitoring servers for some
time. And because host statuses are not being updated, the hosts are shown
as down even though they are not down.
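
A rough way to reason about these stale results, sketched in Python: with
freshness checking enabled, a passive result is treated as stale once the
time since it arrived exceeds the service's freshness threshold. The names
`PassiveService`, `stale_services` and `suggested_threshold` are
hypothetical, and the threshold rule of thumb (one check interval, plus the
sending server's worst check latency, plus some slack) is an assumption for
illustration, not Nagios's own logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PassiveService:
    """Hypothetical model of one passive service as seen by the central server."""
    name: str
    last_result: float          # epoch seconds when the last passive result arrived
    freshness_threshold: float  # seconds; plays the role of a per-service freshness threshold


def stale_services(services: List[PassiveService], now: float) -> List[str]:
    """Return the names of services whose last passive result is older than their threshold."""
    return [s.name for s in services if now - s.last_result > s.freshness_threshold]


def suggested_threshold(check_interval: float, worst_latency: float,
                        slack: float = 60.0) -> float:
    """Assumed rule of thumb (not Nagios logic): a result may legitimately arrive
    up to one check interval plus the remote server's worst check latency late."""
    return check_interval + worst_latency + slack


if __name__ == "__main__":
    now = 10_000.0
    services = [
        PassiveService("web", last_result=9_800.0, freshness_threshold=600.0),
        PassiveService("db", last_result=9_000.0, freshness_threshold=600.0),
    ]
    print(stale_services(services, now))      # ['db']
    print(suggested_threshold(300.0, 900.0))  # 1260.0
```

If a distributed server runs its checks late, its results also reach the
central server late, so a threshold close to the bare check interval can
flag services as stale even while the hosts themselves are fine.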

On Tuesday 07 January 2003 11:41, you wrote:
> Hi Jasmine -
>
> Set the service_reaper_frequency value to 3 or 4 and see if that helps (it
> should).
>
> On 4 Jan 2003 at 3:14, Jasmine Chua wrote:
> >
> > Hi Ethan,
> >
> > I am facing a big problem with latency. I have tried the mailing list
> > and read the docs, but I still can't figure out how to bring it down.
> > I am using Nagios 1.0 and nagios-plugins 1.3.0-beta2 on SuSE Linux 8.0.
> > My central monitoring server currently monitors 560 services and 156
> > hosts in total: 504 services are passive (active checks disabled) and
> > 56 are active. It runs active checks and also accepts passive checks
> > from 6 distributed monitoring servers. NSCA is run under tcpserver, and
> > all data is logged to a database.
> >
> > According to the docs, "The spacing of service checks (also known as the
> > inter-check delay) is used to minimize/equalize the load on the local
> > host running Nagios and the interleaving is used to minimize/equalize
> > load imposed on remote hosts."
> >
> > I have tried to reduce latency by increasing max_concurrent_checks and
> > reducing service_reaper_frequency. I also once raised the average check
> > interval to 540 seconds to increase the inter-check delay, but that did
> > not help either, so I have set it back to 300 seconds. I don't think I
> > can do anything about interleaving, since it is calculated as the number
> > of services divided by the number of hosts. All calculations are set to
> > SMART. I can't see what else I can do to bring latency down.
> >
> > One of the distributed monitoring servers is reporting a lot of problem
> > hosts and services. If there are many problems to alert on, will that
> > slow Nagios down? My guess is that it will, because without those
> > problems my Nagios works perfectly. That particular distributed server,
> > however, plays a major role: it monitors 417 services and 137 hosts and
> > reports the results back to the central server.
> >
> > Also, what causes many Nagios processes to be running? There appear to
> > be many Nagios processes running on this particular distributed
> > monitoring server.
> >
> > I would be truly grateful for your help, because this has been bugging
> > me for the past two weeks. Further details are below; if you need more,
> > please feel free to contact me by email anytime!
> >
> >
> > Central Monitoring Server:
> >
> > max_concurrent_checks=240
> > service_reaper_frequency=10
> > service_check_timeout=90
> > host_check_timeout=10
> > event_handler_timeout=30
> > notification_timeout=30
> > ocsp_timeout=10
> > interval_length=60
> > use_aggressive_host_checking=0
> > command_check_interval=-1
> >
> > Calculations are set to SMART.
> >
> >                          Min   Max     Average
> > Check Execution Time:    <1s   90s     2.87s
> > Check Latency:           <1s   2850s   1014s
> >
> >
> >         SERVICE SCHEDULING INFORMATION
> >         -------------------------------
> >         Total services:             560
> >         Total hosts:                156
> >
> >         Command check interval:     -1 sec
> >         Check reaper interval:      10 sec
> >
> >         Inter-check delay method:   SMART
> >         Average check interval:     300.000 sec
> >         Inter-check delay:          0.536 sec
> >
> >         Interleave factor method:   SMART
> >         Average services per host:  3.590
> >         Service interleave factor:  4
> >
> >         Initial service check scheduling info:
> >         --------------------------------------
> >         First scheduled check:      1041619501 -> Sat Jan  4 02:45:01 2003
> >         Last scheduled check:       1041619801 -> Sat Jan  4 02:50:01 2003
> >
> >         Rough guidelines for max_concurrent_checks value:
> >         -------------------------------------------------
> >         Absolute minimum value:     19
> >         Recommend value:            57
> >
> > That particular distributed monitoring server:
> >
> > max_concurrent_checks=168
> > service_reaper_frequency=10
> > service_check_timeout=90
> > host_check_timeout=10
> > interval_length=60
> > use_aggressive_host_checking=0
> >
> >         SERVICE SCHEDULING INFORMATION
> >         -------------------------------
> >         Total services:             417
> >         Total hosts:                137
> >
> >         Command check interval:     -1 sec
> >         Check reaper interval:      10 sec
> >
> >         Inter-check delay method:   SMART
> >         Average check interval:     300.000 sec
> >         Inter-check delay:          0.719 sec
> >
> >         Interleave factor method:   SMART
> >         Average services per host:  3.044
> >         Service interleave factor:  4
> >
> >         Initial service check scheduling info:
> >         --------------------------------------
> >         First scheduled check:      1041620893 -> Fri Jan  3 19:08:13 2003
> >         Last scheduled check:       1041621194 -> Fri Jan  3 19:13:14 2003
> >
> >         Rough guidelines for max_concurrent_checks value:
> >         -------------------------------------------------
> >         Absolute minimum value:     14
> >         Recommend value:            42
> >
> > --
> > Jasmine Chua
> > Security Engineer, SecureCiRT (A SBU of Z-Vance Pte Ltd)
> > http://www.securecirt.com
>
> Ethan Galstad,
> Nagios Developer
> ---
> Email: nagios at nagios.org
> Website: http://www.nagios.org
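
The SMART figures in both scheduling reports above can be reproduced with
simple arithmetic. The Python sketch below is a reconstruction checked only
against the quoted numbers, not taken from the Nagios source: inter-check
delay = average check interval / total services, interleave factor =
services per host rounded up, minimum max_concurrent_checks = ceil(reaper
interval / inter-check delay), and the recommended value = 3 times that
minimum. The `interleaved_order` helper is a hypothetical illustration of
striding through a host-grouped service list by the interleave factor.

```python
import math
from typing import Dict, List


def scheduling_info(total_services: int, total_hosts: int,
                    avg_check_interval: float, reaper_interval: float) -> Dict[str, float]:
    """Reproduce the SMART scheduling figures quoted above (reconstruction, not Nagios code)."""
    # Spread one full round of service checks evenly over the average check interval.
    inter_check_delay = avg_check_interval / total_services
    # Interleave factor: average number of services per host, rounded up.
    services_per_host = total_services / total_hosts
    interleave_factor = math.ceil(services_per_host)
    # Presumed rationale: every check launched between two reaper runs may
    # still hold a slot until its result is reaped.
    min_concurrent = math.ceil(reaper_interval / inter_check_delay)
    return {
        "inter_check_delay": round(inter_check_delay, 3),
        "services_per_host": round(services_per_host, 3),
        "interleave_factor": interleave_factor,
        "min_concurrent_checks": min_concurrent,
        "recommended_concurrent_checks": 3 * min_concurrent,
    }


def interleaved_order(total_services: int, factor: int) -> List[int]:
    """Hypothetical illustration: launch order of service-list indices when
    striding by the interleave factor (services listed grouped by host)."""
    order: List[int] = []
    for start in range(factor):
        order.extend(range(start, total_services, factor))
    return order


if __name__ == "__main__":
    print(scheduling_info(560, 156, 300.0, 10.0))  # central server
    print(scheduling_info(417, 137, 300.0, 10.0))  # distributed server
    print(scheduling_info(560, 156, 300.0, 3.0))   # central, service_reaper_frequency=3
    print(interleaved_order(8, 4))                 # [0, 4, 1, 5, 2, 6, 3, 7]
```

On these figures, both servers already have max_concurrent_checks (240 and
168) well above the recommended values (57 and 42), which suggests the
concurrency limit by itself is not the bottleneck. With
service_reaper_frequency=3, as suggested above, the same arithmetic gives a
minimum of 6 for the central server: results are collected every 3 seconds
instead of every 10, so fewer finished checks sit waiting to be reaped.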

-- 
Jasmine Chua
Security Engineer, SecureCiRT (A SBU of Z-Vance Pte Ltd)
http://www.securecirt.com



More information about the Users mailing list