fork issues and latency

Thomas Guyot-Sionnest dermoth at aei.ca
Sat Feb 14 17:08:51 CET 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/02/09 12:27 PM, Jeff Frost wrote:
> I've got a Nagios-3.0.4 server monitoring 3,290 services on 387
> hosts. When the nagios service is initially started, service and host
> latency is great. This usually continues for about 2-3 hours and then
> we start seeing fork errors in the log like so:
> 
> [1234425582] Warning: The check of service 'ssh' on host 'mail02' could
> not be performed due to a fork() error: 'Cannot allocate memory'.  The
> check will be rescheduled.
> 
> At about the same time, we start seeing lots of orphaned
> /tmp/checkXXXXXX files and indications that the max concurrent checks
> value has been reached:
> 
> [1234458853] Max concurrent service checks (500) has been reached. 
> Delaying further checks until previous checks are complete...
> 
> It should be noted that during this time period, there is 2GB of free
> memory and 1.2GB of cache available out of the 4GB on the nagios server,
> so I'm thinking it has to be something besides system RAM that's exhausted.
> 
> Naturally, when this starts happening, the latencies begin to increase
> and seem to settle somewhere around 98 seconds. Interestingly enough,
> this causes the load to drop to nearly nothing.
> 
> We have already set the following in nagios.cfg:
> 
> service_reaper_frequency=2
> use_large_installation_tweaks=1
> enable_environment_macros=0
> 
> If we enable the embedded perl interpreter, the forking issues happen
> much more quickly after restart (minutes instead of hours).

Which OS and distribution are you running? How much RAM and swap does
the server have, and how much of each is free?

Please send the output of "free -m" with and without Nagios running.

Also send the RSS of the Nagios process right after startup, and again
once you get the fork errors.
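
A minimal sketch of one way to collect those numbers on Linux. It
assumes the daemon's process name is exactly "nagios" and that /proc is
mounted; the rss_kb helper is a hypothetical name used only for this
illustration.

```shell
#!/bin/sh
# Print the resident set size (VmRSS, in kB) of a PID, read from /proc.
# Linux only; rss_kb is a hypothetical helper name, not part of Nagios.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# System-wide RAM and swap usage in MiB, if free(1) is installed.
if command -v free >/dev/null 2>&1; then
    free -m
fi

# Assumption: the process is named exactly "nagios".
# pgrep -o picks the oldest match, which should be the parent daemon.
pid=$(pgrep -o -x nagios 2>/dev/null || true)
if [ -n "$pid" ]; then
    echo "nagios pid $pid RSS: $(rss_kb "$pid") kB"
else
    echo "nagios is not running"
fi
```

Running it once with Nagios stopped, once right after startup, and once
after the fork errors begin should give the three data points asked for.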

Nagios 3 does leak some memory, especially with the embedded Perl
interpreter (ePN). However, unless your server is really short on RAM,
that shouldn't be a huge problem.

If you're stuck with low-end hardware, make sure to run the server
without its graphical interface and disable as many daemons as possible.
If you use Linux, a slim distribution such as Slackware could also
help. Another setting that could help is limiting check parallelization,
though it was reported that there may be a problem with it on Nagios 3
(it hasn't been confirmed AFAIK).
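
As a rough sketch of checking the parallelization limit: in nagios.cfg
the cap is the max_concurrent_checks directive (0 means no limit), and
the log quoted above suggests it is currently 500. The config path
below is an assumption (the default for a source install) and
show_max_checks is a hypothetical helper name; the value 100 in the
comment is only an example, not a recommendation.

```shell
#!/bin/sh
# Assumed location of the main config file; override via NAGIOS_CFG.
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Print the active max_concurrent_checks line(s) from the given config file.
# show_max_checks is a hypothetical helper name used for illustration.
show_max_checks() {
    grep -E '^[[:space:]]*max_concurrent_checks[[:space:]]*=' "$1"
}

if [ -r "$NAGIOS_CFG" ]; then
    show_max_checks "$NAGIOS_CFG" || echo "max_concurrent_checks is not set"
else
    echo "cannot read $NAGIOS_CFG"
fi

# To lower the cap, edit that line by hand, for example:
#   max_concurrent_checks=100
# then verify the configuration before restarting Nagios:
#   nagios -v "$NAGIOS_CFG"
```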


- --
Thomas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJluyT6dZ+Kt5BchYRAo67AKCGGhi+EzKbxNvkMuzOkYOqsQDG3ACgqIG9
9jlBUwg6O2pM6vWA7qQdNTs=
=l5Hz
-----END PGP SIGNATURE-----

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users