Nagios 3.0 hanging (10/23 CVS)

Andreas Ericsson ae at op5.se
Wed Oct 24 17:56:27 CEST 2007


Shad L. Lords wrote:
>>> In total, Nagios leaks about 0.8MiB / second on your system when valgrind
>>> is running. Quite astonishing, really. Try without embedded perl. It should
>>> plug the worst leak, and the rest shouldn't be enough for you to notice
>>> between restarts.
>> One other thought as well - If you set "use_large_installation_tweaks"
>> to 1, set it back to zero.  When enabled, this option will make child
>> processes (i.e. host/service checks) not free memory before they exit.
>> The Linux kernel frees this memory if not done explicitly/properly by the
>> code, and it's more efficient at doing it.  However, the BSD kernel may
>> act a bit differently and not free memory, which could cause a memory
>> leak.
> 
> I do have use_large_installation_tweaks set to 1.  We have a very large 
> installation and will need this eventually.


How large is very large? Perhaps you'd be better off using a distributed
model.


> I've been running valgrind 
> overnight with embedded perl and have had no issues.  The only thing it 
> reported were the few startup leaks.  Looks like the embedded perl is at 
> fault here.  Thanks for the help.
> 

I take it that's "...overnight *without* embedded perl" ?

> I'm not sure what embedded perl will gain us but most of our checks are home 
> grown perl checks.  We were getting latencies in excess of 300 seconds on 
> nagios 2.x so we were looking at anything we could do to speed up the 
> process.  Looks like embedded perl is out for now.
> 

Embedded perl primarily saves you the loading time of the perl interpreter for
each check and some memory per executed plugin. If most of the checks are
indeed in perl, the interpreter will most likely stay hot in the cache more or
less forever, so the actual cost of running without embedded perl will probably
not add up to more than roughly 0.2 seconds / perl-check, and an additional
1-3MiB of RAM / simultaneously running check. Those 0.2 seconds are pure
CPU-time though, so depending on your system, that may or may not hurt. If it
turns out to be too heavy-going for you, I'd suggest rewriting your most
frequently run perl-programs in C.
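
To put rough numbers on that, here is a back-of-the-envelope sketch in Python.
The 0.2 seconds and 1-3MiB figures are the estimates above; the function names
and the example installation (3000 perl checks every 300 seconds, about 50
running at once) are made up for illustration and are not taken from your
setup.

```python
# Rough estimate of what running perl checks WITHOUT embedded perl costs.
# The two constants are the ballpark figures from this mail, not measurements.
PERL_STARTUP_CPU_SECONDS = 0.2   # CPU time to load the interpreter, per check
RAM_PER_CHECK_MIB = (1, 3)       # extra RAM per simultaneously running check


def cpu_cores_for_startup(num_perl_checks, check_interval_seconds):
    """Average number of CPU cores spent only on starting perl interpreters."""
    return num_perl_checks * PERL_STARTUP_CPU_SECONDS / check_interval_seconds


def extra_ram_mib(concurrent_checks):
    """(low, high) extra RAM in MiB for checks running at the same time."""
    low, high = RAM_PER_CHECK_MIB
    return concurrent_checks * low, concurrent_checks * high


if __name__ == "__main__":
    # Hypothetical installation, purely for illustration.
    print(cpu_cores_for_startup(3000, 300))
    print(extra_ram_mib(50))
```

If the first number comes out anywhere near the number of cores in the box,
the interpreter startup alone is a problem worth solving.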

Also, you should probably look into setting the normal_check_interval values as
high as you dare for some checks. Disk-checks can usually run with a
normal_check_interval of 15-20 minutes without causing any real-world trouble,
and there's usually plenty of them, so cutting the load impact they have by as
much as 75% is a fairly drastic load alleviation for your Nagios server.
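
The arithmetic behind that figure can be sketched as follows. This assumes the
disk-checks currently run every 5 minutes, which is my assumption and not
something you've stated; the function names and the count of 400 disk-checks
are likewise only illustrative.

```python
def checks_per_hour(num_checks, interval_minutes):
    """How many check executions per hour a group of checks generates."""
    return num_checks * 60.0 / interval_minutes


def load_reduction(old_interval_minutes, new_interval_minutes):
    """Fraction of check executions saved by raising the check interval."""
    return 1.0 - old_interval_minutes / new_interval_minutes


if __name__ == "__main__":
    # Hypothetical: 400 disk-checks, interval raised from 5 to 20 minutes.
    print(checks_per_hour(400, 5), checks_per_hour(400, 20))
    print(load_reduction(5, 20))
```

Going from 5 to 20 minutes saves 75% of the executions for those checks; going
from 5 to 15 minutes saves about two thirds.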

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
