Performance issues, too

Andreas Ericsson ae at op5.se
Wed Jan 3 11:53:58 CET 2007


Robert Hajime Lanning wrote:
> I have also been having performance issues with Nagios 2.5 on
> a Sun E220R with two 400MHz procs and 1GB ram.
> 
> Sys stats are at http://lanning.cc/kipper.html
> 
> The large dips in load and system CPU time are when I restart
> Nagios.  (cron'd twice a week, but I have also been making
> a lot of service updates lately, hence the almost once a day
> restarts.)  For the restarts to fix the latency, I have
> "use_retained_scheduling_info=0".
> 
> After about three days the Service Check latency will grow
> to over 300 seconds.  It is usually steady at around 0-5
> seconds, for a couple of days, then it will rise over the
> course of a few hours to over the 300 second mark.
> 

This is a bit bizarre and simply must be related to something else. Does 
Nagios run out of command buffer slots? Are they perhaps not being freed 
properly?

> 
> I have noticed that Nagios seems to have a memory leak.  Over
> the last hour I have watched the process grow from 124M
> to 126M.
> 

This can probably be attributed to the fact that Nagios fork()s, then 
frees and allocates memory before running execve() in a thread. This 
isn't prohibited per se, but it is strongly discouraged. I wouldn't be 
surprised to find that other applications doing the same thing also 
leak memory on Solaris. On Linux, threads are created in a 1-1 fashion 
(meaning each thread is scheduled by the kernel much like its own 
process). This holds true for some other systems as well, and afaik 
there are 1-1 thread implementations for Solaris too. In any case, the 
1-1 model means that the kernel cleans up any left-over memory for the 
processes when they exit, which isn't necessarily the case in a 1-many 
thread implementation. Possibly worth investigating.
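
To tell a steady leak from ordinary fluctuation, it may help to sample 
the resident size of the Nagios process at intervals and fit a line 
through the samples. The following is only a rough sketch: the function 
names are made up for illustration, and it assumes that your ps accepts 
"-o rss= -p PID" and reports kilobytes, which is worth checking on 
Solaris. Resident size is also not the same figure as total process 
size, so the numbers may not match what you have been watching.

```python
import subprocess


def read_rss_kb(pid):
    """Ask ps for the resident set size of pid (assumed to be in KB)."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())


def growth_kb_per_hour(samples):
    """Least-squares slope of (seconds, rss_kb) samples, scaled to KB/hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den * 3600.0


if __name__ == "__main__":
    # The growth reported in this thread: 124M to 126M over one hour.
    reported = [(0, 124 * 1024), (3600, 126 * 1024)]
    print(f"{growth_kb_per_hour(reported):.0f} KB/hour")
```

Feeding it samples collected every few minutes over a day or two would 
show whether the growth is constant or tracks the latency rise.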

> I use ePN with caching.  Most of my checks are SNMP requests
> via ePN scripts (http://lanning.cc/custom_plugins/), with
> p1.pl modified with:
> 
>   use SNMP 5.0;
>   SNMP::loadModules("ALL");
> 

Forgive a novice, but doesn't this make it load all SNMP submodules each 
time it runs a Perl module? That would certainly have a major impact on 
load and could well lead to memory leaks (assuming the submodules aren't 
always freed after having been loaded).

> We have put into our budget to move Nagios to a Linux/Intel
> server.  But, what bugs me is the high CPU time in kernel
> space, because of Nagios.
> 

Again, this is behaviour not normally seen on Linux (which is the base 
for most Nagios installations). Linux is simply very, very good at 
fork(). It doesn't even bother trying to do other things properly (like 
1-many threading), simply because it's so damn good at forking. It 
would be interesting to see if your problems go away when you move to 
Linux. I'm not saying it's superior to Solaris, but afaiu, Ethan runs 
all his tests on Linux and would certainly have found bugs of this kind 
if they had bitten him.
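
If you want a rough number to compare between the E220R and the new 
box, something like the following could be run on both. It is a crude 
sketch, not a proper benchmark: the function name is made up, it 
measures wall-clock time for a fork(), immediate exit and wait, and the 
Python interpreter adds its own overhead to every cycle.

```python
import os
import time


def time_forks(count=50):
    """Return mean wall-clock seconds per fork() + exit + wait cycle."""
    start = time.perf_counter()
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: exit at once without running any cleanup handlers.
            os._exit(0)
        # Parent: reap the child so no zombies pile up.
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / count


if __name__ == "__main__":
    print(f"{time_forks() * 1e6:.0f} microseconds per fork cycle")
```

A large difference between the two machines would at least hint at how 
much of the kernel CPU time is down to process creation.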

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
