Memory leaks

Tobias Klausmann klausman at schwarzvogel.de
Tue Jan 23 16:32:28 CET 2007


Hi! 

(First off: if this should also go to nagios-devel, just yell at
 me.)

Nagios 2.6 and 2.5 have memory leaks. They are not so big that
your machine will be swapping within hours, but they degrade
performance in other ways.

First off, their approximate extent.

2.5 and 2.6 without the Perl cache have the smallest memory leaks.
A fairly busy Nagios server (hardware quoted below) with about 3000
services on about 330 hosts will go from 330M used (that's
*not* Nagios alone) to 368M used in about 16 hours, or about 2.4M
per hour. Memory usage on the very same machine stays flat if
Nagios is not running, so it's definitely Nagios itself.
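
For anyone who wants to redo the arithmetic or track the Nagios
process alone instead of the whole machine, here is a minimal
sketch. The function names are made up for illustration. The only
outside fact it relies on is that Linux exposes a "VmRSS:" line
(in kB) in /proc/<pid>/status. The 330M/368M/16h figures are the
ones quoted above.

```python
def parse_vmrss_kb(status_text):
    """Return the resident set size in kB from the text of a Linux
    /proc/<pid>/status file, or None if there is no VmRSS line."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:      123456 kB"
            return int(line.split()[1])
    return None


def leak_rate_mb_per_hour(start_mb, end_mb, hours):
    """Average growth between two readings, in MB per hour."""
    return (end_mb - start_mb) / hours


# Figures from the post: 330M used at start, 368M after about 16 hours.
rate = leak_rate_mb_per_hour(330, 368, 16)
print(f"approximate leak rate: {rate:.3f} MB/hour")
```

To sample a running daemon, one could read /proc/<pid>/status for
the Nagios PID at regular intervals and feed the first and last
readings to leak_rate_mb_per_hour().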

Activating the embedded Perl interpreter and its cache will increase
the amount of lost memory to about 5-6M per hour. In this case,
however, the memory usage sometimes snaps back, i.e. some of the
lost memory is reclaimed. I've not yet found out what triggers
the reclaim. Still, over the course of hours, more and more
memory is lost, and the loss is roughly linear.

And finally, there's the advanced permission patch. With that
patch, memory leaking skyrockets to about 15M/hour.

Now all of this could be alleviated by simply restarting Nagios
every night. That's not actually a bugfix, merely treating the
symptoms, but still, it's pragmatic.

Unfortunately, the performance degradation is not limited to memory
usage. As memory usage grows, check latency increases too. In the
worst case, latency increases by 120s in about six hours. The net
effect is that in our case, we have to restart Nagios every two
hours.
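
As a back-of-the-envelope check on those numbers, here is a small
sketch. It assumes latency grows linearly (I only have the two
endpoints, so that is an assumption), and the 40s tolerance is a
hypothetical value chosen because it matches the two-hour restart
interval; it is not a measured threshold.

```python
def latency_growth_s_per_hour(added_latency_s, hours):
    """Average latency growth, assuming it is linear over the period."""
    return added_latency_s / hours


def restart_interval_hours(growth_s_per_hour, tolerable_added_latency_s):
    """Hours until the added latency reaches the tolerated amount."""
    return tolerable_added_latency_s / growth_s_per_hour


# Worst case from the post: +120s of latency in about six hours.
growth = latency_growth_s_per_hour(120, 6)

# Hypothetical tolerance of 40s of added latency.
interval = restart_interval_hours(growth, 40)
print(f"growth: {growth:.1f} s/hour, restart every {interval:.1f} hours")
```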

For 2.5 and 2.6 without the permissions patch, it's a lot less
bad, but still bad enough to require restarting Nagios at least
every eight hours.

Without all the fancy stuff, we get to restarting Nagios every 24
hours, as described above.

Further observations: with the permission patch, latency
degradation is directly correlated with the number of
notifications. The more notifications, the quicker things get nasty.

For vanilla Nagios, it's at least clear that whatever wastes the
memory also slows Nagios down. One possibility would be a linked
list that is walked and appended to over and over.
But I guess those with knowledge of the inner workings of Nagios
have more of a clue about this than I do.
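
To illustrate why such a list would cost both memory and time,
here is a toy sketch. It is not Nagios code; the class and function
names are hypothetical. It models a singly linked list with no tail
pointer, so every append has to walk the whole list first, and it
counts how many nodes get visited in total.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class HeadOnlyList:
    """Singly linked list that only remembers its head (toy model)."""

    def __init__(self):
        self.head = None
        self.steps = 0  # nodes visited across all appends

    def append(self, value):
        node = Node(value)
        if self.head is None:
            self.head = node
            return
        cur = self.head
        self.steps += 1  # visiting the head counts as one step
        while cur.next is not None:
            cur = cur.next
            self.steps += 1
        cur.next = node


def total_steps(n):
    """Total nodes visited while appending n items one by one."""
    lst = HeadOnlyList()
    for i in range(n):
        lst.append(i)
    return lst.steps


for n in (1000, 2000):
    print(n, "appends visit", total_steps(n), "nodes")
```

Appending n items visits n*(n-1)/2 nodes in total, so doubling the
list length roughly quadruples the work. A list that is never
pruned would therefore leak memory and get slower at the same time,
which would match the symptoms.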

The question that remains is whether this can (and will) be tackled
before 3.0 is released. A related question is whether Nagios 3 will
be prone to the same problem.

Any thoughts, ideas etc. are appreciated.

Regards,
Tobias

PS: On a whim, I tried running Nagios through/in Valgrind but
honestly got knocked over by the amount of info Valgrind spewed
at me.

PPS: Our setup uses only active service checks and notifications by
mail (some of it to SMS gateways etc). All host checks are active
but are only executed if needed (the usual way Nagios works). All
host checks use ping. All plugins have a hard timeout of
10s.

PPPS: Hardware specs of the machine I tested with:
Dual dualcore Opteron 2.2GHz (Model 2214)
2GBytes of RAM
(if there's anything else relevant, drop me a line)
