Nagios 'Out Of Memory' Problems

Armistead, Raffy rarmistead at datanamicsinc.com
Thu Mar 23 22:51:17 CET 2006


I am not sure exactly which process is causing the server to run out of
memory. Since it is a dedicated Nagios system, I would imagine it is
Nagios that is causing the problem. This also happened when we had about
4000 devices, but very seldom, and it wasn't much of an issue then. Now
that we are monitoring almost 7000 devices, it is happening more
frequently. That is why I had assumed it was Nagios, but I didn't know
how to go about fixing the problem.

I do not know that much about Linux, so I am not sure how to go about
setting that up. How do I set up ulimits for memory utilization? What
steps should I take to monitor memory utilization on the Nagios
server?
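On the ulimit question: on Linux a per-user memory cap is usually set with the `as` (address space) item in /etc/security/limits.conf, which pam_limits applies to PAM login sessions, or with `ulimit -v` (in kilobytes) in the script that starts a daemon, since daemons launched from init scripts do not go through PAM. Both end up setting the kernel's RLIMIT_AS limit, which child processes inherit. The sketch below sets that same limit from Python's `resource` module so the mechanism is visible; `cap_address_space`, `run_limited` and `child_sees_limit` are hypothetical helper names, and any byte value you pass is your own choice, not a recommended figure for Nagios or NSCA.

```python
import resource
import subprocess
import sys


def cap_address_space(max_bytes: int) -> None:
    """Lower RLIMIT_AS (virtual address space) for the calling process.

    Processes forked afterwards inherit the limit, just as they would
    after `ulimit -v` in a start-up script.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # An unprivileged process may lower its hard limit but never raise it,
    # so never ask for more than the current hard limit.
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))


def run_limited(argv: list[str], max_bytes: int) -> subprocess.CompletedProcess:
    """Run argv with its address space capped at max_bytes.

    The cap is applied in the child between fork() and exec(), so the
    calling process itself stays unrestricted.
    """
    return subprocess.run(
        argv,
        preexec_fn=lambda: cap_address_space(max_bytes),
        capture_output=True,
        text=True,
        check=False,
    )


def child_sees_limit(max_bytes: int) -> int:
    """Start a Python child under the cap and report the soft limit it sees."""
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_AS)[0])"
    result = run_limited([sys.executable, "-c", probe], max_bytes)
    return int(result.stdout.strip())
```

A process that hits this limit gets allocation failures (and usually exits) instead of dragging the whole machine into swap or the kernel's OOM killer, which is the point of capping the user.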

I checked the nagios.cfg file, and I do have that setting at -1:

command_check_interval=-1
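A quick way to confirm which value is actually in effect is to list every uncommented assignment of the setting, since a second, later line or a commented-out copy is easy to miss by eye. This is a minimal sketch that assumes the usual main-config layout of `name=value` lines with `#` comments; `find_setting` is a hypothetical helper, not part of Nagios.

```python
def find_setting(cfg_text: str, key: str) -> list[str]:
    """Return every uncommented value assigned to `key` in nagios.cfg-style text.

    Lines look like `name=value`; blank lines and lines starting with '#'
    are skipped. Returning all matches makes a duplicated setting visible.
    """
    values = []
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            values.append(value.strip())
    return values
```

For example, `find_setting(open(path).read(), "command_check_interval")` with `path` pointing at your nagios.cfg; the same call with `"check_external_commands"` should return `["1"]`, because Nagios ignores the external command file (and therefore NSCA results) when that option is 0.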


I appreciate any help. Thanks.

Raffy 

-----Original Message-----
From: Marc Powell [mailto:marc at ena.com] 
Sent: Thursday, March 23, 2006 11:12 AM
To: nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] Nagios 'Out Of Memory' Problems



> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net
> [mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of Armistead, Raffy
> Sent: Thursday, March 23, 2006 12:23 PM
> To: nagios-users at lists.sourceforge.net
> Subject: [Nagios-users] Nagios 'Out Of Memory' Problems
> 
> I have a problem with my Nagios server constantly crashing. It keeps
> printing "Out of Memory" errors on the console, which causes loss of
> access to the server. I can ping the box, but I cannot SSH or web into
> it to view any information. This has been happening more and more
> often lately; now it occurs about every 2-3 days. We have been adding
> more and more devices to the servers, and the problem has grown as we
> do. This is how I have it set up.
> 
> I have a main Nagios server that is running the latest 2.0 (stable)
> Nagios release. It is monitoring about 6800 devices, but it is not
> actively checking them. Its main role is to provide a web interface
> and to receive passive check results from three other servers, which
> do the polling. The main server also sends email notifications when a
> device goes down, about 30-40 emails a day. I am using NSCA 2.5
> between the main server and the client Nagios servers. I am only
> monitoring one service for each device, either TCP or ping depending
> on the device. Almost all devices (roughly 6000) are monitored with
> TCP; the rest are monitored with ping. The devices are spread fairly
> evenly across the polling servers, about 2000-2500 each.
> 
> Can someone please help me resolve this problem? Thanks
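In this setup every result from the three pollers reaches the main server through NSCA, which hands it to Nagios by writing one line per result into the external command file; how often Nagios reads that file is what `command_check_interval` controls. The sketch below builds such a line in the PROCESS_SERVICE_CHECK_RESULT format so the data flow is concrete. `service_result_command` is a hypothetical helper, and the sample host and output strings are made up.

```python
import time
from typing import Optional


def service_result_command(host: str, service: str, return_code: int,
                           output: str, timestamp: Optional[float] = None) -> str:
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return code must be 0 (OK) through 3 (UNKNOWN)")
    if ";" in host or ";" in service:
        raise ValueError("host and service names cannot contain ';'")
    # A newline terminates the command, so keep the plugin output on one line.
    output = output.replace("\n", " ")
    ts = int(time.time() if timestamp is None else timestamp)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{output}\n"
```

The file these lines go to is the one named by `command_file` in nagios.cfg; if Nagios falls behind in reading it, writers block, which is the backlog scenario discussed in the reply below.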

Have you determined what process is using the memory? One of the first
steps you should take is to set appropriate ulimits for memory
utilization for that user so that it doesn't bring down the server. I
would configure nagios to monitor memory on that server then use top or
ps to identify the process(es) using the allocated memory when memory
utilization is high. That will give you better direction for
troubleshooting than simply knowing that the machine is crashing due to
memory exhaustion. The nagios daemon itself isn't going to be using a
lot of RAM (10M on my box with 3400 passive services).
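To have Nagios watch memory on the server itself, a local check can follow the standard plugin conventions: print one status line (optionally with performance data after `|`) and exit 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN. The sketch below reads Linux `/proc/meminfo`, using `MemAvailable` where the kernel provides it and falling back to free + buffers + cache on older kernels. The 80%/90% thresholds are arbitrary example values, and all function names are hypothetical.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (values in kB)."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


def used_percent(info: dict[str, int]) -> float:
    """Percentage of RAM in use, not counting reclaimable page cache."""
    total = info["MemTotal"]
    if "MemAvailable" in info:  # newer kernels report this directly
        available = info["MemAvailable"]
    else:  # older kernels: approximate with free + buffers + cache
        available = info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)
    return 100.0 * (total - available) / total


def check_memory(meminfo_text: str, warn: float = 80.0, crit: float = 90.0) -> tuple[int, str]:
    """Return (exit_code, status_line) in the usual Nagios plugin format."""
    try:
        pct = used_percent(parse_meminfo(meminfo_text))
    except (KeyError, ValueError, ZeroDivisionError):
        return UNKNOWN, "MEMORY UNKNOWN - could not parse meminfo"
    if pct >= crit:
        code = CRITICAL
    elif pct >= warn:
        code = WARNING
    else:
        code = OK
    # Text after '|' is performance data: label=value;warn;crit;min;max
    return code, f"MEMORY {LABELS[code]} - {pct:.1f}% used|mem_used={pct:.1f}%;{warn:g};{crit:g};0;100"


def main() -> int:
    """Check the live /proc/meminfo, print the status line, return the exit code."""
    with open("/proc/meminfo") as f:
        code, line = check_memory(f.read())
    print(line)
    return code
```

Saved as a script ending in `raise SystemExit(main())`, this could be run by a local command definition on the Nagios server, so a WARNING arrives well before the box stops answering SSH.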

My somewhat unfounded guess is that perhaps nagios isn't reaping the
results from NSCA frequently enough, so you have a backlog of nsca
processes. Each process uses just a little memory, but if you have
thousands of them then it adds up. I've personally experienced this on a
machine that was experiencing disk problems. If this is the case, beyond
a hardware problem or capacity issue, I'd verify that your
command_check_interval is set to -1 to make sure that nagios is checking
the external command file as quickly as it can.
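One way to test the backlog guess is to group running processes by command name and add up their resident memory; hundreds or thousands of `nsca` entries would support it, while one huge process would point elsewhere. This is a minimal sketch of the `ps`/`top` step, assuming a Linux /proc whose `/proc/<pid>/status` files carry `Name:` and `VmRSS:` fields; the helper names are hypothetical. Shared pages are counted once per process, so the totals overstate real usage somewhat.

```python
import os
from collections import defaultdict


def parse_status(status_text: str) -> tuple[str, int]:
    """Return (command_name, VmRSS in kB) from /proc/<pid>/status text.

    Kernel threads have no VmRSS line, so they count as 0 kB.
    """
    name, rss_kb = "?", 0
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.partition(":")[2].strip()
        elif line.startswith("VmRSS:"):
            rss_kb = int(line.split()[1])
    return name, rss_kb


def iter_processes(proc_root: str = "/proc"):
    """Yield (command_name, rss_kb) for every process we are allowed to read."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                text = f.read()
        except OSError:  # process exited meanwhile, or permission denied
            continue
        yield parse_status(text)


def summarize(processes) -> dict[str, tuple[int, int]]:
    """Group by command name: {name: (process_count, total_rss_kb)}."""
    counts: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for name, rss_kb in processes:
        counts[name] += 1
        totals[name] += rss_kb
    return {name: (counts[name], totals[name]) for name in counts}


def top_by_memory(summary: dict[str, tuple[int, int]], n: int = 10):
    """The n command names with the largest combined resident memory."""
    return sorted(summary.items(), key=lambda item: item[1][1], reverse=True)[:n]
```

Running `top_by_memory(summarize(iter_processes()))` as root when memory is high gives a short list of (name, (count, kB)) pairs to compare against the backlog theory.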

--
Marc 


-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=k&kid0944&bid$1720&dat1642
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null



