Nagios 'Out Of Memory' Problems

Marco Ramos mramos at co.sapo.pt
Fri Mar 24 10:57:53 CET 2006


Hi,

I had some out-of-memory and forking problems a while ago. After some
debugging I tuned a couple of parameters, namely service_reaper_frequency
and max_concurrent_checks.
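Before changing either directive, it helps to know what nagios.cfg currently says. This is a minimal sketch, assuming nagios.cfg uses its usual one `name=value` per line layout with `#` comments. It also reports command_check_interval, which comes up later in the thread. The values Marco settled on aren't given, so the script only reports settings and doesn't suggest any.

```python
from pathlib import Path

# Directives discussed in this thread; anything not found is left to Nagios's default.
TUNING_KEYS = ("service_reaper_frequency", "max_concurrent_checks",
               "command_check_interval")

def read_directives(path, keys=TUNING_KEYS):
    """Return {directive: value} for the requested keys found in a nagios.cfg file."""
    found = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() in keys:
            # If a directive appears twice, keep the last value seen.
            found[name.strip()] = value.strip()
    return found

def report(path):
    """Print each tuning directive, noting the ones that are not set explicitly."""
    values = read_directives(path)
    for key in TUNING_KEYS:
        print(f"{key} = {values.get(key, '(not set, default applies)')}")
```

For example, `report("/usr/local/nagios/etc/nagios.cfg")`. That path is the common location for a source install and is an assumption; yours may differ.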

Maybe this FAQ entry will help you:
http://www.nagios.org/faqs/viewfaq.php?faq_id=115

HTH,
Marco Ramos

On Thu, 2006-03-23 at 13:51 -0800, Armistead, Raffy wrote:
> I am not sure exactly which process is causing it to run out of memory.
> Since it is a dedicated Nagios system, I would imagine it is Nagios
> that is causing the problem. This occurred when we had about 4000
> devices, but very seldom, and it wasn't much of an issue then. Now that
> we are monitoring almost 7000 devices, it is happening more frequently.
> That is why I had assumed it was Nagios, but I didn't know how to go
> about fixing the problem.
> 
> I do not know that much about Linux, so I am not sure how to go about
> setting that up. How do I set up ulimits for memory utilization? What
> steps should I take to monitor memory utilization on the Nagios
> server?
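On Linux, a memory ulimit maps onto the setrlimit() address-space limit (RLIMIT_AS). In a shell or startup script, `ulimit -v <kilobytes>` sets that limit for processes started from there. The Python wrapper below is a minimal sketch of the same mechanism. It starts a command with a capped address space, so a runaway process fails its own allocations instead of exhausting the whole machine. The command and cap you pass in are your choice, not recommendations. A process started some other way, for example by the Nagios or NSCA init script, is unaffected unless that script sets its own limit.

```python
import resource
import subprocess

def run_with_memory_limit(cmd, max_bytes):
    """Run cmd (a list of arguments) with its address space capped at max_bytes.

    POSIX only. The limit is applied in the child between fork() and exec(),
    so only the child and anything it spawns inherit it.
    """
    def set_limit():
        # Soft and hard limits are both set; an unprivileged child cannot raise them again.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(cmd, preexec_fn=set_limit, capture_output=True, text=True)
```

For example, `run_with_memory_limit(["/path/to/command"], 512 * 1024 * 1024)`, where both the path and the 512 MiB figure are placeholders. Once the cap is reached, further allocations in that process fail, rather than the whole box running out of memory.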
> 
> I checked the nagios.cfg file, and that setting is already -1:
> 
> command_check_interval=-1
> 
> 
> I appreciate any help. Thanks.
> 
> Raffy 
> 
> -----Original Message-----
> From: Marc Powell [mailto:marc at ena.com] 
> Sent: Thursday, March 23, 2006 11:12 AM
> To: nagios-users at lists.sourceforge.net
> Subject: RE: [Nagios-users] Nagios 'Out Of Memory' Problems
> 
> 
> 
> > -----Original Message-----
> > From: nagios-users-admin at lists.sourceforge.net
> > [mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of Armistead, Raffy
> > Sent: Thursday, March 23, 2006 12:23 PM
> > To: nagios-users at lists.sourceforge.net
> > Subject: [Nagios-users] Nagios 'Out Of Memory' Problems
> > 
> > I have a problem with my Nagios server constantly crashing. It keeps
> > printing Out of Memory errors on the screen, which causes loss of
> > access to the server: I can ping the box, but I cannot SSH in or reach
> > its web interface to view any information. This has been happening more
> > and more often lately; it now occurs about every 2-3 days. We have been
> > adding more and more devices to the servers, and the problem has grown
> > as we have done so. This is how I have it set up:
> > 
> > I have a main Nagios server running the latest stable Nagios release
> > (2.0). It monitors about 6800 devices but does not actively check
> > them. Its main role is to provide a web interface and to receive passive
> > check results from three other servers, which do the polling. The main
> > server also sends email notifications when a device goes down, about
> > 30-40 emails a day. I am using NSCA 2.5 between the main server and the
> > client Nagios servers. I monitor only one service per device, either
> > TCP or ping depending on the device. Most devices (roughly 6000) are
> > monitored with TCP; the rest are monitored with ping. The devices are
> > spread fairly evenly across the three polling servers, about 2000-2500
> > each.
> > 
> > Can someone please help me in resolving this problem? Thanks
> 
> Have you determined which process is using the memory? One of the first
> steps you should take is to set appropriate ulimits on memory use for
> that user so that it can't bring down the server. I would configure
> Nagios to monitor memory on that server, then use top or ps to identify
> the process(es) holding the memory when utilization is high. That will
> give you better direction for troubleshooting than simply knowing that
> the machine is crashing due to memory exhaustion. The Nagios daemon
> itself isn't going to use a lot of RAM (about 10 MB on my box with 3400
> passive services).
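For the memory-monitoring half of that advice, a check only needs /proc/meminfo and the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). This is a minimal sketch, not a stock plugin. The 20%/10% thresholds are hypothetical. Where MemAvailable is missing (older kernels don't report it), the check treats MemFree + Buffers + Cached as available, which is an approximation.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes

def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info

def percent_available(info):
    """Percentage of RAM that is free or reclaimable."""
    if "MemAvailable" in info:
        avail = info["MemAvailable"]
    else:
        # Older kernels: approximate with free memory plus buffer/page cache.
        avail = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    return 100.0 * avail / info["MemTotal"]

def check(pct, warn=20.0, crit=10.0):
    """Map percent available to an (exit code, status line) pair."""
    if pct <= crit:
        return CRITICAL, f"MEMORY CRITICAL - {pct:.1f}% available"
    if pct <= warn:
        return WARNING, f"MEMORY WARNING - {pct:.1f}% available"
    return OK, f"MEMORY OK - {pct:.1f}% available"

def main(path="/proc/meminfo"):
    """Print one status line and return the plugin exit code."""
    try:
        with open(path) as f:
            code, message = check(percent_available(parse_meminfo(f.read())))
    except (OSError, KeyError, ZeroDivisionError) as exc:
        code, message = UNKNOWN, f"MEMORY UNKNOWN - {exc}"
    print(message)
    return code
```

Used as a plugin, the script would end with `sys.exit(main())` so Nagios sees the exit code. It would be wired in through a command definition like any other plugin.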
> 
> My somewhat unfounded guess is that Nagios isn't reaping the results
> from NSCA frequently enough, so you have a backlog of nsca processes.
> Each process uses only a little memory, but if you have thousands of
> them it adds up. I've personally seen this on a machine that had disk
> problems. If this is the case, then beyond ruling out a hardware problem
> or a capacity issue, I'd verify that command_check_interval is set to -1
> so that Nagios checks the external command file as often as it can.
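To test the backlog theory, you need to know how many processes of each name exist and how much resident memory they hold together. You could piece that together from ps output. This is a minimal sketch, assuming a Linux /proc filesystem. It reads the Name and VmRSS fields of /proc/<pid>/status, and the top-10 cut-off is arbitrary. Shared pages are counted once per process, so the totals are an upper bound.

```python
import os
from collections import defaultdict

def rss_by_name(proc_root="/proc"):
    """Return {process name: (process count, total VmRSS in kB)}."""
    counts = defaultdict(int)
    rss_kb = defaultdict(int)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # only numeric entries are PIDs
            continue
        name, rss = None, 0
        try:
            with open(os.path.join(proc_root, entry, "status"), errors="replace") as f:
                for line in f:
                    key, _, value = line.partition(":")
                    if key == "Name":
                        name = value.strip()
                    elif key == "VmRSS":
                        rss = int(value.split()[0])  # reported in kB
        except OSError:
            continue  # process exited while we were scanning
        if name is not None:
            counts[name] += 1
            rss_kb[name] += rss  # kernel threads have no VmRSS and add 0
    return {n: (counts[n], rss_kb[n]) for n in counts}

def print_top(n=10, proc_root="/proc"):
    """Print the n process names with the largest total resident memory."""
    usage = rss_by_name(proc_root)
    top = sorted(usage.items(), key=lambda kv: kv[1][1], reverse=True)[:n]
    for name, (count, kb) in top:
        print(f"{name:20} {count:6d} procs {kb / 1024:10.1f} MB")
```

Run print_top() during a memory spike. If nsca shows up near the top with a large process count, that supports the guess above. The stock check_procs plugin, filtered with -C nsca, can alert on that count.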
> 
> --
> Marc 
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by xPML, a groundbreaking scripting
> language
> that extends applications into web and mobile media. Attend the live
> webcast
> and join the prime developer group breaking into this new coding
> territory!
> http://sel.as-us.falkag.net/sel?cmd=k&kid0944&bid$1720&dat1642
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue. 
> ::: Messages without supporting info will risk being sent to /dev/null


