Nagios 'Out Of Memory' Problems

Armistead, Raffy rarmistead at datanamicsinc.com
Fri Mar 31 18:00:29 CEST 2006


It appears that the problem is still happening. With all the changes I
have made, the server lasted a while longer, going from 1-2 days up to
4 days this time. I had perfparse enabled previously, but I decided to
disable it since I didn't need it. Does anyone else have any suggestions
on what else I could try to fix the problem? Thanks.


Raffy Armistead, CCNP, CCSP
Datanamics, Inc
702-697-2289

-----Original Message-----
From: Armistead, Raffy 
Sent: Saturday, March 25, 2006 11:54 AM
To: nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] Nagios 'Out Of Memory' Problems

Thanks for the information. I have added the lines to the init script
and I will monitor the server to see if it still occurs. I had also
changed max_concurrent_checks from 100 to 50 and changed
service_reaper_frequency from 2 to 1. This was done on all servers. I
also set process_performance_data from 1 to 0 to disable it, in the
hope of eliminating more processes.
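
To confirm that the changed values are actually in place on each server, something like the following could be used. This is only a sketch: the helper name cfg_value and the config path are assumptions, not part of Nagios, and the function simply prints the last uncommented line that sets a directive.

```shell
#!/bin/sh
# Print the value from the last uncommented "name=value" line for a
# directive in a Nagios-style config file.
# Usage: cfg_value <directive> <file>   (cfg_value is a hypothetical helper)
cfg_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | tail -n 1
}

# Assumed path; adjust to the local install.
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

if [ -r "$NAGIOS_CFG" ]; then
    for directive in max_concurrent_checks service_reaper_frequency \
                     process_performance_data command_check_interval; do
        echo "$directive=$(cfg_value "$directive" "$NAGIOS_CFG")"
    done
fi
```

Running it on the central server and on each polling server gives a quick side-by-side of the four settings discussed in this thread.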

I was wondering whether this process will get executed when running the
external command to reboot the server from the webpage. If it isn't, is
there anything I can change so that it is?

Also, what type of computer should the main central server be for 7000
devices (1 service each)? We haven't added more devices to the Nagios
server since the problem started, but we expect to add another 1000
once the problem is corrected and another 1000 or so throughout the
rest of the year. I doubt the computer I have running now is adequate
for the job, so I would like to move it to a more powerful system.
Currently the main server is a 2.4 Celeron PC with 2 GB of RAM. What
system should I get to properly handle this many devices? The
individual servers are running the same thing and don't seem to have
problems, but should these be upgraded as well?

I am planning on installing Fedora for the OS and had installed
everything as a package with the current setup. To eliminate more
processes, can someone tell me which packages I should install to get
Nagios, NSCA and Apache working properly without anything else? I am
sure there might be a better flavor of Linux to run Nagios, but we have
already standardized on RedHat and Fedora for the OS.

Thanks again for the information and hopefully this will resolve my
problem.

Raffy

-----Original Message-----
From:	Stephen Barron [mailto:thurgoodj187 at gmail.com]
Sent:	Fri 3/24/2006 9:56 PM
To:	nagios-users at lists.sourceforge.net
Cc:	
Subject:	Re: [Nagios-users] Nagios 'Out Of Memory' Problems

Hi

I had this problem also, and in our case it was NSCA that was filling up
the memory on the Nagios Central Server.  We noticed the problem
increasingly after we installed perfparse, which attempts (poorly) to
stop and restart the nagios process.  The nagios process has to be
running or the nsca instances will never get processed, and therefore
stay in memory.  I set up a simple process check for nsca and at some
points it showed 11000+ instances of nsca, right before we had to go to
our data center and give it the old hard reboot.  We use authorized keys
and check_by_ssh to run local checks on the Nagios Central Server,
although an SNMP script would also work well.
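
A process check of that kind could look roughly like the sketch below. The thresholds and the function name nsca_status are illustrative assumptions, not the check actually used above; the exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```shell
#!/bin/sh
# Sketch of a Nagios-style check on the number of running nsca processes.
# WARN and CRIT are assumed thresholds; tune them for the site.
WARN=${WARN:-50}
CRIT=${CRIT:-200}

# Map a process count to a plugin status line and return code.
nsca_status() {
    n=$1
    if [ "$n" -ge "$CRIT" ]; then
        echo "NSCA CRITICAL - $n nsca processes"; return 2
    elif [ "$n" -ge "$WARN" ]; then
        echo "NSCA WARNING - $n nsca processes"; return 1
    fi
    echo "NSCA OK - $n nsca processes"; return 0
}

# Count processes whose command name is exactly "nsca".
# grep -c exits non-zero on zero matches, hence the "|| true".
count=$(ps -e -o comm= | grep -c '^nsca$' || true)
nsca_status "$count"
```

Run through check_by_ssh or as a local check, this would warn well before the count reaches the thousands.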

I would suggest adding a line to your init script for nagios to stop and
start nsca along with nagios.



ex.
start)
                echo "Starting network monitor: nagios"
                $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
                if [ $? -eq 0 ]; then
                        su - $NagiosUser -c "touch $NagiosVarDir/nagios.log $NagiosRetentionFile"
                        rm -f $NagiosCommandFile
                        touch $NagiosRunFile
                        chown $NagiosUser:$NagiosGroup $NagiosRunFile
                        $NagiosBin -d $NagiosCfgFile
                        if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi
                        #sleep 1
                        #status_nagios nagios
                        # start nsca together with nagios
                        /usr/local/nagios/bin/nsca -c /usr/local/nagios/etc/nsca.cfg -d
                        exit 0
                else
                        echo "CONFIG ERROR!  Start aborted.  Check your Nagios configuration."
                        exit 1
                fi
                ;;


killproc_nagios ()
{

        if test ! -f $NagiosRunFile; then
                echo "No lock file found in $NagiosRunFile"
                return 1
        fi

        NagiosPID=`head -n 1 $NagiosRunFile`
        kill $2 $NagiosPID
        # also stop every nsca instance when nagios is stopped
        killall -9 nsca
}

Good Luck

Steve


On 3/24/06, Marco Ramos <mramos at co.sapo.pt> wrote:
>
> Hi,
>
> I had some out of memory and forking problems a while ago. After some
> debugging I've tuned some parameters, namely service_reaper_frequency
> and max_concurrent_checks.
>
> Maybe this URL will help you:
> http://www.nagios.org/faqs/viewfaq.php?faq_id=115
>
> HTH,
> Marco Ramos
>
> On Thu, 2006-03-23 at 13:51 -0800, Armistead, Raffy wrote:
> > I am not sure exactly what process is causing it to run out of
> > memory. Since I have it as a dedicated Nagios system I would imagine
> > it is Nagios that is causing a problem. This occurred when we had
> > about 4000 devices but very seldom and it wasn't much of an issue
> > then. Now that we almost have 7000 devices that are being monitored
> > it is happening more frequently. Since this was the case I had
> > assumed it was Nagios but didn't know how to go about fixing the
> > problem.
> >
> > I do not know that much about Linux so I am not sure how to go about
> > setting that up. How do I set up ulimits for memory utilization? What
> > steps would I go about to monitor memory utilization for the Nagios
> > server?
> >
> > I had checked the nagios.cfg file and I do have that setting at -1:
> >
> > command_check_interval=-1
> >
> >
> > I appreciate any help. Thanks.
> >
> > Raffy
> >
> > -----Original Message-----
> > From: Marc Powell [mailto:marc at ena.com]
> > Sent: Thursday, March 23, 2006 11:12 AM
> > To: nagios-users at lists.sourceforge.net
> > Subject: RE: [Nagios-users] Nagios 'Out Of Memory' Problems
> >
> >
> >
> > > -----Original Message-----
> > > > From: nagios-users-admin at lists.sourceforge.net
> > > > [mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of
> > > > Armistead, Raffy
> > > Sent: Thursday, March 23, 2006 12:23 PM
> > > To: nagios-users at lists.sourceforge.net
> > > Subject: [Nagios-users] Nagios 'Out Of Memory' Problems
> > >
> > > > I have a problem with my Nagios server constantly crashing. It
> > > > keeps outputting on the screen Out of Memory errors which causes
> > > > loss of access to the server. I can ping the box but I cannot SSH
> > > > or web into it to view any information. This has been happening
> > > > increasingly more lately. Now it is about every 2-3 days that this
> > > > is occurring. We have been adding more and more devices to the
> > > > servers and this problem has been increasing as this occurs. This
> > > > is how I have it set up.
> > >
> > >
> > >
> > > > I have a Main Nagios server that is running the latest 2.0
> > > > (stable) Nagios release. It is monitoring about 6800 devices but
> > > > it is not actively checking the devices. Its main role is to
> > > > provide a web interface and receive passive polls from three other
> > > > servers which do the polling. The main server also does email
> > > > notifications when a device goes down. The server sends about
> > > > 30-40 emails a day. I am using NSCA 2.5 between the server and the
> > > > client Nagios servers. I am only monitoring one service for each
> > > > device which is either TCP or ping depending on the device. Mostly
> > > > all devices are monitored with TCP (roughly 6000). The rest are
> > > > monitored with ping. The individual servers are pretty evenly
> > > > spread with the number of devices. They are about 2000-2500 each.
> > >
> > > Can someone please help me in resolving this problem? Thanks
> >
> > Have you determined what process is using the memory? One of the
> > first steps you should take is to set appropriate ulimits for memory
> > utilization for that user so that it doesn't bring down the server.
> > I would configure nagios to monitor memory on that server then use
> > top or ps to identify the process(es) using the allocated memory
> > when memory utilization is high. That will provide better direction
> > for troubleshooting rather than simply that the machine is crashing
> > due to memory exhaustion. The nagios daemon itself isn't going to be
> > using a lot of RAM (10M on my box with 3400 passive services).
> >
> > My somewhat unfounded guess is that perhaps nagios isn't reaping the
> > results from NSCA frequently enough so you're having a backlog of
> > nsca processes. Each process uses just a little memory but if you
> > have thousands of them then it adds up. I've personally experienced
> > this on a machine that was experiencing disk problems. If this is
> > the case, beyond a hardware problem or capacity issue, I'd verify
> > that your command_check_interval is set to -1 to make sure that
> > nagios is checking the external command file as quickly as it can.
> >
> > --
> > Marc
> >
> >
> > -------------------------------------------------------
> > This SF.Net email is sponsored by xPML, a groundbreaking scripting
> > language that extends applications into web and mobile media. Attend
> > the live webcast and join the prime developer group breaking into
> > this new coding territory!
> > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
> > _______________________________________________
> > Nagios-users mailing list
> > Nagios-users at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nagios-users
> > ::: Please include Nagios version, plugin version (-v) and OS when 
> > reporting any issue.
> > ::: Messages without supporting info will risk being sent to 
> > /dev/null
> >
>


--
Steve

