1000+ processes then Nagios fails

Russell Scibetti russell at quadrix.com
Mon Dec 9 20:27:27 CET 2002


When I refer to the pipe, I mean the pipe between the forked-off
plugins and the main Nagios daemon.  Nagios gets results back from the
plugins by having them all write to one pipe (in this case, the pipe is
an object created in the C code, not a named pipe file like nagios.cmd).
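
To make the "many writers, one pipe, one reader" pattern concrete, here is a
minimal Python sketch of it. It uses os.pipe() and os.fork() (POSIX only) as
stand-ins for the daemon's C code. The function name run_checks_via_shared_pipe
and the "name|status|output" line format are hypothetical illustrations, not
Nagios's real internals or wire format.

```python
import os


def run_checks_via_shared_pipe(check_names):
    """Fork one child per check; every child writes to ONE shared anonymous pipe.

    Hypothetical sketch: the "name|status|output" format is made up and is
    not the message format Nagios itself uses.
    """
    read_fd, write_fd = os.pipe()  # unnamed pipe, unlike a FIFO such as nagios.cmd
    pids = []
    for name in check_names:
        pid = os.fork()
        if pid == 0:  # child: plays the role of a plugin process
            try:
                os.close(read_fd)
                msg = f"{name}|0|OK - {name} looks fine\n".encode()
                # A single write() of <= PIPE_BUF bytes to a pipe is atomic, so
                # lines from different children never interleave mid-message.
                os.write(write_fd, msg)
            finally:
                os._exit(0)  # never fall back into the parent's code path
        pids.append(pid)

    os.close(write_fd)  # parent keeps only the read end
    # read() returns b"" (EOF) once every child has exited and closed its copy.
    chunks = []
    while True:
        data = os.read(read_fd, 4096)
        if not data:
            break
        chunks.append(data)
    os.close(read_fd)

    for pid in pids:
        os.waitpid(pid, 0)  # reap children so they do not linger as zombies

    results = {}
    for line in b"".join(chunks).decode().splitlines():
        name, status, output = line.split("|", 2)
        results[name] = (int(status), output)
    return results


if __name__ == "__main__":
    print(run_checks_via_shared_pipe(["PING", "HTTP", "DISK"]))
```

The order in which results arrive depends on which child runs first, which is
why the sketch collects them into a dict keyed by check name.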

You are right that command_check_interval refers to the external command
pipe, nagios.cmd.  But service_reaper_frequency controls how often the
daemon reads the communication pipe between the plugins and itself.
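
When the daemon drains that pipe only once every service_reaper_frequency
seconds, results accumulate in between. A pipe does not discard or overwrite
data when it is full: a blocking write() simply waits until the reader makes
room, and a non-blocking one fails with EAGAIN. The sketch below (a rough
illustration, not Nagios code) measures how much an unread pipe holds on the
local system and estimates how many results queue up per reaper pass, assuming
checks are spread evenly over their interval. Both function names are
hypothetical.

```python
import os


def measure_pipe_capacity(chunk_size: int = 1024) -> int:
    """Write into a pipe that nobody reads until the kernel refuses more data.

    The write end is non-blocking so the loop ends with BlockingIOError
    instead of hanging; an ordinary blocking writer would sleep inside
    write() until a reader drains the pipe.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)
    chunk = b"x" * chunk_size
    buffered = 0
    try:
        while True:
            # May be a partial write near the limit; count what was accepted.
            buffered += os.write(write_fd, chunk)
    except BlockingIOError:
        pass  # pipe is full: nothing was lost, the write was refused
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return buffered


def results_per_reap(num_services: int, check_interval_s: float,
                     reaper_interval_s: float) -> float:
    """Average number of results queued between two reaper passes,
    assuming checks are spread evenly across the check interval."""
    return num_services / check_interval_s * reaper_interval_s


if __name__ == "__main__":
    print("pipe capacity here:", measure_pipe_capacity(), "bytes")
    # Figures from this thread: about 750 checks, most every 5 minutes (300 s).
    print("results per 10 s reap:", results_per_reap(750, 300, 10))  # 25.0
    print("results per 5 s reap: ", results_per_reap(750, 300, 5))   # 12.5
```

Multiplying the per-reap count by the size of one queued result (not stated in
this thread, and dependent on the Nagios build) gives a rough backlog to compare
against the measured capacity; halving service_reaper_frequency halves that
backlog. Plugin processes stalled on a full pipe would plausibly be one way a
process count climbs, but that is an inference, not something confirmed here.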

Hope this clears things up.

-Russell

Marc Powell wrote:

> Are you certain that's what the service_reaper_frequency applies to? I 
> thought that command_check_interval applied to the external command 
> pipe and that service_reaper_frequency only applied to local active 
> checks that do not write to the external command pipe.
>
>  
>
> --
>
> Marc
>
>     -----Original Message-----
>     From: Russell Scibetti [mailto:russell at quadrix.com]
>     Sent: Mon 12/9/2002 11:11 AM
>     To: Shane_Seidel at gwf.com.au; nagios-users at lists.sourceforge.net
>     Cc:
>     Subject: Re: [Nagios-users] 1000+ processes then Nagios fails
>
>     I've read some of the replies, and I have one more suggestion to try.
>     If memory is the problem, and not CPU Load, try lowering the
>     service_reaper_frequency.  By default it is set to 10, which means
>     every 10 seconds, Nagios will remove the contents of the pipe (what
>     ALL the plugins write to - there is only 1 pipe) and process them.
>
>     If you have that many services, you might be filling the pipe faster
>     than Nagios empties it.  I had similar problems on one of my systems,
>     where the box kept swapping all the time (about 750 service checks,
>     most of them every 5 minutes).  If you try lowering that value (try 5
>     for starters), Nagios will read the pipe more frequently, so it
>     shouldn't fill up as often.
>
>     Just another suggestion.
>
>     -Russell Scibetti
>
>     Shane_Seidel at gwf.com.au wrote:
>
>     >
>     >
>     >
>     >Hi All,
>     >
>     >We have a dual P3-1200MHz 512MB RAM server running Nagios 1.0,
>     >monitoring 180 devices and 800 services.
>     >
>     >I have noticed that the number of nagios processes increases until it
>     >reaches approx. 1000, at which time the server complains it is "out of
>     >memory" and starts shutting down services.
>     >
>     >I found that executing '/etc/rc.d/init.d/nagios reload' from cron would
>     >"solve" the problem. The number of processes would return to approx. 60
>     >and then start to climb again. I have the cron job execute every 30 mins.
>     >
>     >I took the config and put all the hosts, services, etc. into Netsaint 0.7
>     >on a P2-350MHz 128MB RAM box, and processes rarely rise over 100 before
>     >returning to 40-60.
>     >
>     >Note that I use the "default" option while compiling to maintain backward
>     >compatibility with Netsaint.
>     >
>     >Has anyone else experienced this? Is there any way to restrict the number
>     >of processes used by nagios? Note also that the big server also runs
>     >MRTG/RRD on approx. 20 devices, although the mrtg processes complete.
>     >
>     >Any help appreciated
>     >Thanks
>     >Shane
>     >_______________________________________________
>     >Nagios-users mailing list
>     >Nagios-users at lists.sourceforge.net
>     > https://lists.sourceforge.net/lists/listinfo/nagios-users
>     >
>     >
>
>     -- 
>     Russell Scibetti
>     Quadrix Solutions, Inc.
>     http://www.quadrix.com
>     (732) 235-2335, ext. 7038
>


