1000+ processes then Nagios fails

Shane_Seidel at gwf.com.au
Mon Dec 9 23:01:18 CET 2002





Russell, Marc,

Here are my current relevant settings:-
inter_check_delay_method=0.01
max_concurrent_checks=750
service_reaper_frequency=10
aggregate_status_updates=0

Program-Wide Performance Information (disk checks >5mins):-
Time Frame            Checks Completed
<= 1 minute:          109 (14.6%)
<= 5 minutes:         481 (64.6%)
<= 15 minutes:        745 (100.0%)
<= 1 hour:            745 (100.0%)
Since program start:  745 (100.0%)

**********************************************************************************************************************************************
This email and its attachments are confidential subject to copyright and may be legally privileged. If they have come to
you in error you should take no action based upon the contents nor should you copy or show them to anyone. Please
delete the email and its attachments and inform administrators at gwf.com.au
Any views or opinions expressed are those of the author and do not necessarily represent those of George Weston Foods
Ltd.
Security: Internet email is not a completely secure medium, please note this when considering the content of your message.
Viruses: We take precautions to ensure email is free of viruses but cannot guarantee this. Accordingly we advise
scanning all email and attachments
*********************************************************************************************************************************************

Note that Nagios is now easily keeping up:-
Mem:   513928K av,  331684K used,  182244K free,
Swap: 1052248K av,    7216K used, 1045032K free

I believe the inter_check_delay_method setting had the most impact on system
performance although aggregate_status_updates seems to have contributed.
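
As a rough illustration of why the delay value matters, here is a small
arithmetic sketch. It assumes each service's first check is queued one fixed
delay after the previous one and ignores interleaving. The function names, the
5-minute average interval, and the "interval divided by service count"
comparison are assumptions for illustration, not values taken from this
system's configuration.

```python
def launch_offsets(num_services, delay_seconds):
    """Seconds after startup at which each service's first check is queued."""
    return [i * delay_seconds for i in range(num_services)]


def spread_seconds(num_services, delay_seconds):
    """Time between the first and the last initial check."""
    offsets = launch_offsets(num_services, delay_seconds)
    return offsets[-1] - offsets[0] if offsets else 0.0


if __name__ == "__main__":
    services = 745             # from the performance table in this message
    fixed_delay = 0.01         # inter_check_delay_method=0.01
    assumed_interval = 300.0   # ASSUMPTION: 5-minute average check interval
    # ASSUMPTION: a delay that spreads the checks over one whole interval
    spread_delay = assumed_interval / services

    print(f"fixed delay {fixed_delay:.2f} s -> "
          f"{spread_seconds(services, fixed_delay):.2f} s spread")
    print(f"spread delay {spread_delay:.3f} s -> "
          f"{spread_seconds(services, spread_delay):.2f} s spread")
```

With a fixed 0.01 second delay the initial checks are all queued within a few
seconds, while a delay derived from the check interval would space them across
most of that interval.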

System is running as well as I'd expect, and I am hesitant to fix what is not
broken, but your suggestions are appreciated and I will test them should I
have further issues.

Many Thanks
Shane




To:        Marc Powell <mpowell at ena.com>
cc:        Shane Seidel/GWFIS/GWF at GWF, nagios-users at lists.sourceforge.net

Subject:   Re: [Nagios-users] 1000+ processes then Nagios fails


When I am referring to the pipe, I mean the pipe between the forked-off
plugins and the main Nagios daemon. The way that Nagios gets
information back from the plugins is that they all write to one pipe (in
this case, the pipe is an object in the C code, not a pipe file like
nagios.cmd).

You are right that the command_check_interval refers to the command
pipe, nagios.cmd. But the service_reaper_frequency affects the reading
of the communication pipe between the plugins and the daemon.
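
The arrangement described above can be sketched in a few lines of Python. This
is only an illustration of forked workers reporting through one shared pipe
that the parent drains on a timer, not Nagios's actual C code. The names
run_checks and reap_interval are hypothetical; reap_interval merely plays the
role of service_reaper_frequency. It needs a POSIX system because it uses
os.fork.

```python
import os
import time


def run_checks(num_checks, reap_interval):
    """Fork one child per check; all children report through one shared pipe."""
    read_fd, write_fd = os.pipe()
    pids = []
    for check_id in range(num_checks):
        pid = os.fork()
        if pid == 0:
            # Child: stands in for a plugin. Write one short result and exit.
            os.close(read_fd)
            os.write(write_fd, f"check {check_id} OK\n".encode())
            os._exit(0)
        pids.append(pid)

    # Parent keeps only the read end, so EOF arrives once every child exits.
    os.close(write_fd)

    results = []
    buffer = b""
    while True:
        time.sleep(reap_interval)          # the "reaper" wakes up periodically
        chunk = os.read(read_fd, 65536)    # drain whatever is waiting
        if not chunk:                      # empty read: all writers are gone
            break
        buffer += chunk
        *lines, buffer = buffer.split(b"\n")
        results.extend(line.decode() for line in lines)

    os.close(read_fd)
    for pid in pids:
        os.waitpid(pid, 0)                 # collect exit status of each child
    return sorted(results)


if __name__ == "__main__":
    collected = run_checks(num_checks=20, reap_interval=0.05)
    print(f"{len(collected)} results reaped")
```

One caveat about this sketch: on POSIX systems a writer blocks when a pipe is
full rather than overwriting earlier data, so a reader that wakes up too rarely
would show up here as child processes left waiting. Whether that is what
happens inside the daemon is not confirmed anywhere in this thread.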

Hope this clears things up.

-Russell

Marc Powell wrote:

> Are you certain that's what the service_reaper_frequency applies to? I
> thought that command_check_interval applied to the external command
> pipe and that service_reaper_frequency only applied to local active
> checks that do not write to the external command pipe.
>
>
>
> --
>
> Marc
>
>     -----Original Message-----
>     From: Russell Scibetti [mailto:russell at quadrix.com]
>     Sent: Mon 12/9/2002 11:11 AM
>     To: Shane_Seidel at gwf.com.au; nagios-users at lists.sourceforge.net
>     Cc:
>     Subject: Re: [Nagios-users] 1000+ processes then Nagios fails
>
>     I've read some of the replies, and I have one more suggestion to try.
>     If memory is the problem, and not CPU Load, try lowering the
>     service_reaper_frequency. By default it is set to 10, which means
>     every 10 seconds, Nagios will remove the contents of the pipe (what
>     ALL the plugins write to - there is only 1 pipe) and process them.
>
>     If you have that many services, you might be overwriting the pipe. I
>     had similar problems on one of my systems where the box kept swapping
>     all the time (about 750 service checks, most every 5 minutes). If you
>     try lowering that value (try 5 for starters), Nagios will read the
>     pipe more frequently, so it shouldn't get overwritten as much.
>
>     Just another suggestion.
>
>     -Russell Scibetti
>
>     Shane_Seidel at gwf.com.au wrote:
>
>     >Hi All,
>     >
>     >We have a dual P3-1200mhz 512M RAM server running Nagios 1.0 monitoring 180
>     >devices and 800 services.
>     >
>     >I have noticed that the number of nagios processes increase until they reach a
>     >count of approx 1000 at which time the server complains it is "out of memory"
>     >and starts shutting down services.
>     >
>     >I found that executing '/etc/rc.d/init.d/nagios reload' from cron would "solve"
>     >the problem. The number of processes would return to approx 60 and then start to
>     >climb again. I have the cron job execute every 30 mins.
>     >
>     >I took the config and put all the hosts, services, etc into Netsaint 0.7 on a
>     >P2-350Mhz 128 mb RAM and processes rarely rise to over 100 and then return to
>     >40-60.
>     >
>     >Note that I use the "default" option while compiling to maintain backward
>     >compatibility for Netsaint.
>     >
>     >Has anyone else experienced this? Is there any way to restrict the number of
>     >processes used by nagios. Note also that the big server also runs MRTG/RRD on
>     >approx 20 devices, although mrtg process complete
>     >
>     >Any help appreciated
>     >Thanks
>     >Shane
>
>
>     --
>     Russell Scibetti
>     Quadrix Solutions, Inc.
>     http://www.quadrix.com
>     (732) 235-2335, ext. 7038
>
>     _______________________________________________
>     Nagios-users mailing list
>     Nagios-users at lists.sourceforge.net
>     https://lists.sourceforge.net/lists/listinfo/nagios-users
>

--
Russell Scibetti
Quadrix Solutions, Inc.
http://www.quadrix.com
(732) 235-2335, ext. 7038


