[Apan-users] Re: nagios and apan cause server to crash...

jeff vier jeff.vier at tradingtechnologies.com
Tue Oct 14 17:46:11 CEST 2003


On Tue, 2003-10-14 at 01:21, Fredrik Wänglund wrote:
> What platform/version are you running on?

RH9, dual 1.4GHz, 1G RAM
139 Hosts, 838 services, about 300 of which are apan-based

> I'm running without any problem under RedHat 8.0 on a PIII 1400MHz with 
> 170 hosts, 200 apan-services and 300 'normal' services.
> My system-load stays between 1 and 2, CPU is mainly >80% idle

Until the "apan problem" hits, load hangs out at around 0.3 to 0.9 (depending
on what it's doing; it has only blipped over 1.0 twice in 24 hours), with
idle solidly at about 86% (sar shows a min of 85.23% idle and a max of
86.96% in the last 24 hours).  So the box is quite clean and happy.

Like I said before, when the apan freak-out comes around, though, it
shoots WAY up.

Notably, I wrote a little watcher daemon to check for rogue apan
processes.  If anyone wants it, email me.
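
For anyone who just wants the general idea, here is a rough sketch of what
such a watcher could look like. It is not the actual daemon mentioned above.
It assumes a Linux procps `ps` that accepts `-eo pid,etime,args`, and the
names `parse_etime`, `find_rogue`, `snapshot`, and the 120-second age
threshold are made up for illustration. It only reports suspect PIDs and
leaves any killing to the operator, since a long-running apan.sh is not
necessarily a hung one.

```python
import subprocess


def parse_etime(etime):
    """Convert a ps elapsed-time string ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    # Pad missing leading fields (hours) with zeros.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_rogue(ps_output, name="apan.sh", max_age=120):
    """Return PIDs whose command line contains `name` and whose elapsed
    time exceeds `max_age` seconds (an assumed threshold, tune to taste)."""
    rogue = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, etime, args = fields
        if not pid.isdigit():
            continue  # skips the "PID ELAPSED COMMAND" header line
        if name in args and parse_etime(etime) > max_age:
            rogue.append(int(pid))
    return rogue


def snapshot():
    """Capture the current process table as text (assumes procps ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,etime,args"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Run from cron or a sleep loop, something like `print(find_rogue(snapshot()))`
would list the apan.sh PIDs that have been around longer than the threshold.
Logging them for a while before letting anything send signals seems the
safer way to find out whether the threshold catches valid checks too.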

> jeff vier wrote:
> 
> >I'm having the same problem here.
> >
> >I have been capturing dumps of the top command, pulling only active
> >processes.  It looks like something causes an instance of apan.sh to
> >hang, and then they just start piling up (fast).
> >
> >The load is usually under 1.0 (sometimes jumping up to 1.xx - no big
> >deal).  When it died, my load was over 80 (yes eighty) with 46 (maybe
> >more) *active* apan processes (not sure of the actual count, top dump
> >only shows 62 lines of processes.  It said 73 running, though, so likely
> >more were apan.sh - also, unknown count of inactive apan.sh processes
> >sitting and waiting), 17 zombies (unknown parent, alas). 99% CPU usage
> >on CPU0, 100% on CPU1.  Yikes.  This jump happened over 16 minutes, at
> >which point my crons no longer ran, so who knows how badly it kept
> >piling up.
> >
> >apan.debug log file doesn't show anything abnormal (whee.)
> >
> >I'm going to have to write a watcher to manually kill the hanging
> >apan.sh procs, which I don't want to do for fear of inadvertently
> >killing valid processes, but I am quite sick of having to go over to the
> >colo to poke the power button once a week (only been in production 3
> >weeks - 4 crashes so far).
> >
> >I'm going to increase my level of manual debugging, too, of processes,
> >etc.  I'll post any new insight.
> >
> >--jeff
> >
> >On Wed, 2003-10-08 at 10:31, Matthew Wilson wrote:
> >  
> >
> >>UPDATE: I have checked and my nagios installation does not have ePN compiled
> >>in.  So this is not the cause.  I would greatly appreciate any suggestions
> >> on how to prevent/cure this problem.
> >>
> >>    
> >>
> >>>Thanks
> >>>Matthew Wilson.
> >>>      
> >>>
> >>>>Matthew Wilson wrote:
> >>>>
> >>>>        
> >>>>
> >>>>>Hi guys,
> >>>>>I have read in the list archives in the last couple of months a few
> >>>>>threads about nagios and apan chewing up memory.  I have tried a few
> >>>>>of the solutions posted but still have no joy.
> >>>>>          
> >>>>>
> >>
> >>-------------------------------------------------------
> >>This SF.net email is sponsored by: SF.net Giveback Program.
> >>SourceForge.net hosts over 70,000 Open Source Projects.
> >>See the people who have HELPED US provide better services:
> >>Click here: http://sourceforge.net/supporters.php
> >>_______________________________________________
> >>Nagios-users mailing list
> >>Nagios-users at lists.sourceforge.net
> >>https://lists.sourceforge.net/lists/listinfo/nagios-users
> >>::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
> >>::: Messages without supporting info will risk being sent to /dev/null
> >>    
> >>
> >
> >
> >
> >_______________________________________________
> >Apan-users mailing list
> >Apan-users at lists.sourceforge.net
> >https://lists.sourceforge.net/lists/listinfo/apan-users
> >  
> >
> 


