nagios and apan cause server to crash...

jeff vier jeff.vier at tradingtechnologies.com
Mon Oct 13 17:32:13 CEST 2003


I'm having the same problem here.

I have been capturing dumps of the top command, pulling only active
processes.  It looks like something causes an instance of apan.sh to
hang, and then they just start piling up (fast).
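Another way to watch the pile-up is to count apan.sh processes by state rather than reading raw top dumps. The sketch below swaps in `ps` for `top`. It assumes a Linux procps `ps` that accepts `-eo pid,stat,args`, where `stat` is a procps extension and not POSIX. Zombies show up as `[apan.sh] <defunct>`, so they are counted under state `Z`.

```python
import subprocess
from collections import Counter


def apan_state_counts(ps_text):
    """Count apan.sh processes by the first letter of their state (R, S, D, Z, ...)."""
    counts = Counter()
    for line in ps_text.splitlines()[1:]:  # skip the ps header row
        fields = line.split(None, 2)  # pid, stat, full command line
        if len(fields) < 3:
            continue
        _pid, stat, args = fields
        if "apan.sh" in args:
            counts[stat[0]] += 1
    return counts


def snapshot():
    """Run ps once and return the apan.sh state counts (Linux procps assumed)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,stat,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    return apan_state_counts(out)
```

Calling `snapshot()` from cron every minute and logging the result would show whether the R/D counts climb before the load spike.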

The load is usually under 1.0 (occasionally up to 1.xx, no big deal).
When the server died, the load was over 80 (yes, eighty), with at least
46 *active* apan.sh processes. I'm not sure of the actual count: the
top dump only shows 62 lines of processes, but it reported 73 running,
so more of them were likely apan.sh. There was also an unknown number of
inactive apan.sh processes sitting and waiting, plus 17 zombies (parent
unknown, alas). CPU usage was 99% on CPU0 and 100% on CPU1.  Yikes.
This jump happened over 16 minutes, at which point my cron jobs stopped
running, so who knows how much worse the pile-up got.

apan.debug log file doesn't show anything abnormal (whee.)

I'm going to have to write a watcher to kill the hanging apan.sh
processes. I don't want to, for fear of accidentally killing valid
processes, but I am quite sick of driving over to the colo to poke the
power button once a week (the box has only been in production for 3
weeks and has crashed 4 times so far).
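A watcher like that could be sketched as follows. It targets only processes whose command line contains `apan.sh` and that have been running longer than a threshold. The 300-second `MAX_AGE` is a hypothetical value, not something measured here. It assumes Linux procps `ps -eo pid,stat,etime,args`, where `etime` uses the `[[dd-]hh:]mm:ss` format. Zombies are skipped because they are already dead: a signal cannot remove them, and only their parent can reap them. To reduce the risk of killing valid processes, it defaults to a dry run. It does not handle any child check commands that apan.sh may have spawned.

```python
import os
import signal
import subprocess

MAX_AGE = 300  # hypothetical threshold (seconds); tune to apan.sh's normal runtime


def parse_etime(etime):
    """Convert ps etime format [[dd-]hh:]mm:ss to seconds."""
    days = 0
    if "-" in etime:
        d, etime = etime.split("-", 1)
        days = int(d)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:  # pad missing hours (and minutes) with zeros
        parts.insert(0, 0)
    h, m, s = parts
    return ((days * 24 + h) * 60 + m) * 60 + s


def stale_apan_pids(ps_text, max_age=MAX_AGE):
    """Return PIDs of non-zombie apan.sh processes older than max_age seconds."""
    pids = []
    for line in ps_text.splitlines()[1:]:  # skip the ps header row
        fields = line.split(None, 3)  # pid, stat, etime, full command line
        if len(fields) < 4:
            continue
        pid, stat, etime, args = fields
        if stat.startswith("Z"):
            continue  # zombies cannot be killed; their parent must reap them
        if "apan.sh" in args and parse_etime(etime) > max_age:
            pids.append(int(pid))
    return pids


def kill_stale(dry_run=True):
    """Send SIGTERM to stale apan.sh processes (only prints when dry_run is True)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,stat,etime,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid in stale_apan_pids(out):
        if dry_run:
            print(f"would kill {pid}")
            continue
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the process exited between ps and kill
```

Running it with `dry_run=True` for a while first shows exactly which PIDs would be hit before anything is actually killed.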

I'm also going to step up my manual debugging of the processes, etc.
I'll post any new insight.

--jeff

On Wed, 2003-10-08 at 10:31, Matthew Wilson wrote:
> 
> UPDATE: I have checked and my nagios installation does not have ePN compiled
> in.  So this is not the cause.  I would greatly appreciate any suggestions
>  on how to prevent/cure this problem.
> 
> > Thanks
> > Matthew Wilson.
> > >
> > > Matthew Wilson wrote:
> > >
> > > > Hi guys,
> > > > I have read in the list archives in the last couple of months a few
> > > > threads about nagios and apan chewing up memory.  I have tried a few
> > > > of the solutions posted but still have no joy.
> >
> 
> 
> 
> -------------------------------------------------------
> This SF.net email is sponsored by: SF.net Giveback Program.
> SourceForge.net hosts over 70,000 Open Source Projects.
> See the people who have HELPED US provide better services:
> Click here: http://sourceforge.net/supporters.php
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
> ::: Messages without supporting info will risk being sent to /dev/null


