[Apan-users] Re: nagios and apan cause server to crash...

Fredrik Wänglund fredrik.wanglund at datavis.se
Tue Oct 14 16:03:12 CEST 2003


Is this a RH9 problem? Or is it related to a specific version of the
kernel, bash, ... ?
What other OS-versions have this problem?

/FredrikW



Igor Kurtovic wrote:

> step back to RH 8.0 ..
>
> i had similar probs, the only difference was a daily crash :P
>
> even with changed reaper-frequency there was no improvement to see.
> after getting it back on RH 8.0 all is fine again.
>
> 300 hosts
> 1500 services
> 400 apan's
> 150 mrtg-hosts
>
> all on this box:
>
> Dual Xeon III 1 Ghz
> 2 GB RAM
>
> never had any performance issues or stability probs b4 going onto RH 9.0
>
> Regards, Igor
>
>
>
> On Tue, 2003-10-14 at 09:25, Fredrik Wänglund wrote:
>
>>I have service_reaper_frequency=3, and I remember that before I changed 
>>it from the default, my load used to be 8-10.
>>
>>/FredrikW
>>
>>Evan Weston wrote:
>>
>>>I was having a similar problem under Red Hat 9 on a PIII 900 with 512 MB RAM.
>>>
>>>I set 'service_reaper_frequency=4' instead of the default 'service_reaper_frequency=10' in the 'nagios.cfg' file and it's completely stable now.
>>>
>>>Evan Weston
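
To check which value a given nagios.cfg actually sets, a small script can help. This is only a sketch: the function name read_nagios_option is made up for illustration, it assumes the plain key=value layout with # comment lines, and it assumes the last definition wins if the directive appears more than once. The fallback of 10 is simply the default quoted above.

```python
def read_nagios_option(cfg_text, name, default=None):
    """Return the value of `name` from nagios.cfg-style text, or `default`.

    Assumption: one `key=value` directive per line, `#` starts a comment
    line, and a later definition overrides an earlier one.
    """
    value = default
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comment lines, and anything without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, val = line.split("=", 1)
        if key.strip() == name:
            value = val.strip()
    return value
```

Passing the contents of nagios.cfg and the name service_reaper_frequency returns the configured value as a string, or the fallback when the directive is absent.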
>>>
>>>
>>>-----Original Message-----
>>>From: Fredrik Wänglund [mailto:fredrik.wanglund at datavis.se]
>>>Sent: Tuesday, 14 October 2003 4:21 PM
>>>To: jeff vier
>>>Cc: Matthew Wilson; nagios-users; Apan-users List
>>>Subject: Re: [Apan-users] Re: [Nagios-users] nagios and apan cause server to crash...
>>>
>>>What platform/version are you running on?
>>>
>>>I'm running without any problem under RedHat 8.0 on a PIII 1400MHz with
>>>170 hosts, 200 apan-services and 300 'normal' services.
>>>My system-load stays between 1 and 2, CPU is mainly >80% idle
>>>
>>>jeff vier wrote:
>>>
>>>>I'm having the same problem here.
>>>>
>>>>I have been capturing dumps of the top command, pulling only active
>>>>processes.  It looks like something causes an instance of apan.sh to
>>>>hang, and then they just start piling up (fast).
>>>>
>>>>The load is usually under 1.0 (sometimes jumping up to 1.xx - no big
>>>>deal).  When it died, my load was over 80 (yes eighty) with 46 (maybe
>>>>more) *active* apan processes (not sure of the actual count, top dump
>>>>only shows 62 lines of processes.  It said 73 running, though, so likely
>>>>more were apan.sh - also, unknown count of inactive apan.sh processes
>>>>sitting and waiting), 17 zombies (unknown parent, alas). 99% CPU usage
>>>>on CPU0, 100% on CPU1.  Yikes.  This jump happened over 16 minutes, at
>>>>which point my crons no longer ran, so who knows how badly it kept
>>>>piling up.
>>>>
>>>>apan.debug log file doesn't show anything abnormal (whee.)
>>>>
>>>>I'm going to have to write a watcher to manually kill the hanging
>>>>apan.sh procs, which I don't want to do for fear of inadvertently
>>>>killing valid processes, but I am quite sick of having to go over to the
>>>>colo to poke the power button once a week (only been in production 3
>>>>weeks - 4 crashes so far).
>>>>
>>>>I'm going to increase my level of manual debugging, too, of processes,
>>>>etc.  I'll post any new insight.
>>>>
>>>>--jeff
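
The watcher described above might start out as a report-only script, which avoids the risk of killing valid checks. A minimal sketch, assuming that a hung apan.sh can be recognised by its age: it parses the output of `ps -eo pid,etime,args` and lists apan.sh processes older than a threshold. The function names, the 300-second threshold and the regular expression are illustrative assumptions, not part of Apan or Nagios.

```python
import re
import subprocess


def parse_etime(etime):
    """Convert a ps 'etime' value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    # Pad missing leading fields (hours) with zero.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_stale(ps_output, pattern=r"\bapan\.sh\b", max_age=300):
    """Return [(pid, age_seconds)] for matching commands older than max_age.

    `pattern` and `max_age` are assumed values; tune them for the site.
    """
    stale = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        # Skip the header line and anything malformed.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pid, etime, args = fields
        age = parse_etime(etime)
        if age > max_age and re.search(pattern, args):
            stale.append((int(pid), age))
    return stale


def scan(max_age=300):
    """Run ps and report stale apan.sh processes (report only, no kill)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,etime,args"],
        capture_output=True, text=True, check=True,  # needs Python 3.7+
    )
    return find_stale(result.stdout, max_age=max_age)
```

scan() returns (pid, age) pairs. Logging them for a while before handing any pid to os.kill would show whether the chosen threshold ever catches a valid check.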
>>>>
>>>>On Wed, 2003-10-08 at 10:31, Matthew Wilson wrote:
>>>>
>>>>>UPDATE: I have checked and my nagios installation does not have ePN compiled
>>>>>in.  So this is not the cause.  I would greatly appreciate any suggestions
>>>>>on how to prevent/cure this problem.
>>>>>
>>>>>>Thanks
>>>>>>Matthew Wilson.
>>>>>>
>>>>>>>Matthew Wilson wrote:
>>>>>>>
>>>>>>>>Hi guys,
>>>>>>>>I have read in the list archives in the last couple of months a few
>>>>>>>>threads about nagios and apan chewing up memory.  I have tried a few
>>>>>>>>of the solutions posted but still have no joy.
>>>>>>>>
>>>>>-------------------------------------------------------
>>>>>This SF.net email is sponsored by: SF.net Giveback Program.
>>>>>SourceForge.net hosts over 70,000 Open Source Projects.
>>>>>See the people who have HELPED US provide better services:
>>>>>Click here: http://sourceforge.net/supporters.php
>>>>>_______________________________________________
>>>>>Nagios-users mailing list
>>>>>Nagios-users at lists.sourceforge.net
>>>>>https://lists.sourceforge.net/lists/listinfo/nagios-users
>>>>>::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
>>>>>::: Messages without supporting info will risk being sent to /dev/null
>>>>
>>>>-------------------------------------------------------
>>>>This SF.net email is sponsored by: SF.net Giveback Program.
>>>>SourceForge.net hosts over 70,000 Open Source Projects.
>>>>See the people who have HELPED US provide better services:
>>>>Click here: http://sourceforge.net/supporters.php
>>>>_______________________________________________
>>>>Apan-users mailing list
>>>>Apan-users at lists.sourceforge.net
>>>>https://lists.sourceforge.net/lists/listinfo/apan-users
>>>>
>>
>-- 
>********************************
>
>Igor Kurtovic
>Technische Systemlösungen
>QSC AG
>
>Phone:   +49 221 6698 404
>Mobile:  +49 163 6698 075
>Fax:     +49 221 6698 469
>WWW:     www.q-dsl.de
>Email:   igor.kurtovic at qsc.de
>
>********************************
>