Bug in Nagios orphan-check?!

Carroll, Jim P [Contractor] jcarro10 at sprintspectrum.com
Thu Feb 6 19:18:32 CET 2003


Try running "sar -r" to see whether the times you've had problems
correlate with any indication of memory shortage, or whether memory is
plentiful at all times.
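
To get the problem times to compare against the sar intervals, something like
the following Python sketch could pull the OOM-kill timestamps out of the
kernel log. It is only an illustration: the sample lines are taken from the
dmesg excerpt quoted below, while the function name `oom_events`, the regular
expression, and the `year` parameter (syslog timestamps carry no year) are
assumptions of mine, not anything Nagios or sysstat provides.

```python
import re
from datetime import datetime

# Matches kernel lines of the form quoted in this thread, e.g.
# "Feb  5 11:12:14 ozzy kernel: Out of Memory: Killed process 18026"
OOM_LINE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"Out of Memory: Killed process (?P<pid>\d+)"
)


def oom_events(lines, year=2003):
    """Return a list of (datetime, pid) for every OOM-kill line found."""
    events = []
    for line in lines:
        match = OOM_LINE.match(line)
        if match is None:
            continue  # not an OOM-kill line
        # Collapse the double space syslog puts before single-digit days.
        stamp = " ".join(match.group("stamp").split())
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, int(match.group("pid"))))
    return events


sample = [
    "Feb  5 11:12:14 ozzy kernel: Out of Memory: Killed process 18026",
    "Feb  5 11:12:20 ozzy kernel: Out of Memory: Killed process 18022",
    "Feb  5 11:21:48 ozzy kernel: request_module[net-pf-10]:",
]

for when, pid in oom_events(sample):
    print(when.isoformat(), pid)
```

The resulting timestamps can then be lined up by hand against the rows that
"sar -r" prints for the same day.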

jc

> -----Original Message-----
> From: Matthias Eichler [mailto:me at ame.de]
> Sent: Thursday, February 06, 2003 3:01 AM
> To: Carroll, Jim P [Contractor]
> Cc: nagios-users at lists.sourceforge.net
> Subject: RE: [Nagios-users] Bug in Nagios orphan-check?!
> 
> 
> Hi Jim,
> 
> The box has 256 MB of RAM, as I posted, and 512 MB of swap.
> But I think that under normal circumstances the box should not run out
> of memory.
> 
> Right now, for example, Nagios has been running for 17 hours and the
> stats of the box are:
> ---cut---
> 09:57:16 up 63 days, 23:13,  1 user,  load average: 0.01, 0.07, 0.03
> 56 processes: 53 sleeping, 3 running, 0 zombie, 0 stopped
> CPU states:   1.8% user,   4.0% system,   0.0% nice,  94.2% idle
> Mem:    253916K total,   208776K used,    45140K free,   115400K buffers
> Swap:   257032K total,     4248K used,   252784K free,    35288K cached
> ---cut---
> 
> This looks fine to me, especially because about 200 MB of RAM is not
> really in use: 32 MB is cached and 115 MB is buffered. If Nagios or any
> other application needed more RAM here, it should get it.
> 
> So IMHO there must be some bug if Nagios gets out of (memory) control
> like that after running for just 5 days.
> OK, I have 478 services on 32 hosts, but that should not be too much
> for such a box, should it?
> 
> Matthias
> 
> Quoting "Carroll, Jim P [Contractor]" <jcarro10 at sprintspectrum.com>:
> 
> > It *does* seem that you're running out of memory.  Just a guess.
> > 
> > You haven't mentioned how much RAM or swap you have on this machine.  80
> > nagios processes isn't much, considering I've had quite a bit more than
> > that in the past.  Granted, I'm also running with 1 GB of RAM and 2 GB
> > of swap.
> > 
> > You might want to consider adding more RAM and bumping up your swap space.
> > 
> > jc
> > 
> > > -----Original Message-----
> > > From: Matthias Eichler [mailto:me at ame.de]
> > > Sent: Wednesday, February 05, 2003 4:54 AM
> > > To: nagios-users at lists.sourceforge.net
> > > Subject: [Nagios-users] Bug in Nagios orphan-check?!
> > > 
> > > 
> > > Hi List,
> > > 
> > > I have Nagios 1.0 installed on Debian 3.0 (Woody). The machine
> > > is an Intel Celeron 700 MHz with 256 MB of RAM.
> > > 
> > > The setup ran really well for a long time, but now I get
> > > severe problems roughly every five days.
> > > 
> > > Today we had connection problems to a remote farm. But Nagios
> > > didn't send out host-down notifications; it wrote this to its event log:
> > > 
> > > "Warning: The check of service 'blabla' on host 'blabla' looks like
> > > it was orphaned (results never came back). I'm scheduling an immediate
> > > check of the service..."
> > > After this first entry, Nagios reported the same warning for EVERY
> > > service check, about 142 times.
> > > At that point I tried to reach the web interface and could not connect.
> > > The SSH login took very long, which does not surprise me, because the
> > > box had a load of 7.83!
> > > I saw about 80 nagios processes in the process list. They were
> > > not stopped by /etc/init.d/nagios stop; I had to kill them all.
> > > 
> > > In dmesg I see entries like:
> > > ---
> > > Feb  5 11:12:14 ozzy kernel: Out of Memory: Killed process 18026 (apache).
> > > Feb  5 11:12:20 ozzy kernel: Out of Memory: Killed process 18022 (apache).
> > > ---
> > > or
> > > ---
> > > Feb  5 11:21:48 ozzy kernel: request_module[net-pf-10]: waitpid(13304,...) failed, errno 512
> > > Feb  5 11:21:48 ozzy kernel: request_module[net-pf-10]: waitpid(13305,...) failed, errno 512
> > > Feb  5 11:21:48 ozzy kernel: request_module[net-pf-10]: waitpid(13306,...) failed, errno 512
> > > ---
> > > 
> > > I think there might be a bug: even when the remote site is
> > > unavailable, Nagios should warn us about it rather than
> > > bring the box down like this.
> > > 
> > > Any ideas?
> > > 
> > > Greetings from Munich,
> > > 
> > > Matthias
> > > 
> > > -- 
> > > 
> > > Kind regards
> > > AME Aigner Media & Entertainment GmbH
> > > 
> > > 
> > > Matthias Eichler
> > > Leiter Technik | Technical Director
> > > _______________________________________
> > > 
> > > AME® Aigner Media & Entertainment GmbH
> > > Bavariaring 8        D-80336 München
> > > 
> > > Tel [+49] Ø89.427 05 - 330
> > > Fax [+49] Ø89.427 05 - 400
> > > 
> > > http://ame.de        eMail: me at ame.de
> > > _______________________________________
> > > Legal information per TDG|GmbHG: ame.de/impressum
> > > 
> > 
> 
> 
> -- 
> 
> Matthias Eichler
> Leiter Technik | Technical Director
> _____________________________________ 
> 
> AME Aigner Media & Entertainment GmbH 
> Bavariaring 8       D-80336 Muenchen 
> 
> Tel: [+49] Ø89.427.05-330
> Fax: [+49] Ø89.427.05-400 
> _____________________________________
> 
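
As a sanity check on the top output quoted above, the figures can be tallied
with a few lines of Python. This is a rough sketch under one assumption: that
"used" on the Mem line includes the buffers and page cache, which the kernel
can reclaim on demand, as in the classic free/top layout. The function name
`split_memory` is made up for this illustration.

```python
# Figures in kilobytes, copied from the quoted top output.
mem_total = 253916
mem_used = 208776
mem_free = 45140
buffers = 115400
cached = 35288


def split_memory(used, free, buffers, cached):
    """Split memory into what applications hold and what is free or reclaimable.

    Assumes buffers and cache are counted inside 'used' and can be given
    back to applications when they ask for more memory.
    """
    app_used = used - buffers - cached
    available = free + buffers + cached
    return app_used, available


app_used, available = split_memory(mem_used, mem_free, buffers, cached)
print(f"held by applications: {app_used}K")
print(f"free or reclaimable:  {available}K")
```

Under that assumption the two parts add up to the 253916K total, so the
snapshot itself shows no memory pressure; it says nothing about what happens
at the moment the orphaned checks pile up.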

