Bug in Nagios orphan-check?!

Matthias Eichler me at ame.de
Thu Feb 6 10:01:29 CET 2003


Hi Jim,

The box has 256 MB of RAM, as I posted, and 512 MB of swap.
But I think that under normal circumstances the box should not run out
of memory.

Right now, for example, Nagios has been running for 17 hours, and the stats of the box are:
---cut---
09:57:16 up 63 days, 23:13,  1 user,  load average: 0.01, 0.07, 0.03
56 processes: 53 sleeping, 3 running, 0 zombie, 0 stopped
CPU states:   1.8% user,   4.0% system,   0.0% nice,  94.2% idle
Mem:    253916K total,   208776K used,    45140K free,   115400K buffers
Swap:   257032K total,     4248K used,   252784K free,    35288K cached
---cut---

This looks fine to me, especially because about 200 MB of RAM are not really
in use (about 45 MB are free, 35 MB are cached and 115 MB are buffered). If
Nagios or any other app needs more RAM here, it should get it.
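
To make that arithmetic explicit, here is a minimal Python sketch using the figures from the top output above. It assumes that buffers and cache can be fully reclaimed by the kernel on demand, which is a simplification, and the variable names are just for illustration.

```python
# Figures copied from the top output above, in KiB.
mem_total = 253916
mem_used = 208776
mem_free = 45140
buffers = 115400
cached = 35288

# Assumption: buffers and page cache are reclaimable when an
# application asks for memory, so they count as available.
reclaimable = buffers + cached
available = mem_free + reclaimable

# Memory actually held by processes and the kernel itself.
really_used = mem_used - reclaimable


def to_mib(kib):
    """Convert KiB to MiB."""
    return kib / 1024


print(f"available:   {available} K ({to_mib(available):.1f} MiB)")
print(f"really used: {really_used} K ({to_mib(really_used):.1f} MiB)")
```

Under that assumption roughly 191 MiB of the 248 MiB total could still be handed out, and only about 57 MiB is truly occupied.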

So IMHO there must be some bug if Nagios runs out of memory control like
that after just 5 days of running.
OK, I have 478 services on 32 hosts, but that should not be too much for such a
box, should it?

Matthias

Quoting "Carroll, Jim P [Contractor]" <jcarro10 at sprintspectrum.com>:

> It *does* seem that you're running out of memory.  Just a guess.
> 
> You haven't mentioned how much RAM or swap you have on this machine.  80
> nagios processes isn't much, considering I've had quite a bit more than that
> in the past.  Granted, I'm also running with 1 GB of RAM and 2 GB of swap.
> 
> You might want to consider adding more RAM and bumping up your swap space.
> 
> jc
> 
> > -----Original Message-----
> > From: Matthias Eichler [mailto:me at ame.de]
> > Sent: Wednesday, February 05, 2003 4:54 AM
> > To: nagios-users at lists.sourceforge.net
> > Subject: [Nagios-users] Bug in Nagios orphan-check?!
> > 
> > 
> > Hi List,
> > 
> > I have Nagios 1.0 installed on Debian 3.0 (Woody). The machine
> > is an Intel Celeron 700 MHz with 256 MB of RAM.
> > 
> > The setup was doing really well for a long time, but now I get
> > severe problems roughly every five days.
> > 
> > Today we had connection problems to a remote farm. But Nagios
> > didn't send out host-down notifications; instead it said this in its event log:
> > 
> > "Warning: The check of service 'blabla' on host 'blabla' looks like
> > it was orphaned (results never came back). I'm scheduling an immediate
> > check of the service..."
> > After this first entry, Nagios reported this warning with EVERY service
> > check, about 142 times.
> > At that point I tried to reach the web interface and could not connect.
> > The SSH login took very long, which does not surprise me, because the
> > box had a load of 7.83!
> > I saw that there were about 80 nagios processes in the process list. They
> > were not stopped by /etc/init.d/nagios stop; I had to kill them all.
> > 
> > In dmesg I see entries like:
> > ---
> > Feb  5 11:12:14 ozzy kernel: Out of Memory: Killed process 18026 (apache).
> > Feb  5 11:12:20 ozzy kernel: Out of Memory: Killed process 18022 (apache).
> > ---
> > or
> > ---
> > Feb  5 11:21:48 ozzy kernel: request_module[net-pf-10]: waitpid(13304,...) failed, errno 512
> > Feb  5 11:21:48 ozzy kernel: request_module[net-pf-10]: waitpid(13305,...) failed, errno 512
> > Feb  5 11:21:48 ozzy kernel: request_module[net-pf-10]: waitpid(13306,...) failed, errno 512
> > ---
> > 
> > I think there might be a bug, because even if the remote
> > site is not available, Nagios should warn us about it and
> > not bog down the box like this.
> > 
> > Any ideas?!? 
> > 
> > Greetings from Munich,
> > 
> > Matthias
> > 
> > -- 
> > 
> > Best regards
> > AME Aigner Media & Entertainment GmbH
> > 
> > 
> > Matthias Eichler
> > Leiter Technik | Technical Director
> > _______________________________________
> > 
> > AME® Aigner Media & Entertainment GmbH
> > Bavariaring 8        D-80336 München
> > 
> > Tel [+49] Ø89.427 05 - 330
> > Fax [+49] Ø89.427 05 - 400
> > 
> > http://ame.de        eMail: me at ame.de
> > _______________________________________
> > Legal information per TDG|GmbHG: ame.de/impressum
> > 
> 


-- 

Matthias Eichler
Leiter Technik | Technical Director
_____________________________________ 

AME Aigner Media & Entertainment GmbH 
Bavariaring 8       D-80336 Muenchen 

Tel: [+49] Ø89.427.05-330
Fax: [+49] Ø89.427.05-400 
_____________________________________

