Nagios locks up

Andreas Ericsson ae at op5.se
Wed Jun 15 16:18:05 CEST 2005


Platt, Nicholas wrote:
> I've been running Nagios for about two years and found 1.1 to be the most
> stable version. We had two servers set up, and one of them seemed to
> always lock up. By "lock up" I mean that all services, including the Linux
> OS itself, become inaccessible.

This isn't Nagios' fault. Update the kernel and then check for BIOS
bugs. If nothing else, keep tabs on the system load at all times. If it
starts spiking, Nagios (or a plugin) might have entered an infinite
loop. That can make it *seem* as if the system is frozen when it isn't;
it just takes a week or so to respond.
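
One way to keep an eye on the load is a small script run from cron. The
following is only a sketch in Python, not part of the original setup: the
name `load_status` and the threshold of twice the CPU count are assumptions
chosen for illustration, and the right threshold depends on the machine.

```python
import os


def load_status(load1, cpus, factor=2.0):
    """Classify a 1-minute load average relative to the CPU count.

    `factor` is an assumed threshold: a load above cpus * factor is
    reported as "spiking", anything else as "ok".
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return "spiking" if load1 > cpus * factor else "ok"


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    try:
        # os.getloadavg() returns the 1, 5 and 15 minute load averages.
        load1, load5, load15 = os.getloadavg()
    except OSError:
        print("load average not available on this system")
    else:
        print("load %.2f %.2f %.2f on %d CPU(s): %s"
              % (load1, load5, load15, cpus, load_status(load1, cpus)))
```

Logging the output every minute gives a record to look at after the next
lock-up, which shows whether the load was climbing before the box stopped
responding.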

> The server is pingable, but the screen is completely black and typing on
> the keyboard makes no difference. I always thought it was hardware
> related; therefore, I transferred it to a new box, and it did the same.
> I also ran a test and determined that when Nagios was off, the lock up
> never occurred.
> 

Funny then that you think Nagios 1.1 is still the most stable. ;)

>  
> 
> I just recently installed a development box and have been testing Nagios
> 2.03b. Every once in a while, the same lock up happens and the only thing
> I can do is a POWER RESET. Has anyone experienced a similar occurrence?

First time I've ever heard of it. Are you using any custom plugins that
read the status of some local hardware? Misbehaving drivers can leave
the kernel in uninterruptible I/O forever, but that should go away with
a kernel update.
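
Processes stuck in uninterruptible I/O show up with state "D" in ps and in
/proc/<pid>/stat on Linux. The sketch below lists them; it is an
illustration only, and the helper names `parse_stat` and
`uninterruptible_processes` are made up for this example.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the first "(" and the last ")".
    """
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            # The process exited while we were reading, or the line
            # was not in the expected format; skip it.
            continue
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid, comm in uninterruptible_processes():
            print("%d %s is in uninterruptible sleep" % (pid, comm))
```

A plugin that shows up here repeatedly, and never leaves that state, is a
good candidate for the one talking to the misbehaving driver.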

> The only thing I can think of is that SENDMAIL was turned off on all the
> boxes this happened to. I run an hourly routine to clear the
> ClientMailQueue. SENDMAIL is turned off so I don't get two pages for
> every device. Any advice would be appreciated. Thank you.
> 

Turn it on? If the queue holds more than 32768 files (on an ext2
filesystem), old kernels (Linux <= 2.3.2, or something) can freeze when
trying to add another inode. Accessing existing inodes is impossible at
that point, because the kernel is already stuck and can't deliver them.
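
To see whether the queue is anywhere near that number, count the entries in
the queue directory. This is a rough sketch: the path
/var/spool/clientmqueue is an assumption (a common sendmail default, so
check where your queue actually lives), the 90% warning margin is arbitrary,
and `count_entries` and `near_limit` are names invented for the example.

```python
import os

# Assumed location of sendmail's client mail queue; adjust as needed.
QUEUE_DIR = "/var/spool/clientmqueue"
# File count mentioned above for old kernels on ext2.
FILE_LIMIT = 32768


def count_entries(path):
    """Count the directory entries directly inside `path`."""
    with os.scandir(path) as entries:
        return sum(1 for _ in entries)


def near_limit(count, limit=FILE_LIMIT, margin=0.9):
    """True once `count` reaches `margin` (an assumed 90%) of `limit`."""
    return count >= limit * margin


if __name__ == "__main__":
    try:
        n = count_entries(QUEUE_DIR)
    except OSError as exc:
        # Missing directory or insufficient permissions.
        print("cannot read %s: %s" % (QUEUE_DIR, exc))
    else:
        note = " (close to the limit)" if near_limit(n) else ""
        print("%s holds %d entries%s" % (QUEUE_DIR, n, note))
```

Run it just before the hourly cleanup; if the count is nowhere near 32768,
the queue is probably not what is freezing the box.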

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer


