Nagios processes hang

Marantz, Roy Roy.Marantz at deshaw.com
Mon Sep 17 01:34:14 CEST 2007


I was afraid it was the FIFO.  I've already tried renicing the nagios
process, but that didn't help enough.  I'm doing this on Solaris 10 so
recompiling the kernel isn't an option, but I'll check if there is a way
to increase the FIFO size.  Otherwise I'll have to do the nsca & nagios
patch changing the FIFO to a Unix Socket.  I don't suppose anyone has
that patch already?  If so I'd appreciate getting a pointer to it.  I'll
also see if using nscafe makes sense too.  BTW, I think writing a neb
replacement for nsca will take more time than I've got. Thanks.
Roy 

-----Original Message-----
From: Andreas Ericsson [mailto:ae at op5.se] 
Sent: Sunday, September 16, 2007 6:46 PM
To: Marantz, Roy
Cc: 'nagios-users at lists.sourceforge.net'
Subject: Re: [Nagios-users] Nagios processes hang

Marantz, Roy wrote:
> I'm running Nagios 2.8 with around 1400 hosts and around 14000 services
> defined.  I have about 700 active services and the rest come in via nsca.
> 
> My problem has a few symptoms:
> 1) I collect defunct Nagios processes, around 300 per day
> 2) the command pipe stops getting read so nsca is dumping data to its
> dump file
> 3) active service checks have very long (hours) latency
> 
> These all sound like the same problem to me, but I don't know how to
> diagnose it.  Any help would be appreciated.  I have run nagios -s and
> it doesn't suggest anything.  I'm using check_fping for host checks and
> my remaining active service checks.  Attached is the output from nagios
> -v and my nagios.cfg.  Thanks in advance for any help.


The trouble is the FIFO, which holds a maximum of 4096 bytes by default,
meaning it quickly becomes a bottleneck. Nagios tries to empty it as soon
as there's data available on it, but fails to keep up with the data-spam
from nsca.
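
To check how much a pipe actually buffers on a given machine, a rough
probe like the one below may help. It is only a sketch: it measures an
anonymous pipe from os.pipe() rather than the Nagios command file itself,
on the assumption that both share the same kernel buffer limit, and the
function name pipe_capacity is made up for this example. The figure it
prints will differ between operating systems and kernel versions.

```python
import os


def pipe_capacity(chunk: int = 512) -> int:
    """Return how many bytes fit in an unread pipe before a write would block."""
    read_fd, write_fd = os.pipe()
    try:
        # Non-blocking mode makes a full pipe raise instead of hanging.
        os.set_blocking(write_fd, False)
        block = b"x" * chunk
        total = 0
        while True:
            try:
                total += os.write(write_fd, block)
            except BlockingIOError:
                # Nobody is reading, so this is the point where a writer
                # such as nsca would stall.
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print("pipe capacity in bytes:", pipe_capacity())
```

A small result here means that a burst of passive check results can fill
the buffer long before the reader gets scheduled again.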

You could try re-nicing the nagios process, which might make it capable
of staying ahead of nsca.

Otherwise you could try modifying the FIFO size and recompiling the
kernel.

Alternatively, patch nagios and nsca to use a unix socket and use
setsockopt() to up the read/write buffer on that socket to 256 KiB.
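
The buffer part of that idea can be tried out in isolation before touching
any Nagios or nsca source. The sketch below is not the patch itself: it is
written in Python rather than C, it uses a connected socketpair() in one
process instead of a listening socket shared by two daemons, and the
helper name make_buffered_pair and the sample payload are hypothetical.
The kernel is free to round or cap the requested size, so the values read
back with getsockopt() may not equal 256 KiB.

```python
import socket

WANTED = 256 * 1024  # 256 KiB, the figure suggested above


def make_buffered_pair(size: int = WANTED):
    """Create a connected Unix socket pair and request larger buffers."""
    writer, reader = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # Ask for a bigger send buffer on the writing side (the nsca role)
    # and a bigger receive buffer on the reading side (the nagios role).
    writer.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    reader.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return writer, reader


if __name__ == "__main__":
    writer, reader = make_buffered_pair()
    # Report what the kernel actually granted, which may differ from WANTED.
    print("SO_SNDBUF:", writer.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    print("SO_RCVBUF:", reader.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    writer.sendall(b"sample passive check result\n")  # placeholder payload
    print(reader.recv(4096))
    writer.close()
    reader.close()
```

If the values reported are much lower than requested, the system-wide
socket buffer limits are probably capping them and would need raising
first.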

The fourth, and possibly tricksiest alternative, is to rewrite nsca as a
neb-module, have it run in a separate thread and update nagios' status
data directly. This last method will scale best but is by far the most
difficult.

Good luck

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
