NSCA in standalone single-process daemon mode

Thomas Guyot-Sionnest Thomas at zango.com
Wed May 3 23:35:19 CEST 2006


> -----Original Message-----
> From: Andreas Ericsson [mailto:ae at op5.se] 
> Sent: May 3, 2006 4:19
> To: Thomas Guyot-Sionnest
> Cc: nagios-devel at lists.sourceforge.net
> Subject: Re: [Nagios-devel] NSCA in standalone single-process daemon mode
> 
> Thomas Guyot-Sionnest wrote:
> > Hi list,
> > 
> > I'm running a big Nagios monitoring system with about a hundred remote
> > passive checks reporting through NSCA. Lately, when I added more passive
> > checks, I noticed that the number of "Failed" checks (no results received)
> > increased. (For most of the checks it's impossible to say whether they
> > ran or not.)
> > 
> > I'm currently running NSCA in inetd mode using D. J. Bernstein's tcpserver
> > program. Since most checks are run by Vixie Cron, and therefore run at
> > exactly the same time, my two guesses were that either:
> > 
> > 1. I'm jamming up the monitoring server for more than 10 seconds with all
> > the checks.
> > 
> > Or
> > 
> > 2. All the NSCA processes writing to the same command file trigger some
> > obscure OS or Nagios bug.
> > 
> > I have reasons to think it's not #1, so to test #2 I wanted to run NSCA in
> > single-process daemon mode. When I do this, it gets the first passive check
> > correctly and send_nsca fails on all the other checks. Running strace, I see
> > that it blocks on the poll syscall after processing the first check, and
> > send_nsca times out after 10 seconds.
> > 
> > I'm running Nagios 2.0b3 on Slackware 10.1.0, dual Athlon MP with 4 GB of
> > RAM, NSCA version 2.6, official and unpatched.
> > 
> > Compiled with GCC:
> > Configured with: ../gcc-3.3.4/configure --prefix=/usr --enable-shared
> > --enable-threads=posix --enable-__cxa_atexit --disable-checking
> > --with-gnu-ld --verbose --target=i486-slackware-linux
> > --host=i486-slackware-linux
> > Thread model: posix
> > gcc version 3.3.4
> > 
> > Any thoughts on what's going wrong here?
> > 
> 
> Nagios' command file is being filled up. It can only hold 4096 bytes
> (a hard OS limit on most Unix-like systems), so with 100+ checks going off
> at the same time you're lucky to get half of them written to the pipe
> before it times out.
> 

I doubt that's the case, since I have "command_check_interval=-1" set and
NSCA should just block when the pipe is full.
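
For a rough sense of scale: assuming the 4096-byte pipe size quoted above, and
result lines like the sample used in the test further down, only a few dozen
results fit in the pipe at once. A minimal sketch of that arithmetic follows.
The helper name lines_that_fit is made up for illustration, and the capacity
is the quoted figure, not something measured on this system.

```shell
# lines_that_fit CAPACITY LINE
# Prints how many copies of LINE (each followed by a newline) fit in
# CAPACITY bytes. Assumes LINE is plain ASCII, so that its character
# count equals its byte count.
lines_that_fit() (
    capacity=$1
    line=$2
    bytes_per_line=$(( ${#line} + 1 ))   # +1 for the trailing newline
    echo $(( capacity / bytes_per_line ))
)

# Sample result line from the test in this message; 4096 is the quoted
# pipe size (an assumption, the real capacity depends on the kernel).
sample='[1146690904] PROCESS_SERVICE_CHECK_RESULT;hostname;servicename;0;OK: everything is fine'
lines_that_fit 4096 "$sample"
```

Real results with longer plugin output would fit even fewer lines, so this is
an upper bound for that pipe size rather than a prediction.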

I noticed in the code that Nagios offloads the pipe into a circular buffer,
and from my tests it seems that if this buffer fills up, Nagios starts
dropping commands. However, this only occurred when I was sending about five
times what we currently send to Nagios. I tested by running commands similar
to this:

`(for ((i=0; i<500; i++)); do echo '[1146690904] PROCESS_SERVICE_CHECK_RESULT;hostname;servicename;0;OK: everything is fine'; done) >> /path/to/nagios.cmd`

Running that with i<500 just before the passive check results come in affects
a few checks. With i<1000 I get almost no checks in.

With 3000 it blocks on the pipe and takes significantly more time to run
(0m0.053s for the first 1000, 0m0.405s for the next 1000 and 0m33.856s for
the third 1000).
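
To repeat this kind of test with different batch sizes, the loop can be
wrapped in a small function. This is only a sketch: gen_results is a
hypothetical name, the host and service arguments are placeholders, and it
stamps each line with the current time instead of a fixed timestamp.

```shell
# gen_results COUNT [HOST] [SERVICE]
# Prints COUNT passive service check results in the external command
# format used in the test above, one per line, stamped with the current
# time. HOST and SERVICE default to the placeholder names from that test.
gen_results() (
    count=$1
    host=${2:-hostname}
    service=${3:-servicename}
    now=$(date +%s)
    i=0
    while [ "$i" -lt "$count" ]; do
        # Each printf is a separate write to whatever stdout points at.
        printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;0;OK: everything is fine\n' \
            "$now" "$host" "$service"
        i=$(( i + 1 ))
    done
)
```

Each batch can then be timed separately, for example by running
`time gen_results 1000 >> /path/to/nagios.cmd` three times in a row.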



I'd really like to try NSCA in standalone mode. Any idea why it stops working
after the first check comes in?

Thanks,

Thomas Guyot

