[patch] NSCA version 2.9 pre-forked daemon mode

Andreas Ericsson ae at op5.se
Wed Dec 7 10:49:39 CET 2011


On 12/06/2011 11:41 PM, Michel Belleau wrote:
> Hi.
> 
> I recently had the chance to look at version 2.9 of NSCA. We were
> originally running it in "--daemon" mode at our installation, but it
> looks like the daemon mode is not bounded by any means; NSCA can
> fork() as much as it wants (up to the socket listener limit, but that
> is still quite a bit of processes) to process the incoming check
> results and I didn't like that. I had a look at the "--single" mode
> of operation and from our test results, it doesn't scale as much as
> we need.
> 
> I went in and modified the code a bit to implement a PREFORK mode
> where the NSCA daemon forks a number of processes at startup and
> respawns them if they exit due to errors. In my opinion, this
> should scale better than the single-threaded mode and show
> better resource usage when handling many messages per second.
> It imitates the mpm_prefork worker for "httpd" a bit (though
> much more simplistic). This adds a new "--prefork" command-line
> option to NSCA.
> 
> I also think that the new "check_result_path" configuration directive
> is a good performance shortcut, so that is what I tested it with,
> and it has given good results so far.
> 
> Any comments are welcome, if you want to include the patch upstream,
> feel free as I would be glad to have contributed to that project. The
> included patch applies cleanly to nsca-trunk, revision 1846.
> 

Why not use multiplexing? A single process can easily handle 20k
simultaneous connections that way, and it would make it easier to
rewrite nsca to use the up-and-coming unix-socket input method to
Nagios (persistent connection) instead of the current pipe method
(which needs to be set up over and over again).

Or, as Daniel says, use xinetd to limit connections.
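With the xinetd route, connection limiting is configuration rather than code. A sketch of an nsca service entry (paths, caps and user are illustrative; check them against your own setup):

```
# /etc/xinetd.d/nsca -- illustrative values
service nsca
{
        disable         = no
        socket_type     = stream
        port            = 5667
        wait            = no
        user            = nagios
        server          = /usr/sbin/nsca
        server_args     = -c /etc/nsca.cfg --inetd
        instances       = 50            # hard cap on concurrent nsca processes
        per_source      = 10            # cap per submitting host
        cps             = 100 10        # throttle: 100 conn/s, then 10s pause
}
```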

I for one am quite curious what happens with this patch when the
connection queue fills up and inbound connectors can no longer
connect. The reason you're getting bazillions of procs is that
that many events really are coming in, so if you're "fixing" the
problem by speeding up handling a little and limiting the process
count a lot, you're going about it the wrong way.

In short: have you tested this with some really serious connection
spamming, like 100 servers trying to connect and submit checkresults
as quickly as they possibly can? That's the sort of load you kinda
have to handle for this to work for users with very large networks.
The small-network users don't have this problem, so unless it works
in the super-large scenario I'm afraid this is just code-churn with
no real benefit.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.




