Patches for improved NEB control

Andreas Ericsson ae at op5.se
Wed Oct 25 10:27:31 CEST 2006


This is a very nice patch indeed. It doesn't break anything that's 
working now, but gives module authors more control over how nagios 
executes checks. It's also relatively small and non-intrusive and, as a 
side-effect, it makes it possible to write plugins as modules. Overall, 
I like it.

Some questions though, inlined below. Oh, and I would very much like to 
see the module. :)


bobi at netshel.net wrote:
> Attached is a patch-set I would like some feedback on.
> 
> The purpose of this patch is to let Nagios delegate the execution of
> service checks to a NEB module.
> 
> Why would we want to do this?  I'm glad you asked...
> 
> The point is to allow Nagios to scale efficiently in large-scale
> environments by delegating service checks to multi-node "check" clusters. 
> That is, it facilitates the creation of a Nagios Service Check Cluster (or
> multiple independent clusters) that can be deployed in either one
> location or multiple locations.
> 
> The benefits are:
> 
> 1. It de-couples Service Check execution from Scheduling on the same box. 
> Sure, you can do this by setting up multiple Nagios instances that report
> their results passively back up to the "master" Nagios box, but that
> requires manually splitting up your configuration among multiple Nagios
> instances, setting up all of the passive result reporting, etc.
> 
> In this scenario, you can keep your centrally-located master configuration
> file and have the service checks distributed to light-weight,
> geographically-dispersed service check clusters.
> 

How does the module determine which node checks what?
How is configuration distributed?

> 2. Scalability.  You can support more simultaneous service checks by
> adding more light-weight service check nodes incrementally.
> 

Do you have to restart the "master" nagios in order for this to work, or 
will new nodes be picked up as one goes along?
If they are picked up as one goes along, how do handshake and 
authentication work?

> You can start with zero external nodes (i.e., all checks still executed by
> Nagios internally). Then add one node as your service check count
> increases.  Then gradually (or quickly) increase the node count, locally
> or remotely, as your service check count grows, and the system will scale
> appropriately.
> 
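
To make the scaling model in the quoted paragraph concrete, here is a rough 
sketch of how routing could behave as nodes are added. Everything in it is an 
assumption made for illustration: the class and method names are hypothetical, 
and round-robin is only a placeholder policy, since the post does not say how 
the module actually picks a node.

```python
import itertools


class CheckDispatcher:
    """Toy model of check routing (hypothetical, not the module's real logic)."""

    def __init__(self, nodes=()):
        self.set_nodes(nodes)

    def set_nodes(self, nodes):
        # Replace the node list, e.g. when a worker node is added or removed.
        self.nodes = list(nodes)
        self._cycle = itertools.cycle(self.nodes) if self.nodes else None

    def route(self, check_name):
        # With zero external nodes every check stays local, matching the
        # "all checks still executed by Nagios internally" starting point.
        if self._cycle is None:
            return "local"
        # Assumption: plain round-robin over the known nodes.
        return next(self._cycle)


dispatcher = CheckDispatcher()
print(dispatcher.route("check_http"))       # no nodes yet, so handled locally
dispatcher.set_nodes(["node-a", "node-b"])  # node names are made up
print([dispatcher.route(c) for c in ("check_http", "check_ssh", "check_ping")])
```

A real implementation would also have to answer the questions raised above, 
such as how nodes join at runtime and how they authenticate.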
> Anyway, it's not the ultimate, end-all, be-all, but we have found it helps
> us scale and manage Nagios efficiently in our large-scale,
> multi-datacenter environment.  The hope is that this will be considered as
> a potential part of the new Nagios architecture some day.
> 
> For those who want to know how Nagios actually delegates service check
> execution to an external cluster via a NEB module, here are the high-level
> details:
> 
> We have written a multi-threaded NEB module that registers a 
> NEBCALLBACK_SERVICE_CHECK_DATA callback and watches for the
> NEBTYPE_SERVICECHECK_INITIATE event.
> 
> It then takes each service check and distributes it across the network to
> multiple "worker" nodes in a cluster (via XML-RPC).  It also takes care of
> processing the check results, posting them to the internal Nagios result
> queue, plugin timeout conditions, etc.
> 
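
The result-handling side of the quoted description (dispatch to workers, 
enforce a plugin timeout, post results to a queue) could look roughly like 
the sketch below. It is not the module's code: `run_remote_check` is a 
hypothetical stand-in for the XML-RPC call to a worker node, a plain 
`queue.Queue` stands in for the internal Nagios result queue, and reporting 
a timeout as UNKNOWN is an arbitrary choice made here.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Standard plugin return codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
STATE_OK, STATE_UNKNOWN = 0, 3


def run_remote_check(name, delay):
    """Hypothetical stand-in for the network call to a worker node."""
    time.sleep(delay)  # simulates how long the remote plugin takes
    return STATE_OK, f"{name} OK"


def dispatch(checks, result_queue, timeout=0.5, workers=4):
    """Run checks concurrently and post one result tuple per check."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(run_remote_check, name, delay)
                   for name, delay in checks}
        for name, fut in futures.items():
            try:
                state, output = fut.result(timeout=timeout)
            except FutureTimeout:
                # Assumption: a timed-out check is reported as UNKNOWN.
                state, output = STATE_UNKNOWN, "check timed out"
            result_queue.put((name, state, output))


results = queue.Queue()
dispatch([("fast", 0.0), ("slow", 0.3)], results, timeout=0.1)
while not results.empty():
    print(results.get_nowait())
```

Note that the timeout here is measured from the moment the result is waited 
on, not from submission, and the pool still waits for slow workers to finish 
before `dispatch` returns. A production module would need a stricter scheme.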

Does this go through the FIFO pipe? If so, I'm afraid it doesn't solve 
the biggest issue in scaling Nagios to large networks.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
