Patches for improved NEB control

bobi at netshel.net bobi at netshel.net
Thu Oct 26 18:43:26 CEST 2006


Wow.  We got a lot of responses.

We are going to take a few days to write up some documentation and we will
post that to the list.

Just one thing we noticed in the e-mails is that there is some confusion
between this and NDO.  DNX does not replace NDO.  NDO is still required for
multiple data centers.

Bob


> This is a very nice patch indeed. It doesn't break anything that's
> working now, but lets module-authors get more power over how nagios
> executes checks. It's also relatively small and non-intrusive and, as a
> side-effect, it makes it possible to write plugins as modules. Overall,
> I like it.
>
> Some questions though, inlined below. Oh, and I would very much like to
> see the module. :)
>
>
> bobi at netshel.net wrote:
>> Attached is a patch-set I would like some feedback on.
>>
>> The purpose of this patch is to allow Nagios the ability to delegate the
>> execution of service checks to a NEB module.
>>
>> Why would we want to do this?  I'm glad you asked...
>>
>> The point is to allow Nagios to scale efficiently in large-scale
>> environments by delegating service checks to multi-node "check"
>> clusters.  That is, it facilitates the creation of a Nagios Service
>> Check Cluster (or multiple independent clusters) that can be deployed
>> in either one location or multiple locations.
>>
>> The benefits are:
>>
>> 1. It de-couples Service Check execution from Scheduling on the same
>> box.  Sure, you can do this by setting up multiple Nagios instances
>> that report their results passively back up to the "master" Nagios
>> box, but that requires manually splitting up your configuration among
>> multiple Nagios instances, setting up all of the passive result
>> reporting, etc.
>>
>> In this scenario, you can keep your centrally-located master
>> configuration file and have the service checks distributed to
>> light-weight, geographically-dispersed service check clusters.
>>
>
> How does the module determine which node checks what?
> How is configuration distributed?
>
>> 2. Scalability.  You can support more simultaneous service checks by
>> adding more light-weight service check nodes incrementally.
>>
>
> Do you have to restart the "master" nagios in order for this to work, or
> will they be picked up as one goes along?
> If "picked up as one goes along", how does handshake and authentication
> work?
>
>> You can start with zero external nodes (i.e., all checks still
>> executed by Nagios internally).  Then add one node as your service
>> check count increases.  Then gradually (or quickly) increase the node
>> count, locally or remotely, as your service check count grows, and the
>> system will scale appropriately.
>>
>> Anyway, it's not the ultimate, end-all, be-all, but we have found it
>> helps us scale and manage Nagios efficiently in our large-scale,
>> multi-datacenter environment.  The hope is that this will be
>> considered as a potential part of the new Nagios architecture some
>> day.
>>
>> For those who want to know how Nagios actually delegates service check
>> execution to an external cluster via a NEB module, here are the
>> high-level details:
>>
>> We have written a multi-threaded NEB module that registers a
>> NEBCALLBACK_SERVICE_CHECK_DATA callback and watches for the
>> NEBTYPE_SERVICECHECK_INITIATE event.
>>
>> It then takes each service check and distributes it across the network
>> to multiple "worker" nodes in a cluster (via XML-RPC).  It also takes
>> care of processing the check results, posting them to the internal
>> Nagios result queue, handling plugin timeout conditions, etc.
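
[Editor's sketch, not the module itself.]  The flow described above can be
illustrated roughly as follows.  Only NEBCALLBACK_SERVICE_CHECK_DATA and
NEBTYPE_SERVICECHECK_INITIATE come from the post; their numeric values here
are placeholders (the real ones come from the Nagios NEB headers), and every
other name (sketch_check_event, SKETCH_RUN_LOCALLY, SKETCH_DELEGATED,
send_to_worker, the round-robin choice of worker) is a hypothetical stand-in.
The post does not say how a worker is chosen or how the patched Nagios is
told to skip its own execution, so treat both as assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder values; a real module takes these from the Nagios NEB headers. */
#define NEBCALLBACK_SERVICE_CHECK_DATA 1
#define NEBTYPE_SERVICECHECK_INITIATE  2
#define NEBTYPE_SERVICECHECK_PROCESSED 3

/* Hypothetical return codes: what the callback tells the (patched) core. */
#define SKETCH_RUN_LOCALLY 0  /* core executes the check itself          */
#define SKETCH_DELEGATED   1  /* module took the check; core should skip */

/* Hypothetical, trimmed-down stand-in for the event record passed to the callback. */
typedef struct {
    int type;
    const char *host_name;
    const char *service_description;
} sketch_check_event;

static int worker_count = 0;  /* zero workers: everything stays internal    */
static int next_worker = 0;   /* round-robin cursor (assumed policy)        */

/* Stand-in for the network hand-off (XML-RPC in the post). */
static void send_to_worker(int worker, const sketch_check_event *ev)
{
    printf("worker %d <- %s/%s\n", worker, ev->host_name,
           ev->service_description);
}

/* Callback: react only to "check is about to be initiated" events. */
static int on_service_check(int callback_type, void *data)
{
    const sketch_check_event *ev = data;

    assert(data != NULL);
    if (callback_type != NEBCALLBACK_SERVICE_CHECK_DATA)
        return SKETCH_RUN_LOCALLY;
    if (ev->type != NEBTYPE_SERVICECHECK_INITIATE)
        return SKETCH_RUN_LOCALLY;
    if (worker_count == 0)
        return SKETCH_RUN_LOCALLY;  /* no external nodes yet */

    send_to_worker(next_worker, ev);
    next_worker = (next_worker + 1) % worker_count;
    return SKETCH_DELEGATED;
}
```

With worker_count left at zero the callback always answers "run locally",
which matches the "start with zero external nodes" case mentioned earlier in
the post; raising worker_count makes it hand checks out in turn.  Result
handling and plugin timeouts are left out of the sketch.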
>>
>
> Does this go through the FIFO pipe? If so, I'm afraid it doesn't solve
> the biggest issue in scaling Nagios to large networks.
>
> --
> Andreas Ericsson                   andreas.ericsson at op5.se
> OP5 AB                             www.op5.se
> Tel: +46 8-230225                  Fax: +46 8-230231
>
> -------------------------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job
> easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> Nagios-devel mailing list
> Nagios-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-devel
>


