Patches for improved NEB control

bobi at netshel.net
Tue Oct 24 18:58:39 CEST 2006


Attached is a patch-set I would like some feedback on.

The purpose of this patch is to let Nagios delegate the execution of
service checks to a NEB module.

Why would we want to do this?  I'm glad you asked...

The point is to allow Nagios to scale efficiently in large environments
by delegating service checks to multi-node "check" clusters. That is, it
facilitates the creation of a Nagios Service Check Cluster (or multiple
independent clusters) that can be deployed in one location or in several.

The benefits are:

1. It decouples service check execution from scheduling, so the two no
longer have to happen on the same box. Sure, you can do this by setting up
multiple Nagios instances that report their results passively back to the
"master" Nagios box, but that requires manually splitting your
configuration among multiple Nagios instances, setting up all of the
passive result reporting, etc.

In this scenario, you can keep your centrally located master configuration
file and have the service checks distributed to lightweight,
geographically dispersed service check clusters.

2. Scalability.  You can support more simultaneous service checks by
adding more light-weight service check nodes incrementally.

You can start with zero external nodes (i.e., all checks still executed by
Nagios internally), then add one node as your service check count
increases. Then gradually (or quickly) increase the node count, locally
or remotely, as your service check count grows, and the system will scale
accordingly.

Anyway, it's not the ultimate, end-all, be-all, but we have found it helps
us scale and manage Nagios efficiently in our large-scale,
multi-datacenter environment.  The hope is that this will be considered as
a potential part of the new Nagios architecture some day.

For those who want to know how Nagios actually delegates service check
execution to an external cluster via a NEB module, here are the high-level
details:

We have written a multi-threaded NEB module that registers a 
NEBCALLBACK_SERVICE_CHECK_DATA callback and watches for the
NEBTYPE_SERVICECHECK_INITIATE event.
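A minimal sketch of the module side follows, assuming a reduced stand-in for Nagios's service-check data structure and placeholder values for the constants (the real definitions come from the Nagios NEB headers and, for NEBERROR_CALLBACKOVERRIDE, from the attached patch). The queue and `enqueue_check()` are hypothetical; they stand in for whatever hands checks to the module's dispatcher threads. The callback uses the NEB callback signature `int (*)(int, void *)`.

```c
#include <stddef.h>

/* Placeholder values: the real NEB_OK, NEBTYPE_SERVICECHECK_INITIATE and the
 * patch's NEBERROR_CALLBACKOVERRIDE come from the Nagios headers. */
#define NEB_OK                         0
#define NEBTYPE_SERVICECHECK_INITIATE  701
#define NEBERROR_CALLBACKOVERRIDE      206

/* Reduced, hypothetical stand-in for Nagios's service check data struct. */
typedef struct {
    int type;                          /* NEBTYPE_* event subtype */
    const char *host_name;
    const char *service_description;
    const char *command_line;
} service_check_data;

/* Hypothetical queue feeding the dispatcher threads. A real multi-threaded
 * module would guard it with a mutex and deep-copy the strings. */
#define QUEUE_CAPACITY 4
static service_check_data pending[QUEUE_CAPACITY];
static int pending_count = 0;

static int enqueue_check(const service_check_data *sc) {
    if (pending_count >= QUEUE_CAPACITY)
        return -1;                     /* queue full: cannot accept the check */
    pending[pending_count++] = *sc;    /* shallow copy of the pointers */
    return 0;
}

/* In a real module this would be registered from nebmodule_init(), e.g.
 *   neb_register_callback(NEBCALLBACK_SERVICE_CHECK_DATA, handle, 0,
 *                         service_check_cb);
 */
int service_check_cb(int callback_type, void *data) {
    service_check_data *sc = data;
    (void)callback_type;

    /* Only the "check is about to run" event matters here. */
    if (sc == NULL || sc->type != NEBTYPE_SERVICECHECK_INITIATE)
        return NEB_OK;

    /* If the cluster accepted the check, tell Nagios not to run it. */
    if (enqueue_check(sc) == 0)
        return NEBERROR_CALLBACKOVERRIDE;

    /* Otherwise Nagios executes the check itself, as usual. */
    return NEB_OK;
}
```

Returning NEB_OK when the queue is full is one way to degrade gracefully: Nagios simply runs the check locally instead of losing it.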

It then takes each service check and distributes it across the network to
multiple "worker" nodes in a cluster (via XML-RPC). It also takes care of
processing the check results, posting them to the internal Nagios result
queue, handling plugin timeout conditions, etc.
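The distribution step can be pictured roughly as below. This is an illustration, not the module described here: the post says only that checks go to multiple worker nodes over XML-RPC, so the round-robin selection, the `worker_pool` type and the "report CRITICAL on timeout" rule are assumptions, and the XML-RPC transport is left out. A pool with zero nodes yields NULL, which corresponds to the "zero external nodes" starting point where Nagios still runs every check.

```c
#include <stddef.h>

/* Standard plugin return codes. */
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/* Hypothetical pool of worker nodes (e.g. XML-RPC endpoints as host:port). */
typedef struct {
    const char **nodes;
    int count;       /* 0 means no external workers configured */
    int next;        /* round-robin cursor */
} worker_pool;

/* Round-robin selection (an assumed policy). Returns NULL when there are no
 * workers, in which case the module should let Nagios run the check. */
const char *select_worker(worker_pool *pool) {
    if (pool->count <= 0)
        return NULL;
    const char *node = pool->nodes[pool->next];
    pool->next = (pool->next + 1) % pool->count;
    return node;
}

/* Hypothetical timeout rule: return the worker's result if it arrived,
 * CRITICAL once the check's timeout has elapsed, or -1 while still pending. */
int result_or_timeout(int have_result, int worker_rc, double elapsed, int timeout) {
    if (have_result)
        return worker_rc;
    if (elapsed >= timeout)
        return STATE_CRITICAL;
    return -1;
}
```

Round-robin is the simplest policy that spreads load as nodes are added; a real cluster might instead pick the least-loaded node or prefer workers in the same datacenter as the monitored host.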

The way this works is that Nagios now checks the return code from NEB
modules that are registered for the NEBCALLBACK_SERVICE_CHECK_DATA callback.

If the NEB module returns the "new" NEBERROR_CALLBACKOVERRIDE result code,
Nagios "delegates" execution of the service check to the NEB module. 
Otherwise, Nagios continues to execute the service check itself, as it
normally does.
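On the Nagios side, the change amounts to inspecting the callback return value before launching the plugin. The sketch below is illustrative only and is not the code in the attached diffs: the callback registry, `broker_service_check_initiate()`, `run_service_check()`, the two example callbacks and the constant values are hypothetical stand-ins. The override check sits inside `#ifdef USE_EVENT_BROKER` (the build option Nagios uses for broker support), so a build without the broker always executes checks itself.

```c
/* Stand-in for the build-time broker option; comment it out to build
 * the core without any broker code. */
#define USE_EVENT_BROKER

#define NEB_OK                          0
#define NEBCALLBACK_SERVICE_CHECK_DATA  13    /* placeholder value */
#define NEBERROR_CALLBACKOVERRIDE       206   /* placeholder value */

typedef int (*neb_callback_fn)(int callback_type, void *data);

/* Hypothetical registry of service-check callbacks. */
#define MAX_CALLBACKS 8
static neb_callback_fn service_check_callbacks[MAX_CALLBACKS];
static int num_callbacks = 0;

int register_service_check_callback(neb_callback_fn fn) {
    if (num_callbacks >= MAX_CALLBACKS)
        return -1;
    service_check_callbacks[num_callbacks++] = fn;
    return 0;
}

/* Call each module in turn; stop as soon as one claims the check. */
int broker_service_check_initiate(void *data) {
    for (int i = 0; i < num_callbacks; i++) {
        if (service_check_callbacks[i](NEBCALLBACK_SERVICE_CHECK_DATA, data)
                == NEBERROR_CALLBACKOVERRIDE)
            return NEBERROR_CALLBACKOVERRIDE;
    }
    return NEB_OK;
}

/* Counts plugin runs; stands in for Nagios's normal fork/exec of the plugin. */
static int local_executions = 0;
static void execute_locally(void *svc) {
    (void)svc;
    local_executions++;
}

/* Returns 1 if a module took over the check, 0 if Nagios ran it. */
int run_service_check(void *svc) {
#ifdef USE_EVENT_BROKER
    if (broker_service_check_initiate(svc) == NEBERROR_CALLBACKOVERRIDE)
        return 1;   /* the module now owns execution and result posting */
#endif
    execute_locally(svc);
    return 0;
}

/* Example modules: one only observes, one claims every check. */
int observer_module_cb(int callback_type, void *data) {
    (void)callback_type; (void)data;
    return NEB_OK;
}
int cluster_module_cb(int callback_type, void *data) {
    (void)callback_type; (void)data;
    return NEBERROR_CALLBACKOVERRIDE;
}
```

Stopping at the first override means two cluster modules cannot both claim the same check; whether later modules should still be notified is an open design choice.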

So, the attached patch files enable this functionality.

Note that this patch set does not include our multi-threaded NEB module
(if you're interested in that, just e-mail me; it's meant to be open
source). It just includes the patches that allow NEB modules to override
service check execution.

This should be a pretty straightforward patch, and doesn't modify any
functionality in the absence of the broker. We just need it to expand the
flexibility of what a NEB module can do.

Thanks,
Bob
-------------- next part --------------
Attached patches (scrubbed by the archive, available at the URLs below):
- broker.c.diff: <https://www.monitoring-lists.org/archive/developers/attachments/20061024/9c7d9025/attachment.ksh>
- broker.h.diff: <https://www.monitoring-lists.org/archive/developers/attachments/20061024/9c7d9025/attachment-0001.ksh>
- checks.c.diff: <https://www.monitoring-lists.org/archive/developers/attachments/20061024/9c7d9025/attachment-0002.ksh>
- neberror.h.diff: <https://www.monitoring-lists.org/archive/developers/attachments/20061024/9c7d9025/attachment-0003.ksh>
- nebmods.c.diff: <https://www.monitoring-lists.org/archive/developers/attachments/20061024/9c7d9025/attachment-0004.ksh>
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel

