DNX Alpha 0.07 Released

Adam Augustine augustineas at gmail.com
Thu Mar 1 01:56:28 CET 2007


So after jumping through a lot of hoops and getting bitten by bug
after bug (always just when we thought we were ready for a release),
we finally have something we aren't too ashamed to have people take a
poke at.

We are publicly announcing that the Distributed Nagios eXecutor is
available. Bob (bobi at netshel.net) mentioned it several months ago in
conjunction with some patches he submitted (NEB callback result codes,
etc.), and now it is finally ready, along with a SourceForge page and
supporting mailing lists. Have a look at http://dnx.sourceforge.net and
let us know what you think.

We do expect bugs and any help in fixing them is welcome.

The illustrations on the screenshots page should help to explain what
is actually going on under the hood, and the code is fairly well
documented. Here is the blurbage from one of our docs:

The Problem:
The current suggested method of scaling Nagios (the popular open
source monitoring system) to include multiple servers has a few small
practical disadvantages.

Each check is configured to execute on a particular distributed server,
which then passively sends its results back up to the central box,
which in turn must have a matching passive check configured. This
means that an administrator must install Nagios on each box, maintain
the configuration of each check in two places (on the central server
and on one of the distributed servers), and keep track of which check
is executing on which box. That can be pretty tedious for larger
installations with multiple boxes. More critically, if a particular
distributed server fails, none of the checks configured on that server
will be executed (and all of them will alarm on the central server, if
freshness checking is configured).
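To make the "two places" problem concrete: a distributed server's result ultimately reaches the central box as a Nagios external command written to the central server's command file, and it only lands if the central configuration defines the same host/service pair as a passive check. The Python sketch below formats such a PROCESS_SERVICE_CHECK_RESULT line and tests it against a stand-in for the central configuration. The helper names and the set-of-pairs "config" are hypothetical illustrations, not part of Nagios or DNX.

```python
import time

# Standard Nagios plugin return codes.
RETURN_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_result_line(host, service, state, output, timestamp=None):
    """Format one passive service check result as a Nagios external command.

    Hypothetical helper: this is the kind of line that ends up in the
    central server's external command file on behalf of a distributed box.
    """
    if timestamp is None:
        timestamp = int(time.time())
    # Semicolons delimit the fields, so they cannot appear in the names.
    for field in (host, service):
        if ";" in field:
            raise ValueError("field may not contain ';': %r" % field)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, RETURN_CODES[state], output)


def central_accepts(line, central_passive_services):
    """Return True if the central side has a matching passive check.

    Hypothetical helper: central_passive_services is a set of
    (host, service) pairs standing in for the central configuration.
    """
    _, fields = line.split("] ", 1)
    # Split at most 4 times so plugin output may itself contain ';'.
    _, host, service, _, _ = fields.split(";", 4)
    return (host, service) in central_passive_services
```

Every check run this way means one definition on the distributed box plus one matching passive definition on the central box, and the two have to be kept in sync by hand.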

The Approach:
What would be ideal from an administration standpoint would be to have
the Nagios host itself distribute its checks automatically and
dynamically to a group of "worker nodes" in a cluster. This ideal
would also include:

1) Minimal configuration changes to the central Nagios node (one or
two new lines, no changes to the checks themselves, no wrapper
scripts, etc.)

2) The solution must avoid using the FIFO pipe, due to its scalability issues.

3) It should be possible to add and remove worker nodes without
configuration changes (with the possible exception of security entries
to prevent rogue nodes from stealing checks or inserting bogus results)

4) If a worker node fails in some way, only the checks in flight on
that node at the time would be lost, each resulting in a "(Service
Check Timeout)", and when Nagios retried a lost check, it would be
executed on any one of the remaining cluster nodes.

5) Checks should not have affinity for any particular node (for the
reason stated in #4). If local resources require that a particular
node execute a particular check, this should be accomplished via NRPE.
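Requirements 3 through 5 amount to a simple contract: any worker may take any check, and a check handed to a worker that then disappears is written off after a timeout. Below is a minimal Python model of that contract, assuming a single-process dispatcher. The Dispatcher class, its methods and the timeout handling are hypothetical illustrations, not DNX code; in real Nagios the timeout and the retry come from Nagios' own check timeout and rescheduling, and here `expire()` and a fresh `submit()` stand in for them.

```python
import collections

TIMEOUT_OUTPUT = "(Service Check Timeout)"


class Dispatcher:
    """Hypothetical node-agnostic job hand-out with in-flight expiry."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.pending = collections.deque()  # checks waiting for any worker
        self.in_flight = {}                 # job_id -> (worker, deadline)

    def submit(self, job_id):
        self.pending.append(job_id)

    def request_job(self, worker, now):
        # No affinity: whichever worker asks first gets the oldest check.
        if not self.pending:
            return None
        job_id = self.pending.popleft()
        self.in_flight[job_id] = (worker, now + self.timeout)
        return job_id

    def post_result(self, job_id, output):
        # A result for a job that was already expired is dropped.
        if self.in_flight.pop(job_id, None) is None:
            return None
        return (job_id, output)

    def expire(self, now):
        """Time out in-flight jobs whose worker never answered."""
        expired = [job_id for job_id, (_, deadline) in self.in_flight.items()
                   if now >= deadline]
        for job_id in expired:
            del self.in_flight[job_id]
        return [(job_id, TIMEOUT_OUTPUT) for job_id in expired]
```

Because `request_job()` never looks at which worker is asking, adding or removing a node needs no change on the dispatcher side, and a dead node costs only the jobs it was holding.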

Our Response:
Distributed Nagios eXecutor (DNX) is a Nagios Event Broker (NEB)
module (a NEB module is somewhat like a Linux kernel module) that
intercepts check commands just before the fork-fork-exec stage. Worker
nodes request jobs, and the NEB module matches each check command with
a pending job request and sends it to the requesting node to execute.
The worker node executes the check command and passes the results back
to the NEB module, which inserts them directly into the results queue
data structure (bypassing the FIFO pipe).
