DNX Version 0.13 Released!

Adam Augustine augustineas at gmail.com
Thu Oct 11 00:40:57 CEST 2007


</me slaps forehead>
Good catch. I got busy and sloppy. Sorry to the list, and thanks to
Andreas for the reminder.

Distributed Nagios eXecutor (DNX) consists of a NEB module, a server,
and client daemons, which together let check plugins execute across
multiple "worker nodes" in a load-distribution cluster. It allows you
to scale up the number of checks you run without having to manage
full Nagios installs on multiple boxes.

The project is hosted at http://dnx.sourceforge.net.

It is designed to be minimally invasive to the existing Nagios
configuration (only a single line to load the NEB module). The
workload of actually executing the plugins is then distributed across
all the participating nodes.

On 10/4/07, Andreas Ericsson <ae at op5.se> wrote:
> I'm not sure how widely used DNX is, but it might be appropriate to include
> a small note on what it is and where to get it with each announcement such
> as this, that reaches other lists than dnx-devel@, where everyone presumably
> knows the what, the why and the where.
>
> Adam Augustine wrote:
> > Sorry for the long time between releases. Bob has been working on
> > it quite a bit (I'm the one who's been slacking), so this should catch
> > us up a bit. This release should resolve all the outstanding bugs and
> > address a few of the features.
> >
> > Once we found the short-term stability bug, worker nodes were able to
> > run for a couple of weeks under our normal load before the
> > communications channel bug got us. But Bob worked his magic and found
> > the problem in record time.
> >
> > If you can, please test and let us know how it works for you.
> >
> > Version 0.13:
> > =============
> > - Added out-of-memory condition checking for all strdup(3) calls.
> > - Fixed DNX communications channel exhaustion bug. This bug occurred when
> > a Client worker thread exited: Although the dispatch and collector channels
> > were properly closed, they weren't released from the DNX Channel Map pool.
> > Since this pool has a finite number of slots, we ran out of slots eventually.
> > Running out of slots then prevented the creation of any additional worker
> > threads.
> > - Fixed memory leak in the Client, related to the above problem.
> > - Fixed the same DNX communications channel exhaustion bug in the NEB Server
> > module as well, although this was not likely to occur very often.
> > - Added some additional error and debug logging.
> > - Added some graceful handling of NULL strings in the XML protocol messaging.
> >
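The channel exhaustion bug described above can be illustrated with a minimal sketch of a fixed-size slot pool. This is not DNX's actual Channel Map code: the names chan_map, chan_open, chan_close, chan_close_leaky, chan_free_slots and the pool size CHAN_SLOTS are all hypothetical, chosen only to show why closing a channel without releasing its slot eventually prevents new channels from being created.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size channel map; not DNX's actual data structure. */
#define CHAN_SLOTS 4

typedef struct {
    int in_use;   /* slot is allocated to a channel */
    int is_open;  /* underlying connection is open */
} chan_slot;

static chan_slot chan_map[CHAN_SLOTS];

/* Claim the first free slot; returns its index, or -1 if the pool is full. */
int chan_open(void)
{
    for (int i = 0; i < CHAN_SLOTS; i++) {
        if (!chan_map[i].in_use) {
            chan_map[i].in_use = 1;
            chan_map[i].is_open = 1;
            return i;
        }
    }
    return -1;
}

/* Buggy variant: closes the channel but leaves the slot marked in use. */
void chan_close_leaky(int idx)
{
    assert(idx >= 0 && idx < CHAN_SLOTS);
    chan_map[idx].is_open = 0;
}

/* Fixed variant: closes the channel and returns the slot to the pool. */
void chan_close(int idx)
{
    assert(idx >= 0 && idx < CHAN_SLOTS);
    chan_map[idx].is_open = 0;
    chan_map[idx].in_use = 0;
}

/* Count the slots still available for new channels. */
int chan_free_slots(void)
{
    int n = 0;
    for (int i = 0; i < CHAN_SLOTS; i++)
        if (!chan_map[i].in_use)
            n++;
    return n;
}
```

With the leaky close, every exiting worker permanently consumes a slot, so chan_open() starts returning -1 once CHAN_SLOTS workers have come and gone. With the fixed close, the free-slot count returns to where it started.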
> >
> > Version 0.12:
> > =============
> > - Implemented the auditWorkerJobs directive in the server's configuration
> > file. This feature allows you to track which worker nodes are executing
> > which service checks.
> > - Fixed negative job counter issue in client.
> > - Added debugging level support for the server. Setting the debug flag in
> > the server config file to any positive integer enables debugging. The
> > higher the integer, the more verbose the debugging output.
> > - The server module no longer writes messages to nagios.log. All server
> > module messages are now written to syslog.
> > - Cleaned up memory leaks in both server and client.
> > - Fixed nasty corner-case where a job might be expired and collected at
> > the same time, causing a heap corruption due to the job structure memory
> > being freed twice. Even though this race condition was properly
> > semaphore-protected, the expiration thread didn't properly mark the expired job
> > as removed from the global job queue. Hence, the collection thread might
> > acquire the semaphore right after the expiration thread released it, and
> > therefore still see the expired job as active. The job would then be
> > "collected", even though it was already "expired".
> >
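The double-free described in the last 0.12 item comes down to a missing state change made while the lock is held. Here is a minimal sketch of that idea, assuming a hypothetical job structure with a state flag and using a pthread mutex in place of the semaphore mentioned above. The names job, job_state, job_expire, job_collect and queue_lock are illustrative and are not taken from the DNX source.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical job states; DNX's real job structure is not shown in the post. */
typedef enum { JOB_ACTIVE, JOB_REMOVED } job_state;

typedef struct {
    job_state state;
    char *payload;   /* heap memory that must be freed exactly once */
} job;

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Remove a job from the queue. Returns 1 if this caller freed it,
 * 0 if another thread had already removed it. */
static int job_remove(job *j)
{
    int freed = 0;

    assert(j != NULL);
    pthread_mutex_lock(&queue_lock);
    if (j->state == JOB_ACTIVE) {
        j->state = JOB_REMOVED;   /* the marking step the changelog says was missing */
        free(j->payload);
        j->payload = NULL;
        freed = 1;
    }
    pthread_mutex_unlock(&queue_lock);
    return freed;
}

/* Expiration and collection both go through the same guarded removal. */
int job_expire(job *j)  { return job_remove(j); }
int job_collect(job *j) { return job_remove(j); }
```

Whichever of job_expire() or job_collect() acquires the lock first frees the payload. The other sees JOB_REMOVED and does nothing, so the memory is released only once even if the two calls arrive back to back.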
> >
> > Version 0.11:
> > =============
> > - Implemented the localCheckPattern directive in the server's configuration
> > file. This lets you specify an extended regular expression that is used
> > to decide whether a check command job should execute locally (instead
> > of being sent to a DNX client).
> >
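As a rough illustration of how an extended regular expression can decide whether a check runs locally, here is a small sketch built on the POSIX regcomp(3)/regexec(3) interface. The function name matches_local_pattern is hypothetical, and the post does not say how DNX applies localCheckPattern internally, so treat this as one plausible approach rather than the actual implementation.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if command matches the POSIX extended regular expression,
 * 0 if it does not, and -1 if the pattern fails to compile. */
int matches_local_pattern(const char *pattern, const char *command)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && command != NULL);

    /* REG_EXTENDED selects extended syntax; REG_NOSUB skips match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, command, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

A real server would more likely compile the pattern once when the configuration is read and reuse the compiled regex_t for every job. Compiling on each call here just keeps the sketch short.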
> >
> > Version 0.10:
> > =============
> > - Fixed improper XML parsing of command or response values, where the
> > command/response contains embedded XML tags (or even just embedded
> > angle-brackets). This fix affects both the server and the client,
> > since they share the common XML parsing routines contained in dnxXml.c.
> >
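The 0.10 entry does not say how dnxXml.c now copes with embedded angle brackets. One common way to keep such values from confusing a parser is to escape them before they go into a message, sketched below. The function xml_escape and its signature are assumptions made for illustration only and may bear no relation to the actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst, replacing &, < and > with XML entities.
 * dst_size is the capacity of dst including the terminating NUL.
 * Returns 0 on success, -1 if dst is too small. */
int xml_escape(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; src++) {
        char one[2] = { *src, '\0' };   /* current character as a string */
        const char *rep;
        size_t len;

        switch (*src) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        default:  rep = one;     break;
        }

        len = strlen(rep);
        if (used + len + 1 > dst_size)  /* keep room for the NUL */
            return -1;
        memcpy(dst + used, rep, len);
        used += len;
    }

    if (dst_size == 0)
        return -1;
    dst[used] = '\0';
    return 0;
}
```

A check command such as "check_x -w <5" would then travel as "check_x -w &lt;5", and the receiving side would reverse the substitution after extracting the value.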
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by: Splunk Inc.
> > Still grepping through log files to find problems?  Stop.
> > Now Search log events and configuration files using AJAX and a browser.
> > Download your FREE copy of Splunk now >> http://get.splunk.com/
> > _______________________________________________
> > Nagios-devel mailing list
> > Nagios-devel at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nagios-devel
>
>
> --
> Andreas Ericsson                   andreas.ericsson at op5.se
> OP5 AB                             www.op5.se
> Tel: +46 8-230225                  Fax: +46 8-230231
>
