RFC/RFP Nagios command workers

Onotsky, Steve x55328 Steve.Onotsky at broadridge.com
Wed May 18 17:11:40 CEST 2011


> -----Original Message-----
> From: Andreas Ericsson [mailto:ae at op5.se]
> Sent: May-18-11 10:44
> To: Nagios-users at lists.sourceforge.net; nagios-devel
> Subject: [Nagios-users] RFC/RFP Nagios command workers
> 
> Ahoy again.
> 
> Since discussion on the last requests for comments and patches has
> splintered off and gotten somewhere, it's time for the next mail in
> the series of what us awesome gods of the Nagios core decided to
> work on for the next grand version of Nagios.
> 
> This idea comes from Shinken, mod_gearman and DNX which have all
> implemented versions of it, so creds and kudos to the authors of
> those projects.
> 
> Currently, Nagios eats quite a lot of I/O when writing, scanning for
> and reading the check result files. This becomes especially noticeable
> in large installations. There's also the problem of Nagios using a
> lot more copied memory per fork than it's supposed to, and the fact
> that embedding scripting languages inside the Nagios core to speed
> up execution is a potentially disastrous action (as the debacle with
> embedded Perl has proven to be).
> 
> The idea to solve all of that is to fork() off a set of worker
> processes at startup that free() all possible memory and reconnect
> to the master process via a unix domain socket (or network socket
> that by default only listens to the localhost address) to receive
> requests to run commands and return the results of those commands.
> 
> This has several benefits, although they're not immediately user
> visible.
> * I/O load will decrease significantly, leaving more disk throughput
>   capacity for performance data graphing or status data database
>   solutions.
> * Scripting languages can be embedded regardless of memory leaks and
>   whatnot, since worker daemons can be killed off and respawned every
>   50000 checks (or something), thus causing the kernel to clean up
>   any and all leaked memory.
> * Nagios core can be single-threaded, which means higher portability,
>   less memory usage and more robust code.
> * Eventbroker modules that use a socket to communicate with an
>   external daemon can instead register a handler for inbound packets
>   and then simply "own" that connection and get all future packets
>   from it forwarded as eventbroker events. This will of course reduce
>   the module complexity quite a bit for nearly all much-used modules
>   today (Merlin, livestatus, DNX, mod_gearman, NDOUtils, etc.)
> * It becomes possible to receive responses from Nagios when submitting
>   commands (the current FIFO pipe is one-way communication only).
> 
> Drawbacks:
> * It's quite a large and invasive change to the Nagios core which
>   will require a lot of testing.
> 
> I know some people I met in Italy have already volunteered to help
> implement and test this (Hi Cheik), but it would definitely be
> helpful to get feedback from module authors and users when making this
> change to Nagios.
> 
> Please note that a compatibility daemon which continues to parse the
> simple FIFO will of course have to be implemented so that current
> scripts and whatnot keep on working, and the API to scan for and read
> check result files will also remain for the foreseeable future,
> although possibly implemented as an external helper program which can
> ship check results into the Nagios socket instead.
> 
> Comments, patches and (before summer's out) testing are very much
> appreciated.
> 
> --
> Andreas Ericsson                   andreas.ericsson at op5.se
> OP5 AB                             www.op5.se
> Tel: +46 8-230225                  Fax: +46 8-230231
> 
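
For anyone skimming the proposal quoted above, here is a minimal sketch of the worker idea. It is written in Python rather than Nagios's C, purely as an illustration, and it is not taken from any Nagios code. Assumptions: the newline-delimited JSON messages and their field names are made up, the names worker_loop, spawn_worker and run_check are hypothetical, and the sketch creates the socket with socketpair() before fork() instead of having the worker reconnect to a listening unix domain socket as the mail describes. The worker exits after a fixed number of jobs so that the master could respawn it, which is the "kill off and respawn" point from the list of benefits.

```python
import json
import os
import socket
import subprocess

# Recycle threshold; the mail suggests "50000 checks (or something)".
MAX_JOBS = 50000


def worker_loop(sock, max_jobs=MAX_JOBS):
    """Serve one JSON request per line until EOF or until max_jobs are done."""
    rfile = sock.makefile("r", encoding="utf-8")
    wfile = sock.makefile("w", encoding="utf-8")
    for done, line in enumerate(rfile, start=1):
        request = json.loads(line)
        # Run the check command and capture its exit status and stdout.
        proc = subprocess.run(request["argv"], capture_output=True, text=True)
        reply = {
            "id": request["id"],
            "returncode": proc.returncode,
            "output": proc.stdout,
        }
        wfile.write(json.dumps(reply) + "\n")
        wfile.flush()
        if done >= max_jobs:
            break  # exit so the kernel reclaims anything that leaked


def spawn_worker(max_jobs=MAX_JOBS):
    """Fork one worker; return (pid, master's end of the unix socket)."""
    master_end, worker_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        # Child: keep only the worker end, serve requests, never return.
        master_end.close()
        try:
            worker_loop(worker_end, max_jobs)
        finally:
            os._exit(0)
    # Parent: keep only the master end.
    worker_end.close()
    return pid, master_end


def run_check(sock, job_id, argv):
    """Send one request to a worker and block until its one-line reply arrives."""
    message = json.dumps({"id": job_id, "argv": argv}) + "\n"
    sock.sendall(message.encode("utf-8"))
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("worker closed the connection")
        buf += chunk
    return json.loads(buf)
```

A master would call spawn_worker() once per worker at startup, hand out jobs with run_check(sock, 1, ["echo", "OK"]), and call os.waitpid() and spawn_worker() again when a worker exits. A real single-threaded core would multiplex many worker sockets with select or poll instead of blocking on one reply at a time.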

Sounds like a fantastic idea.  I'm all for it; if I had more available
time, I'd gladly volunteer to assist (but as it stands, I'm a man down
on my team and have to pick up slack).

Best of luck, please keep us posted!

Cheers


Steve Onotsky
Team Lead, Server Support
Broadridge
Investor Communication Solutions, Canada
5970 Chedworth Way
Mississauga  ON  L5R 4G5
Tel: (905) 507-5328
Fax: (905) 507-5312
Inet: steve.onotsky at broadridge.com
 
Quando omni flunkus moritati.


