RFC: New IPC Method for Check Results

william(at)elan.net william at elan.net
Wed Apr 11 19:27:35 CEST 2007


I think it's a good idea, except in the case where the plugin is code
embedded within the Nagios server or loaded there as a module. In those
cases Nagios should process the data internally through memory operations
and not involve the filesystem.
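
To illustrate the in-memory path, here is a minimal sketch in Python
rather than Nagios's own C. Every name in it (embedded_results,
submit_embedded_result, drain_embedded_results) is hypothetical and not
part of Nagios; it only shows a check running inside the server process
handing its result to the main loop without touching the filesystem.

```python
import queue

# Hypothetical in-process queue shared by embedded/module checks and the
# main loop. Nothing here reads or writes files.
embedded_results = queue.Queue()


def submit_embedded_result(host, service, return_code, output):
    """Called by a check that runs inside the server process."""
    embedded_results.put({
        "host": host,
        "service": service,
        "return_code": return_code,
        "output": output,
    })


def drain_embedded_results(handle_result):
    """Pass every pending in-memory result to handle_result; return the count."""
    count = 0
    while True:
        try:
            result = embedded_results.get_nowait()
        except queue.Empty:
            return count
        handle_result(result)
        count += 1
```

The main loop would call drain_embedded_results() on each pass, in
addition to whatever it does for checks run in child processes.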

On Wed, 11 Apr 2007, Ethan Galstad wrote:

> Based on issues that have come up in the past regarding the IPC method
> used by Nagios for handling host/service check results, I am proposing a
> major change to how things are done with Nagios 3.
>
> The current IPC method:
>
> Active host/service check results are passed from child processes to the
> main Nagios process in two pieces: check information through a pipe, and
> plugin output through a temp file.
>
> Passive check results must be fed through the external command file.
> Nagios then forks a child process and passes the check results to the
> main Nagios process in a similar fashion as with active checks.
>
> Problems with the current method:
>
> 1. When the Nagios daemon stops, child processes may still be performing
> host/service checks.  The results of those checks are lost, which is not
> ideal.
>
> 2. Large numbers of passive checks (from distributed/redundant setups)
> can cause load/memory problems.  The external command buffers and
> service check result buffers can fill up, causing external agents (e.g.
> NSCA) to block when they attempt to write passive check results to the
> external command file.
>
> 3.  When the Nagios daemon is not running, external agents like NSCA
> cannot write to the external command file, which results in either
> blocking behavior or lost check results.
>
> Proposed solution:
>
> The new method I am proposing is simple and straightforward.  Why I
> didn't implement something like this years ago is beyond me. :-)
>
> Instead of passing check results from child processes to the main Nagios
> process via two methods (pipe and file), I suggest that all information
> be written to files in a special check result queue directory (e.g.,
> var/checkresults).  Child processes that perform host/service checks can
> write all results to a file in the queue directory.  The main Nagios
> process will then periodically process all files/check results in the
> queue in a time-ordered fashion.
>
> This method is ideal for handling the problems with the current IPC method:
>
> 1.  When the Nagios daemon stops, child processes that are still
> performing host/service checks can write the results to the queue
> directory.  When Nagios starts up again, it will process all those
> results, so nothing is lost.
>
> 2a. Passive checks can still be submitted through the external command
> file.  In this case Nagios will not have to fork child processes - it
> will simply write the passive check results to the queue directory.
>
> 2b. Using a queue directory will allow external agents (e.g. NSCA) to
> submit passive check results by directly writing files in the queue
> directory without having to submit commands through the external command
> interface.  This should reduce the dependence on NSCA and allow for
> performance improvements in environments where there are a large number
> of passive checks.
>
> 3. When Nagios is not running, external agents like NSCA can write check
> results to the queue directory without worrying about blocking.  Nagios
> will process all check results when it starts up again.
>
> Any performance hit that the new IPC method incurs from disk thrashing
> can be minimized by placing the queue directory on a memory-backed
> filesystem (e.g. a RAM disk).  Whether this will actually be necessary
> in anything but the largest installations remains to be seen.
>
> I currently have half of the code implemented and can post working code
> to CVS within the next week.  I'm interested to hear what folks on the
> list think about the new method before I make the switch, as doing so
> will involve ripping out most of the current IPC code. Once I do so, I
> don't want to have to backtrack. :-)
>
>
>
>
> Ethan Galstad,
> Nagios Developer
> ---
> Email: nagios at nagios.org
> Website: http://www.nagios.org
>
> _______________________________________________
> Nagios-devel mailing list
> Nagios-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-devel

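The queue-directory scheme proposed in the quoted message can be
sketched roughly as follows. This is Python for brevity, not the C code
Ethan mentions, and the file layout (key=value lines, "check-" and
".tmp-" name prefixes) and function names are assumptions made for the
example, not the format Nagios 3 uses. The writer creates each result
under a temporary name and renames it into place, so the reader never
sees a half-written file; the reader handles finished files oldest
first.

```python
import os
import tempfile
import time


def write_check_result(queue_dir, host, service, return_code, output):
    """Write one check result as a file in queue_dir; return its final path.

    Assumes single-line plugin output (hypothetical key=value format).
    """
    os.makedirs(queue_dir, exist_ok=True)
    # Write under a dot-prefixed temporary name first.
    fd, tmp_path = tempfile.mkstemp(prefix=".tmp-", dir=queue_dir)
    with os.fdopen(fd, "w") as f:
        f.write(f"host={host}\n")
        f.write(f"service={service}\n")
        f.write(f"return_code={return_code}\n")
        f.write(f"finish_time={time.time()}\n")
        f.write(f"output={output}\n")
    # A rename within one directory is atomic on POSIX filesystems.
    unique = os.path.basename(tmp_path)[len(".tmp-"):]
    final_path = os.path.join(queue_dir, "check-" + unique)
    os.rename(tmp_path, final_path)
    return final_path


def process_queue(queue_dir, handle_result):
    """Process every finished result file, oldest first, then delete it."""
    entries = [
        e for e in os.scandir(queue_dir)
        if e.is_file() and e.name.startswith("check-")
    ]
    # Time-ordered processing: sort by modification time.
    entries.sort(key=lambda e: e.stat().st_mtime)
    processed = 0
    for entry in entries:
        result = {}
        with open(entry.path) as f:
            for line in f:
                key, _, value = line.rstrip("\n").partition("=")
                result[key] = value
        handle_result(result)
        os.unlink(entry.path)
        processed += 1
    return processed
```

Note that write_check_result() does not need the reader to be running:
files simply accumulate in the directory until process_queue() is next
called, which is the property the proposal relies on for daemon
restarts and for external agents such as NSCA.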