RFC: New IPC Method for Check Results

Ethan Galstad nagios at nagios.org
Wed Apr 11 18:20:53 CEST 2007


Based on issues that have come up in the past with the IPC method Nagios 
uses to handle host/service check results, I am proposing a major change 
to how this is done in Nagios 3.

The current IPC method:

Active host/service check results are passed from child processes to the 
main Nagios process in two pieces: check information through a pipe, and 
plugin output through a temp file.

Passive check results must be fed through the external command file. 
Nagios then forks a child process, which passes the check results to the 
main Nagios process in the same way as with active checks.
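
To make the passive path concrete, here is a minimal Python sketch of how an external agent might format and submit a passive service check result. The PROCESS_SERVICE_CHECK_RESULT line is the existing external command syntax; the helper names, the example host/service, and passing the command file path as a parameter are illustrative assumptions, not NSCA's actual code.

```python
import time


def format_passive_service_result(host, service, return_code, output,
                                  timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT line for the external command file."""
    if timestamp is None:
        timestamp = int(time.time())
    # Each external command must occupy a single line, so flatten any
    # newlines in the plugin output.
    output = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


def submit_command(command_file, line):
    """Write one command to the external command file (a named pipe).

    Opening a FIFO for writing blocks until a reader (the Nagios daemon)
    has it open, which is where agents can end up stuck.
    """
    with open(command_file, "w") as f:
        f.write(line)
```

For example, `format_passive_service_result("web01", "HTTP", 2, "CRITICAL - timeout")` yields a single line that `submit_command()` would hand to the daemon through the command pipe.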

Problems with the current method:

1. When the Nagios daemon stops, child processes may still be performing 
host/service checks.  The results of those checks are lost, which is not 
ideal.

2. Large numbers of passive checks (from distributed/redundant setups) 
can cause load/memory problems.  The external command buffers and 
service check result buffers can fill up, causing external agents (e.g. 
NSCA) to block when they attempt to write passive check results to the 
external command file.

3.  When the Nagios daemon is not running, external agents like NSCA 
cannot write to the external command file, which results either in 
blocking or in lost check results.

Proposed solution:

The new method I am proposing is simple and straightforward.  Why I 
didn't implement something like this years ago is beyond me. :-)

Instead of passing check results from child processes to the main Nagios 
process via two methods (pipe and file), I suggest that all information 
be written to files in a special check result queue directory (e.g., 
var/checkresults).  Child processes that perform host/service checks can 
write all results to a file in the queue directory.  The main Nagios 
process will then periodically process all files/check results in the 
queue in a time-ordered fashion.
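
A minimal Python sketch of the child-process side, under stated assumptions: the JSON record, the `.result` suffix, the `<finish time>-<unique id>` file name, and the function name `write_check_result` are hypothetical choices for illustration, not a format Nagios defines. What it shows is writing each result under a temporary name and then renaming it into place, so the main process never reads a half-written file.

```python
import json
import os
import tempfile
import time


def write_check_result(queue_dir, host, service, return_code, output,
                       finish_time=None):
    """Write one check result as a separate file in the queue directory.

    Hypothetical format: a JSON record in a file named
    "<zero-padded finish time>-<unique id>.result".
    """
    if finish_time is None:
        finish_time = time.time()
    record = {
        "host": host,
        "service": service,  # None would mean a host check
        "return_code": return_code,
        "output": output,
        "finish_time": finish_time,
    }
    # Create the temporary file inside queue_dir so the final rename stays on
    # one filesystem, where os.replace() is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=queue_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(record, f)
            f.flush()
            os.fsync(f.fileno())  # data reaches the disk before the rename
        # Reuse the random part of the temp name so concurrent writers
        # never collide on the final name.
        unique_id = os.path.basename(tmp_path)[len(".tmp-"):]
        final_path = os.path.join(
            queue_dir, "%017.6f-%s.result" % (finish_time, unique_id))
        # A reader that only picks up *.result files sees the record either
        # complete or not at all.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

The temporary file is created in the queue directory itself rather than in /tmp because a rename is only atomic within a single filesystem.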

This method is ideal for handling the problems with the current IPC method:

1.  When the Nagios daemon stops, child processes that are still 
performing host/service checks can write the results to the queue 
directory.  When Nagios starts up again, it will process all those 
results, so nothing is lost.

2a. Passive checks can still be submitted through the external command 
file.  In this case Nagios will not have to fork child processes - it 
will simply write the passive check results to the queue directory.

2b. Using a queue directory will allow external agents (e.g. NSCA) to 
submit passive check results by directly writing files in the queue 
directory without having to submit commands through the external command 
interface.  This should reduce the dependence on NSCA and allow for 
performance improvements in environments where there are a large number 
of passive checks.

3. When Nagios is not running, external agents like NSCA can write check 
results to the queue directory without worrying about blocking.  Nagios 
will process all check results when it starts up again.
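
The main-process side could look like the following minimal sketch. It assumes (hypothetically) that each result is a JSON file ending in `.result` whose name begins with a zero-padded finish timestamp; `process_queue` and `handle_result` are illustrative names, not Nagios functions. Because an external agent could drop files in the same format, one loop covers active results, passive results written directly into the queue, and results left behind while Nagios was down.

```python
import json
import os


def process_queue(queue_dir, handle_result):
    """Process every queued check result in time order, then delete its file.

    Meant to run periodically in the main process, and once at startup to
    pick up results written while the daemon was not running.
    """
    # Only completed files count; writers' temporary files lack the suffix.
    names = [n for n in os.listdir(queue_dir) if n.endswith(".result")]
    processed = 0
    # Names begin with a zero-padded finish time, so lexical order is time order.
    for name in sorted(names):
        path = os.path.join(queue_dir, name)
        try:
            with open(path) as f:
                record = json.load(f)
        except (OSError, ValueError):
            # Unreadable or malformed: skip it; the file stays in the queue.
            continue
        handle_result(record)
        os.unlink(path)
        processed += 1
    return processed
```

Deleting each file only after `handle_result` returns means a crash mid-run could process a result twice, but never loses it, which matches the goal in point 1.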

Any performance hits that may occur with the new IPC method due to disk 
thrashing can be minimized by placing the queue directory on a 
memory-backed filesystem (e.g., a RAM disk).  Whether this will be 
necessary in any but the largest installations remains to be seen.

I currently have half of the code implemented and can post working code 
to CVS within the next week.  I'm interested to hear what folks on the 
list think about the new method before I make the switch, as doing so 
will involve ripping out most of the current IPC code. Once I do so, I 
don't want to have to backtrack. :-)




Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org
