RFC: New IPC Method for Check Results

Ethan Galstad nagios at nagios.org
Thu Apr 12 06:20:01 CEST 2007


Hendrik Bäcker wrote:
> Ethan Galstad wrote:
>> Proposed solution:
>>
>> The new method I am proposing is simple and straightforward.  Why I 
>> didn't implement something like this years ago is beyond me. :-)
>>   
> Because you just wanted to start your programming career with a pipe??
> *just kidding*
>> Instead of passing check results from child processes to the main Nagios 
>> process via two methods (pipe and file), I suggest that all information 
>> be written to files in a special check result queue directory (e.g., 
>> var/checkresults).  Child processes that perform host/service checks can 
>> write all results to a file in the queue directory.  The main Nagios 
>> process will then periodically process all files/check results in the 
>> queue in a time-ordered fashion.
>>   
> Some of us will remember my post about "a good way to handle performance
> data", with a small discussion about pipes vs. "spool dirs"?
> In the current release of the PNP add-on we have introduced a small
> daemon that does exactly what you describe above.
> A short digression: Nagios only writes files with perfdata and rotates
> them every x seconds into a spool dir; the daemon reads the files and
> processes them to fill the RRD files.
> This solution brought me from a latency of around 350 seconds (~2000
> service checks) down to 2-5 seconds.

Good to hear that you saw such improvements.  Hopefully this will have 
similar effects for passive checks...
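
For concreteness, the child-side half of the queue directory idea quoted
above could look roughly like the following. This is only a minimal sketch
in Python, not what Nagios does or will do: the function name, the
key=value field names, and the ".tmp-" / "result-" file prefixes are all
assumptions made up for illustration. The one point it tries to show is
that a result is written under a temporary name and then renamed, so the
reader never picks up a half-written file.

```python
import os
import tempfile
import time


def write_check_result(queue_dir, host, service, return_code, output):
    """Write one check result into queue_dir and return the final path.

    All field names below are hypothetical; the thread does not define
    an on-disk format.
    """
    os.makedirs(queue_dir, exist_ok=True)

    # Create the file under a dot-prefixed temporary name so that a reader
    # scanning the directory can skip files that are still being written.
    fd, tmp_path = tempfile.mkstemp(prefix=".tmp-", dir=queue_dir)
    with os.fdopen(fd, "w") as f:
        f.write(f"finish_time={time.time():.6f}\n")
        f.write(f"host_name={host}\n")
        f.write(f"service_description={service}\n")
        f.write(f"return_code={return_code}\n")
        f.write(f"output={output}\n")

    # Reuse the unique suffix chosen by mkstemp for the final file name.
    unique = os.path.basename(tmp_path)[len(".tmp-"):]
    final_path = os.path.join(queue_dir, "result-" + unique)

    # rename() is atomic on POSIX when both paths are on one filesystem.
    os.rename(tmp_path, final_path)
    return final_path
```

A child process would call something like
write_check_result("var/checkresults", "web01", "HTTP", 0, "OK") once the
plugin has finished.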

> 
> Because of this I would say: this is the right way.
>> Any performance hits that may occur with the new IPC method due to disk 
>> thrashing can be minimized if the queue directory is placed on a 
>> memory-backed filesystem (e.g., a RAM disk).  Whether this will actually 
>> be necessary for anything but the largest installations remains to be seen.
>>   
> I would suggest keeping an eye on the number of files within a
> directory. I know some people with a huge number of distributed Nagios
> servers and a large number of service checks.
> It might be bad if Nagios is down for hours and, on restarting, has to
> process thousands of individual files, if you are thinking of using one
> file for each result.

I'll make sure that multiple results can be stored in a single file 
(ideal for bulk transfers using NSCA).  A configurable option will allow 
Nagios to process only results made within a certain timeframe. I think 
that should take care of it.
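
To make those two points a bit more tangible, here is a rough sketch of
the reading side, again in Python and again purely illustrative. It
assumes a made-up format in which each file holds one or more key=value
records separated by blank lines, each with a finish_time field; the
function names and the max_age_seconds parameter are hypothetical and
merely stand in for the configurable option mentioned above. Files whose
names start with a dot are assumed to be still in progress and are
skipped.

```python
import os
import time


def parse_results(text):
    """Split file contents into result dicts (blank line = record separator)."""
    results = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            key, sep, value = line.partition("=")
            if sep:  # ignore lines that are not key=value
                record[key.strip()] = value
        if record:
            results.append(record)
    return results


def finish_time(record):
    """Timestamp of a result; records without one are treated as very old."""
    return float(record.get("finish_time", 0))


def drain_queue(queue_dir, max_age_seconds, now=None):
    """Read and delete all queued files; return fresh results, oldest first."""
    now = time.time() if now is None else now
    results = []
    for name in os.listdir(queue_dir):
        if name.startswith("."):
            continue  # assumed to be a file that is still being written
        path = os.path.join(queue_dir, name)
        with open(path) as f:
            results.extend(parse_results(f.read()))
        os.remove(path)

    # Drop results older than the configured timeframe, then order by time.
    fresh = [r for r in results if now - finish_time(r) <= max_age_seconds]
    fresh.sort(key=finish_time)
    return fresh
```

With this shape, a backlog that built up while the main process was down
is read once, and anything older than the cutoff is discarded rather than
processed.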

> 
> Just my 2 Cents.
> 
> Kind regards
> Hendrik
> 



Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org
