RFC: New IPC Method for Check Results

Marantz, Roy Roy.Marantz at deshaw.com
Wed Apr 11 23:45:48 CEST 2007


Maildir is such a big win over mbox that I also agree this is a good idea.  I'd suggest you think carefully about writer and reader locking.  Do something that works on many file system implementations (i.e. avoid flock).  I believe some similar code uses the file mode or name to indicate that a file is still being written.  Of course, you then need to clean up old files left over from crashed writers.
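
A minimal sketch of the name-based approach, in Python for brevity (Nagios itself is C). The ".tmp-" prefix, the final file naming scheme and both function names are assumptions made for illustration, not anything taken from Nagios or Maildir code. The writer creates the file under a temporary name and renames it once it is complete, so a reader that skips ".tmp-" names never sees a partial file and no flock is needed. The second function removes temporary files older than a given age, which are presumably left over from crashed writers.

```python
import os
import tempfile
import time

TMP_PREFIX = ".tmp-"  # hypothetical marker for files still being written


def write_result(queue_dir, payload):
    """Write payload under a temporary name, then rename it into place."""
    fd, tmp_path = tempfile.mkstemp(prefix=TMP_PREFIX, dir=queue_dir)
    with os.fdopen(fd, "w") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before renaming
    suffix = os.path.basename(tmp_path)[len(TMP_PREFIX):]
    # Hypothetical naming scheme: timestamp, pid and the random mkstemp suffix.
    final_name = "%d-%d-%s" % (time.time_ns(), os.getpid(), suffix)
    final_path = os.path.join(queue_dir, final_name)
    os.rename(tmp_path, final_path)  # atomic within one POSIX file system
    return final_path


def reap_stale(queue_dir, max_age_seconds, now=None):
    """Delete temporary files older than max_age_seconds; return their names."""
    now = time.time() if now is None else now
    removed = []
    for name in os.listdir(queue_dir):
        if not name.startswith(TMP_PREFIX):
            continue  # completed files are the reader's business
        path = os.path.join(queue_dir, name)
        try:
            if now - os.stat(path).st_mtime > max_age_seconds:
                os.unlink(path)
                removed.append(name)
        except FileNotFoundError:
            pass  # the writer finished and renamed the file in the meantime
    return removed
```

The rename is only atomic if the temporary and final names are on the same file system, which is why the temporary file is created inside the queue directory itself.
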
Roy

-----Original Message-----
From: nagios-devel-bounces at lists.sourceforge.net [mailto:nagios-devel-bounces at lists.sourceforge.net] On Behalf Of Hendrik Bäcker
Sent: Wednesday, April 11, 2007 12:42 PM
To: Nagios Developers List
Subject: Re: [Nagios-devel] RFC: New IPC Method for Check Results

Ethan Galstad wrote:
> Proposed solution:
>
> The new method I am proposing is simple and straightforward.  Why I 
> didn't implement something like this years ago is beyond me. :-)
>   
Because you just wanted to start your programming career with a pipe? *just
kidding*
> Instead of passing check results from child processes to the main Nagios 
> process via two methods (pipe and file), I suggest that all information 
> be written to files in a special check result queue directory (e.g., 
> var/checkresults).  Child processes that perform host/service checks can 
> write all results to a file in the queue directory.  The main Nagios 
> process will then periodically process all files/check results in the 
> queue in a time-ordered fashion.
>   
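The reader side of such a queue could be sketched roughly as follows, in Python for brevity. This is only an illustration: the function name, the use of the file modification time for ordering and the ".tmp-" prefix for unfinished files are all assumptions, not part of the proposal.

```python
import os

TMP_PREFIX = ".tmp-"  # hypothetical marker for files still being written


def drain_queue(queue_dir, handle):
    """Pass each completed file's contents to handle(), oldest first."""
    entries = []
    with os.scandir(queue_dir) as it:
        for entry in it:
            # Skip files a writer has not finished yet, and anything
            # that is not a regular file.
            if entry.name.startswith(TMP_PREFIX) or not entry.is_file():
                continue
            entries.append((entry.stat().st_mtime_ns, entry.name, entry.path))
    entries.sort()  # by modification time, then by name as a tie-breaker
    for _, _, path in entries:
        with open(path) as f:
            handle(f.read())
        os.unlink(path)  # remove only after the result has been handled
    return len(entries)
```

Calling drain_queue("var/checkresults", process_result) from the main loop every few seconds would correspond to the "periodically process" step; process_result stands for whatever the main process does with one result.
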
Some of you will remember my post about "a good way to handle performance
data", with a small discussion about pipes vs. "spooldirs".
In the current release of the PNP addon we have introduced a small
daemon that does exactly what you wrote above.
A short digression: Nagios only writes files with perfdata and rotates them
every x seconds into a spool dir; the daemon reads the files and processes
them to fill the RRD files.
This solution brought me from a latency of around 350 seconds (~2000
service checks) down to 2-5 seconds.

Because of this I would say: this is the right way.
>
> Any performance hits that may occur with the new IPC method due to disk 
> thrashing can be minimized if the queue directory is placed on a 
> memory-mapped filesystem.  Whether this will actually be necessary or 
> not in all but the largest installations remains to be seen.
>   
I would suggest keeping an eye on the number of files within a
directory. I know some people with a huge number of distributed Nagios
servers and a large number of service checks.
It might be bad if Nagios is down for hours and, on restart, has to process
thousands of individual files, if you are thinking of using one file per result.
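
One possible way to keep the file count down is to let a writer put several results into one file. The sketch below is in Python for brevity, and everything in it is an assumption made for illustration: the function names, the ".tmp-" and "batch-" prefixes, and the tab-separated one-result-per-line format (which requires that host and service names contain no tabs and that the output contains no newlines).

```python
import os
import tempfile

TMP_PREFIX = ".tmp-"  # hypothetical marker for files still being written


def write_batch(queue_dir, results):
    """Write many (host, service, code, output) results into one queue file."""
    fd, tmp_path = tempfile.mkstemp(prefix=TMP_PREFIX, dir=queue_dir)
    with os.fdopen(fd, "w") as f:
        for host, service, code, output in results:
            f.write("\t".join((host, service, str(code), output)) + "\n")
    suffix = os.path.basename(tmp_path)[len(TMP_PREFIX):]
    final_path = os.path.join(queue_dir, "batch-" + suffix)
    os.rename(tmp_path, final_path)  # publish the finished file in one step
    return final_path


def read_batch(path):
    """Return the results stored in one batch file, in the order written."""
    results = []
    with open(path) as f:
        for line in f:
            host, service, code, output = line.rstrip("\n").split("\t", 3)
            results.append((host, service, int(code), output))
    return results
```

With this layout, 2000 results written in batches of 100 would leave 20 files in the directory instead of 2000.
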

Just my 2 Cents.

Kind regards
Hendrik

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel
