Threaded reaper patch

Adam Augustine augustineas at gmail.com
Wed Aug 17 01:26:48 CEST 2011


On Mon, Aug 15, 2011 at 7:25 AM, Andreas Ericsson <ae at op5.se> wrote:

> On 08/09/2011 09:13 PM, Adam Augustine wrote:
> >
> > But in spite of that, it seems that moving the reaper code into a thread
> > would be generically useful for Nagios. I know it has been discussed on
> > this list in the past.
> >
>
> It would also cause a bunch of problems. What we're working on instead is
> implementing worker processes which communicate with a master process via
> a unix socket. One such process could act as a (mostly dormant) reaper for
> the checkresult files in the spool directory.
>
>
Ah, it seems the scope of the worker process socket effort is much larger
than I had expected. Does this mean that modules that were initially NEBs
can instead be implemented as wholly independent processes, communicating
back over that socket (presumably more than just a unix domain socket, but
also a network socket as well)?




> > If the Merlin reaper thread is wholly contained within the Merlin NEB
> > (as it appears to be) and is not in any way patching the Nagios core
> > code, then my question is, how is that working without conflicting with
> > the main event loop reaper code?
>
> Mainly by making all the APIs the broker module uses in Nagios itself
> threadsafe. That's why Merlin needs Nagios 3.3.1 or one of the post-3.2.3
> versions made available through git.op5.org
>
>
Ah, so there are modifications necessary to pre-3.3.1 versions of Nagios to
override the reaping process. Nagios 3.3.1 now has real (and threadsafe)
APIs for manipulating internal data structures, where before there weren't
any. This makes perfect sense to me. The Merlin reaper thread uses the same
API to update the in-memory data structures that the main event loop reaper
code would, so no conflicts.
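
To make sure I have the idea right, here is a minimal sketch of what I mean
by two reapers sharing one thread-safe API. It is Python rather than the
Nagios C code, and every name in it (StatusTable, apply_result, the "core"
and "merlin" reaper labels) is hypothetical, not taken from Nagios or Merlin.
The only point is that both threads go through the same lock-protected
update function, so neither can corrupt the shared table.

```python
import threading


class StatusTable:
    """Hypothetical stand-in for the in-memory status data."""

    def __init__(self):
        self._lock = threading.Lock()
        self._states = {}
        self.updates = 0

    def apply_result(self, host, state):
        # The single entry point for writers: every update takes the lock.
        with self._lock:
            self._states[host] = state
            self.updates += 1

    def snapshot(self):
        # Readers take the same lock and get a private copy.
        with self._lock:
            return dict(self._states)


def reaper(table, name, count):
    # Each reaper feeds its own results through the shared API.
    for i in range(count):
        table.apply_result(f"{name}-host{i}", 0)


def run_two_reapers(count=1000):
    table = StatusTable()
    threads = [
        threading.Thread(target=reaper, args=(table, name, count))
        for name in ("core", "merlin")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table
```

If the real APIs behave like apply_result here, then it does not matter
which reaper delivers a given result.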


> > My quick glance at the NEB callbacks for EVENT_CHECK_REAPER seems to
> > indicate that there isn't any NEBERROR_CALLBACKOVERRIDE associated with
> > it. So I am very curious how it is being handled.
> >
>
> You're talking about two different reapers. They don't interfere with
> each other at all.
>
> --
> Andreas Ericsson                   andreas.ericsson at op5.se
>

I think I understand now, presuming that the Merlin reaper and the main
Nagios event loop reaper are both using the new thread-safe APIs.

But I am still a little confused. You mention above that implementing the
reaper code as a Nagios thread would cause a lot of problems, but isn't that
what the Merlin NEB module does? Are you encountering a lot of problems with
that approach? Or was it specifically /moving/ the reaper into a thread
that you thought was a bad idea?

I certainly agree that socket communication provides a much cleaner
separation, and would make things easier, and I am not advocating

But separate thread or separate process is really an implementation detail
(an important one, admittedly, but still).

My base assumption is that the single-threaded Nagios core is slowed
significantly by the time spent in the reaping portion of the loop.
Evidence supporting that assumption is the fact that we have a timeout
associated with that portion of code. Assuming the default of 30 seconds is
"sane", the reaper could spend up to 30 seconds blocking checks from
being executed, and significantly impacting check_latency.
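
As a rough illustration of that assumption (not the actual Nagios code),
here is a small Python sketch of a reaper with a time budget running inside
a single-threaded loop. The function name reap, its parameters, and the
injectable clock are all hypothetical; the 30-second default just mirrors
the timeout mentioned above. While reap() is running, nothing else in the
same thread can run.

```python
import time
from collections import deque


def reap(queue, process, max_seconds=30.0, clock=time.monotonic):
    """Process queued check results until the queue is empty or the
    time budget is used up. Returns the number of results processed."""
    deadline = clock() + max_seconds
    processed = 0
    # The caller is blocked for as long as this loop keeps going.
    while queue and clock() < deadline:
        process(queue.popleft())
        processed += 1
    return processed
```

With a large enough backlog the loop only stops when the budget runs out,
so the rest of the event loop waits for the full max_seconds.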

Anyway, for a larger number of checks (50k-100k), I would think a reaper
implemented as a worker process (or thread, or whatever) would be very busy
processing all the results coming into the checkresults files in the spool
directory and updating the relevant in-memory data structures. But based on
your statement above (the "mostly dormant" part), it would seem that I am
wrong somewhere.

What am I missing?

Thanks for your time in answering my questions. I have spent some time looking
through the code and usually end up with more questions than answers on the
internals of how Nagios is handling things.