Threaded reaper patch

Adam Augustine augustineas at gmail.com
Wed Aug 17 01:33:18 CEST 2011


Sorry for replying to myself. I sent that just a little too soon.

On Tue, Aug 16, 2011 at 5:26 PM, Adam Augustine <augustineas at gmail.com> wrote:

> On Mon, Aug 15, 2011 at 7:25 AM, Andreas Ericsson <ae at op5.se> wrote:
>
>> On 08/09/2011 09:13 PM, Adam Augustine wrote:
>> >
>> > But in spite of that, it seems that moving the reaper code into a thread
>> > would be generically useful for Nagios. I know it has been discussed on
>> > this list in the past.
>> >
>>
>> It would also cause a bunch of problems. What we're working on instead is
>> implementing worker processes which communicate with a master process via
>> a unix socket. One such process could act as a (mostly dormant) reaper for
>> the checkresult files in the spool directory.
>>
>>
> Ah, it seems the scope of the worker process socket effort is much larger
> than I had expected. Does this mean that modules that were initially NEBs
> can instead be implemented as wholly independent processes, communicating
> back over that socket (presumably more than just a unix domain socket, but
> also a network socket as well)?
>
>
>
Here I meant that generically. The context of the original thread on the dev
list regarding the socket communication to the master process led me to
believe that it was specifically about offloading checks (a la DNX and
mod_gearman). The question I am asking is whether all NEB callbacks would be
implemented over the socket communication in the future.
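
To make the question concrete, this is roughly what I picture: the master
forwards each callback event to an independent process over a socket and
reads back whatever that process replies. This is only a sketch under my own
assumptions, not anything taken from the worker process code. The
newline-delimited JSON framing, the callback names, and the functions encode,
read_messages, module_process and run_master are all made up for
illustration, and I am using Python only to keep it short.

```python
import json
import os
import socket


def encode(message):
    """Serialise one message as a single line of JSON (hypothetical framing)."""
    return (json.dumps(message) + "\n").encode("utf-8")


def read_messages(sock):
    """Yield decoded messages until the peer closes its end of the socket."""
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield json.loads(line.decode("utf-8"))


def module_process(sock):
    """Stand-in for a former NEB module: handle each event and send a reply."""
    for event in read_messages(sock):
        reply = {"handled": event["callback"], "host": event["host"]}
        sock.sendall(encode(reply))


def run_master(events):
    """Fork a module process, feed it events, and collect its replies."""
    master_end, module_end = socket.socketpair()  # connected stream pair
    pid = os.fork()
    if pid == 0:  # child: the external module
        master_end.close()
        try:
            module_process(module_end)
        finally:
            module_end.close()
            os._exit(0)
    module_end.close()
    for event in events:
        master_end.sendall(encode(event))
    master_end.shutdown(socket.SHUT_WR)  # no more events, so the child can exit
    replies = list(read_messages(master_end))
    master_end.close()
    os.waitpid(pid, 0)
    return replies
```

Calling run_master with a list of event dicts returns one reply per event, in
order. As far as I can tell, a network socket would only change how the two
ends get connected; the framing could stay the same.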



>
>
>> > If the Merlin reaper thread is wholly contained within the Merlin NEB
>> > (as it appears to be) and is not in any way patching the Nagios core
>> > code, then my question is, how is that working without conflicting with
>> > the main event loop reaper code?
>>
>> Mainly by making all the APIs the broker module uses threadsafe in Nagios
>> itself. That's why Merlin needs Nagios 3.3.1 or one of the post-3.2.3
>> versions made available through git.op5.org
>>
>>
> Ah, so there are modifications necessary to pre-3.3.1 versions of Nagios to
> override the reaping process. Nagios 3.3.1 now has real (and threadsafe)
> APIs for manipulating internal data structures, where before there weren't
> any. This makes perfect sense to me. The Merlin reaper thread uses the same
> API to update the in-memory data structures that the main event loop reaper
> code would, so no conflicts.
>
>
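
To check that I am picturing this correctly, here is a rough sketch of what I
mean by two reapers going through the same threadsafe API. It is purely my
own illustration and not how Nagios or Merlin actually implements it. The
StatusTable class, apply_result, and the host and service names are all
hypothetical.

```python
import threading


class StatusTable:
    """Hypothetical stand-in for the core's in-memory status data."""

    def __init__(self):
        self._lock = threading.Lock()
        self._states = {}   # (host, service) -> last return code
        self._updates = 0   # total number of results applied

    def apply_result(self, host, service, return_code):
        """The only entry point for writers; safe to call from any thread."""
        with self._lock:
            self._states[(host, service)] = return_code
            self._updates += 1

    def snapshot(self):
        """Return a consistent copy of the states and the update count."""
        with self._lock:
            return dict(self._states), self._updates


def reap(table, results):
    """Apply a batch of check results through the shared API."""
    for host, service, return_code in results:
        table.apply_result(host, service, return_code)


def run_two_reapers(count=1000):
    """Run two reapers concurrently against one table."""
    table = StatusTable()
    core_results = [("web01", "svc%d" % i, 0) for i in range(count)]
    module_results = [("db01", "svc%d" % i, 2) for i in range(count)]
    threads = [
        threading.Thread(target=reap, args=(table, core_results)),
        threading.Thread(target=reap, args=(table, module_results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return table.snapshot()
```

Because every write goes through apply_result, neither reaper needs to know
that the other exists, which is how I read the "no conflicts" part.
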
>> > My quick glance at the NEB callbacks for EVENT_CHECK_REAPER seems to
>> > indicate that there isn't any NEBERROR_CALLBACKOVERRIDE associated with
>> > it. So I am very curious how it is being handled.
>> >
>>
>> You're talking about two different reapers. They don't interfere with
>> each other at all.
>>
>> --
>> Andreas Ericsson                   andreas.ericsson at op5.se
>>
>
> I think I understand now, presuming that the Merlin reaper and the main
> Nagios event loop reaper are both using the new thread safe APIs.
>
> But I am still a little confused. You mention above that implementing the
> reaper code as a Nagios thread would cause a lot of problems, but isn't that
> what the Merlin NEB module does? Are you encountering a lot of problems with
> that approach? Or was it specifically /moving/ the reaper into a thread
> that you thought was a bad idea?
>
> I certainly agree that socket communication provides a much cleaner
> separation, and would make things easier, and I am not advocating
> sticking with the NEB callback model. But separate thread or separate
> process is really an implementation detail (an important one, admittedly,
> but still).
>
> My base assumption is that the single-threaded Nagios core is slowed
> significantly by the time spent in the reaping portion of the loop.
> Evidence supporting that assumption is the fact that we have a timeout
> associated with that portion of code. Assuming the default of 30 seconds is
> "sane", the reaper could spend up to 30 seconds blocking checks from
> being executed, significantly impacting check_latency.
>
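
The blocking I have in mind looks roughly like the loop below. It is a sketch
of my assumption and not the actual reaper code: reap_with_budget, demo, and
the half second cost per result are invented for illustration, and the budget
stands in for the timeout mentioned above (I believe the setting is
max_check_result_reaper_time).

```python
import time
from collections import deque


def reap_with_budget(pending, handle, max_reaper_time, clock=time.monotonic):
    """Handle queued results until the queue is empty or the budget is spent.

    Returns the number of results handled. In a single-threaded loop nothing
    else, including launching new checks, runs while this function executes.
    """
    start = clock()
    handled = 0
    while pending:
        if clock() - start >= max_reaper_time:
            break  # leave the remainder for the next reaper pass
        handle(pending.popleft())
        handled += 1
    return handled


def demo():
    """Simulate 100 queued results that each cost half a second to handle."""
    now = [0.0]  # fake clock, so the demo does not really wait

    def handle(result):
        now[0] += 0.5

    pending = deque(range(100))
    handled = reap_with_budget(pending, handle, 30.0, clock=lambda: now[0])
    return handled, len(pending)
```

With these made-up numbers demo() returns (60, 40): the loop is busy for the
full 30 seconds, handles 60 results, and still leaves 40 queued for the next
pass.
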
> Anyway, for a larger number of checks (50K-100K), I would think a reaper
> implemented as a worker process (or thread, or whatever) would be very busy
> processing all the results coming into the checkresult files in the spool
> directory and updating the relevant in-memory data structures. But based on
> your statement above (the "mostly dormant" part), it would seem that I am
> wrong somewhere.
>
> What am I missing?
>
> Thanks for your time in answering my questions. I have spent some time looking
> through the code and usually end up with more questions than answers on the
> internals of how Nagios is handling things.
>
>