New Nagios implementation proposal

Andreas Ericsson ae at op5.se
Mon Dec 14 13:37:17 CET 2009


On 12/11/2009 04:30 PM, nap wrote:
> On Fri, Dec 11, 2009 at 1:53 PM, Andreas Ericsson<ae at op5.se>  wrote:
> 
>>
>> Process pools aren't that hard to do in C really, but altering the
>> entire concept of how Nagios operates is a fairly big change. OTOH, I'm
>> not thrilled about the whole "check-results are stored in tempfiles"
>> thing either, and *that* was a major change too.
> Maybe we can first work on "return in socket/memory" before trying
> the process pool. It should be easier and could have a very large effect.
> 

That would be easier, yes. I once did a test of multiplexing check
results and had very good results with it. The only problem is that
it would require a double-fork() now, as checks would have to be
wrapped in something to provide correct output with the microsecond
execution time precision Nagios currently uses.
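
A minimal sketch of what such a wrapper could look like, assuming a
POSIX system. The scheduler forks a wrapper, the wrapper forks the
plugin, times it with gettimeofday() and sends one fixed-size report
back over a pipe. The names check_report, wrapper_main(), start_check(),
collect_check() and elapsed_usec() are hypothetical and do not come
from the Nagios source; this is an illustration, not the test code
mentioned above.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical report format; not taken from the Nagios source. */
struct check_report {
	struct timeval start, stop; /* taken right around the plugin's lifetime */
	int exit_code;              /* plugin exit status, -1 if no normal exit */
	char output[256];           /* start of plugin stdout, NUL-terminated */
};

/* Wrapper process (child of the scheduler): forks the plugin and times it. */
static void wrapper_main(const char *cmd, int report_fd)
{
	struct check_report r;
	char junk[256];
	int out[2], status = 0;
	size_t len = 0;
	ssize_t n;
	pid_t pid;

	memset(&r, 0, sizeof(r));
	r.exit_code = -1;
	if (pipe(out) < 0)
		_exit(1);

	gettimeofday(&r.start, NULL);
	pid = fork(); /* second fork: the plugin itself */
	if (pid == 0) {
		close(out[0]);
		close(report_fd);
		dup2(out[1], STDOUT_FILENO);
		close(out[1]);
		execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
		_exit(127);
	}
	close(out[1]);
	while (pid > 0) {
		/* Keep what fits, drain the rest so the plugin never blocks. */
		size_t room = sizeof(r.output) - 1 - len;
		n = read(out[0], room ? r.output + len : junk,
		         room ? room : sizeof(junk));
		if (n <= 0)
			break;
		if (room)
			len += (size_t)n;
	}
	close(out[0]);
	if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
		r.exit_code = WEXITSTATUS(status);
	gettimeofday(&r.stop, NULL);

	/* The struct is below POSIX's minimum PIPE_BUF, so the write is atomic. */
	n = write(report_fd, &r, sizeof(r));
	_exit(n == (ssize_t)sizeof(r) ? 0 : 1);
}

/* Scheduler side (first fork). Returns the fd to read the report from. */
int start_check(const char *cmd, pid_t *wrapper_pid)
{
	int rep[2];
	pid_t pid;

	assert(cmd && wrapper_pid);
	if (pipe(rep) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(rep[0]);
		close(rep[1]);
		return -1;
	}
	if (pid == 0) {
		close(rep[0]);
		wrapper_main(cmd, rep[1]); /* never returns */
	}
	close(rep[1]);
	*wrapper_pid = pid;
	return rep[0];
}

/* Blocking collect, for illustration only. */
int collect_check(int fd, pid_t wrapper_pid, struct check_report *r)
{
	ssize_t n = read(fd, r, sizeof(*r));

	close(fd);
	waitpid(wrapper_pid, NULL, 0);
	return n == (ssize_t)sizeof(*r) ? 0 : -1;
}

long elapsed_usec(const struct check_report *r)
{
	return (long)(r->stop.tv_sec - r->start.tv_sec) * 1000000L
	     + (long)(r->stop.tv_usec - r->start.tv_usec);
}
```

A scheduler that multiplexes results would keep the descriptors
returned by start_check() in a poll() or select() set instead of
calling the blocking collect_check() shown here.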

>>
>> Jean, let's discuss how we can move this forward within the C-code
>> in such a way that we retain compatibility on all levels. Too many
>> have invested too much in Merlin, NDOUtils and other C-based addons
>> to relinquish them easily, and splitting the community again would
>> be really, really stupid.
> I agree. But I also think we cannot avoid a refactoring for many
> more years if we want to use new tools like distributed object
> technologies or dynamic development (you create properties for your
> objects, so you cut out a large part of your code). I know we can make
> great things in C. We will make great things in C for V4. But we must
> think about long-term development too.
> 

Well, we could probably rewrite Nagios from scratch in a lot less than
a year. Like most great things, it's not the implementation that's so
spectacular but the idea behind it that is brilliant.

I have no idea what you mean by "dynamic development". It's a hype term
that can mean anything from "we let quality fluctuate wildly" to "we
never really know what features the next release will hold". It's
hardly ever anything good anyway.

> I propose 2 things:
> *we list the ideas in Shinken that are absent from Nagios (process
> pool, return in socket/memory, new options for services like
> inverse_ok_critical or critical_is_warning) or from Merlin (the
> "automatic cutting" function/dispatching function) and we look at how
> to put them into the current code for v4 of Nagios (next year? :) )
> and v1 of Merlin.

"process pool" and "return in memory" are not features. They're
implementation details. What we need to do is to decide on a few
problems in Nagios and work on them.

One such problem is the rather monolithic functions that have far
too many side-effects, without any clear APIs that modules can use
to safely modify objects while Nagios is running. Refactoring that
into manageable (and testable) pieces would be a worthwhile goal in
and of itself.

> *we open a "lab" or "long-term-dev" branch where we test things
> without fear of breaking the current modules. With such a branch,
> everyone can begin to test and hack the code, see how it works, and
> slowly redo everything that is done in the current code. It will
> attract new developers who are afraid of C (yes, there are some :) ),
> so it will not divide efforts on the main code. If this branch is a
> success, we can bring ideas from it into the main code, and try to
> make a mix of these branches like you propose just above.
> 

I'd actually prefer if new features are created on their own topic-
branches so that each individual topic can be merged on its own rather
than as a mass of co-dependent topics. Of course, some topics will be
co-dependent no matter what we try, in particular those that rely on
APIs introduced in some other topic.

> With this solution, the community will not be divided in two; we will
> have a "pool of ideas" branch and, if it stabilizes in the long term,
> maybe a good mix of the two worlds. It also gives everyone time to peek
> into it and see how it works and whether it can be used in some
> situations (like on Windows for small environments) for testing.
> 
> The main difficulty will be keeping the lab from drifting too far from
> the main branch, but with a common git repository, it should be easier
> than with a fork or something like that.
> 

Yes, probably. Although I'm still sceptical about implementing parts
of it in Python.

>>
>> Would it for example be possible to use Shinken as the checking
>> engine that supplies check-results back to a C-based scheduler
>> that retains config parsing and module compatibility? If that's
>> the case, we might be on to something. Otherwise, we'd better get
>> busy re-writing parts of the Nagios core to implement a process
>> pool.
> The "orders" for pollers are sent with Pyro, a pure Python module. I
> know we can load C code into Python, but it must also be possible to
> load Python into C. But this part of Shinken is not the most important.
> For a C pool, we can look at DNX (it uses threads, but if we remove
> XML from it, it can be fast, can't it? ).
> 

It's definitely possible to load Python into C. That's what the Python
interpreter does, after all.

Imitating DNX is one plan, of course. Or we simply introduce a short
binary protocol for the checking daemons to report their check-results
back to Nagios. It's immensely simple and super-efficient, especially
since plugins only report one chunk of data as their output, so only
one pointer has to be recalculated with some really simple arithmetic.
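
To illustrate the pointer arithmetic, here is a rough sketch of such a
packet, assuming sender and receiver run on the same host with the same
ABI (so no byte-order or padding conversion is done). The struct layout
and the names check_result, cr_pack() and cr_unpack() are hypothetical;
they are not an existing Nagios, Merlin or DNX format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet: fixed-size header, then the plugin output bytes. */
struct check_result {
	uint32_t len;        /* length of output, including the trailing NUL */
	int32_t return_code; /* plugin exit status */
	int64_t start_usec;  /* check start, microseconds since the epoch */
	int64_t stop_usec;   /* check end, microseconds since the epoch */
	char *output;        /* only meaningful locally; rewritten on receive */
};

/* Sender: returns a malloc()ed buffer of *size bytes, header + output. */
void *cr_pack(const struct check_result *cr, size_t *size)
{
	struct check_result *pkt;
	size_t out_len;

	assert(cr && cr->output && size);
	out_len = strlen(cr->output) + 1;
	*size = sizeof(*pkt) + out_len;
	pkt = malloc(*size);
	if (!pkt)
		return NULL;
	*pkt = *cr;
	pkt->len = (uint32_t)out_len;
	pkt->output = NULL; /* the sender's address is useless to the receiver */
	memcpy((char *)pkt + sizeof(*pkt), cr->output, out_len);
	return pkt;
}

/*
 * Receiver: validates the lengths and recomputes the single pointer in
 * place. buf must be suitably aligned (a malloc()ed buffer is).
 */
struct check_result *cr_unpack(void *buf, size_t size)
{
	struct check_result *pkt = buf;

	if (!buf || size <= sizeof(*pkt))
		return NULL;
	if (pkt->len != size - sizeof(*pkt))
		return NULL;
	pkt->output = (char *)buf + sizeof(*pkt); /* the one pointer fixup */
	if (pkt->output[pkt->len - 1] != '\0')
		return NULL;
	return pkt;
}
```

The receiver never copies the output: it reads the whole packet into
one buffer and points output just past the header. A protocol that has
to cross machines would additionally need fixed byte order and would
send an offset rather than leaving a pointer-sized hole in the header.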

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
