New Nagios implementation proposal

nap naparuba at gmail.com
Thu Dec 17 22:00:17 CET 2009


_________________________
< Was the duck typing too much? >
------------------------------
\
 \
  \ >()_
     (__)__ _



 _______________________________
< Or maybe my English is just too bad? >
 --------------------------------------
   \
    \
    ____
   /# /_\_
  |  |/o\o\
  |  \\_/_/
 / |_     |
|  ||\_ ~|
|  |||   \/
|  |||_
 \//  |
  ||  |
  ||_  \
  \_|  o|
  /\___/
 /  ||||__
    (___)_)


but there is still one person who has not answered my mail. It was my
birthday a few days ago, but still no mail; maybe I will get one for
Christmas?
       !
   -~*~-
      /!\
   /%;@\
 o/@,%\o
  /%;`@,\
o/@'%',\o
'^^^N^^^`
  [+OK ready\nUSER naparuba\nPASS blabla] <- my email, fetched over POP,
under the Christmas tree.


Maybe we can start a new survey to make Ethan read his emails? :)

More seriously, today I coded a new database management module: CouchDB.
It took just 300 fairly easy lines (a large part of them shared with the
MySQL module), since CouchDB "databases" are the same as Merlin tables.
It was working within a few hours.


Jean

PS: after Obi-Wan Kenobi and this email, it will be hard to find
something more special to make people read this conversation :)


On Tue, Dec 15, 2009 at 10:53 AM, nap <naparuba at gmail.com> wrote:
> On Mon, Dec 14, 2009 at 1:37 PM, Andreas Ericsson <ae at op5.se> wrote:
>> On 12/11/2009 04:30 PM, nap wrote:
>>> On Fri, Dec 11, 2009 at 1:53 PM, Andreas Ericsson<ae at op5.se>  wrote:
>>>
>>>>
>>>> Process pools aren't that hard to do in C really, but altering the
>>>> entire concept of how Nagios operates is a fairly big change. OTOH, I'm
>>>> not thrilled about the whole "check-results are stored in tempfiles"
>>>> thing either, and *that* was a major change too.
>>> Maybe we can first work on the "return in socket/memory" part before
>>> trying the process pool. It should be easier and can have a very large
>>> effect.
>>>
>>
>> That would be easier, yes. I once did a test of multiplexing check
>> results and had very good results with it. The only problem is that
>> it would require a double-fork() now, as checks would have to be
>> wrapped in something to provide correct output with the microsecond
>> execution time precision Nagios currently uses.
>
> I don't understand the double-fork problem: instead of writing a flat
> file, the child that popen()s the check just opens a socket to the Nagios
> main process. Instead of micro-sleeping, Nagios must select() on the
> socket (a timeout instead of a sleep). It must then queue the result for
> reaping, or maybe reap the result directly.
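
A minimal sketch of the idea above, written in Python rather than C for brevity: the child that runs the check sends its result over a socket, and the main process waits in select() with a timeout instead of sleeping and scanning for result files. The function names, the "return code, newline, output" wire format and the 5 second timeout are assumptions made for illustration, not Nagios code. It relies on os.fork(), so it is POSIX only.

```python
import os
import select
import socket
import subprocess
import sys


def run_check_in_child(command, sock):
    """Child side: run the plugin and send its result over the socket."""
    proc = subprocess.run(command, capture_output=True, text=True)
    # Hypothetical wire format: "<return code>\n<plugin output>"
    sock.sendall(f"{proc.returncode}\n{proc.stdout}".encode())
    sock.close()


def reap_result(sock, timeout=5.0):
    """Parent side: wait in select() with a timeout, no sleep/poll loop."""
    chunks = []
    while True:
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            return None  # timed out: the scheduler could do other work here
        data = sock.recv(4096)
        if not data:  # the child closed its end: the result is complete
            break
        chunks.append(data)
    code, _, output = b"".join(chunks).decode().partition("\n")
    return int(code), output.strip()


def check(command):
    """Fork a child for one check and return (return_code, output)."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:  # child process
        try:
            parent_sock.close()
            run_check_in_child(command, child_sock)
        finally:
            os._exit(0)  # never fall back into the parent's code path
    child_sock.close()
    result = reap_result(parent_sock)
    parent_sock.close()
    os.waitpid(pid, 0)
    return result


if __name__ == "__main__":
    print(check([sys.executable, "-c", "print('OK - all good')"]))
```

A real scheduler would pass many sockets to one select() call and keep scheduling while it waits; the single-socket loop here only shows the mechanism.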
>
>>
>>>>
>>>> Jean, let's discuss how we can move this forward within the C-code
>>>> in such a way that we retain compatibility on all levels. Too many
>>>> have invested too much in Merlin, NDOUtils and other C-based addons
>>>> to relinquish them easily, and splitting the community again would
>>>> be really, really stupid.
>>> I agree with that. But I also think we cannot put off a refactoring
>>> for many years if we want to use new tools like distributed object
>>> technologies or dynamic development (you declare the properties of your
>>> objects, so you cut out a large part of your code). I know we can make
>>> great things in C. We will make great things in C for V4. But we must
>>> think about long-term development too.
>>>
>>
>> Well, we could probably rewrite Nagios from scratch in a lot less than
>> a year. Like most great things, it's not the implementation that's so
>> spectacular but the idea behind it that is brilliant.
> Yes
>
>>
>> I have no idea what you mean by "dynamic development". It's a hypeterm
>> that can mean anything from "we let quality fluctuate wildly" to "we
>> never really know what features the next release will hold". It's
>> hardly ever anything good anyways.
> <Warning> Python code just below :) </warning>
>
> Believe me, I do not use this term in a marketing way. I just HATE
> marketing: you think you are buying the best tool in the world, and in
> fact it does nothing that the guy who sold it to you says. Here "dynamic"
> is not about the dynamics of the project or anything like that. It is
> just Python's capacity for code introspection.
>
> You can "attach" dictionaries to classes. You can also access the class
> of an object just with object.__class__. I use this in the macro resolver
> part. I use one function to resolve a command; it takes the command line
> (with macros) and a list of objects. It does not care which kind of
> object each one is: it can be a host, service, contact or whatever you
> want. Let's call this list "the context". Important classes like hosts,
> services or contacts have a macros dictionary: it lists the macros
> available for the type and, for each macro, the property of the object
> that holds the information. For hosts we have, for example:
> macros = {'HOSTADDRESS' : 'address',
> [...]
> 'TOTALHOSTSERVICESOK' : 'get_total_services_ok',
> }
> The macro resolver function just does:
> # macros_in_command: the $MACRO$ names found in the command line
> for macrosearch in macros_in_command:
>    for obj in the_context:
>        for macro in obj.__class__.macros:
>            if macro == macrosearch:
>                prop = obj.__class__.macros[macro]
> OK, here we have found the object that has the macro, and the
> property of this object that holds the information. For a simple
> property like $HOSTADDRESS$, the value will be:
>                value = getattr(obj, prop)
> getattr is a Python function you can use when you want a property of an
> object but do not know which one when writing the code, only at run
> time:
> getattr(hst, 'address') == hst.address
>
> For complex macros like TOTALHOSTSERVICESOK, which is not a simple static
> property, the macro resolver checks whether the property is "callable"
> (is a function). If so, it just calls it, and the value is the return
> value of the function. Here hst.get_total_services_ok() just returns the
> number of OK services the host has. So it is:
>                value = getattr(obj, prop)()
>
> It is not totally easy: it really uses advanced Python features.
> But now, how do you add a new macro for an object? You just add an
> entry to the macros dictionary of the class. And that's all! You do not
> have to modify the macro resolver code. Your macro is defined just
> once, with no duplication.
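
The resolver described above can be sketched as a small runnable example. This is an illustration under assumptions, not the Shinken code: the Host constructor arguments, the resolve_macro and resolve_command names and the regular expression used to find $MACRO$ tokens are all invented here.

```python
import re


class Host:
    # Macro name -> name of the attribute or method that holds the value.
    macros = {
        "HOSTADDRESS": "address",
        "TOTALHOSTSERVICESOK": "get_total_services_ok",
    }

    def __init__(self, address, services_ok):
        self.address = address
        self.services_ok = services_ok  # hypothetical counter

    def get_total_services_ok(self):
        return self.services_ok


def resolve_macro(name, context):
    """Return the value of one macro, searching every object in the context."""
    for obj in context:
        prop = obj.__class__.macros.get(name)
        if prop is None:
            continue  # this object does not provide the macro
        value = getattr(obj, prop)
        # Computed macros are methods: call them to obtain the value.
        return value() if callable(value) else value
    raise KeyError(name)


def resolve_command(command_line, context):
    """Replace every $MACRO$ in command_line using the context objects."""
    return re.sub(
        r"\$([A-Z0-9_]+)\$",
        lambda match: str(resolve_macro(match.group(1), context)),
        command_line,
    )
```

Adding a macro then means adding one entry to Host.macros; resolve_macro and resolve_command stay unchanged, and any other class with its own macros table can be dropped into the context list.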
>
> The macro resolver can be called for a host check (no service), a
> service check, or a notification. It just depends on the command line
> you want to resolve and on the context.
> That is what I call dynamic: you describe an object, and all
> operations are done at run time using this description, not with
> hardcoded loops. And the code I wrote above is nearly real Python code:
> simple, no { or ;  ;)
>
>
>
> The same logic is used for object creation: the class holds a
> dictionary named properties. It describes the properties that an object
> can have. For hosts we have, for example:
> properties = {
> [..]
> 'retry_interval': {'required': False, 'default':'0', 'pythonize':
> to_int, 'status_broker_name' : None},
> [...]
> }
> Here the retry_interval property is said to be:
> *not required: if the property is not defined, it is not a problem;
> the 'default' value will be taken.
> *default: the default value used if the property is not specified (and
> not required)
> *pythonize: how to transform the "string" read from the conf file into
> a real object; here to_int is a function that takes a string and
> converts it to an int.
> *status_broker_name: the property is sent to the broker under a
> different name if this is not None
>
> In the code for config reading, transformation into objects, all the
> inheritance and default values, there is no mention of retry_interval.
> All the code is loops over properties. The properties are described in
> this dictionary, and only there. Do you want to add a property for
> hosts? Just add a line to this dictionary. That's all. All the reading
> and inheritance is done "dynamically".
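
A minimal sketch of such a property-driven constructor, assuming the raw configuration arrives as a dict of strings. The host_name entry and the ValueError on a missing required property are assumptions added for the example; only retry_interval, to_int and the 'required'/'default'/'pythonize' keys come from the description above.

```python
def to_int(value):
    """Convert the string read from the configuration into an int."""
    return int(value)


class Host:
    # Hypothetical subset of the property table.
    properties = {
        "host_name": {"required": True, "pythonize": str},
        "retry_interval": {"required": False, "default": "0", "pythonize": to_int},
    }

    def __init__(self, raw):
        """Build the object from a dict of raw strings read from the config."""
        for name, spec in self.__class__.properties.items():
            if name in raw:
                value = raw[name]
            elif spec["required"]:
                raise ValueError(f"missing required property: {name}")
            else:
                value = spec["default"]
            # No property is named in this loop: the table drives everything.
            setattr(self, name, spec["pythonize"](value))
```

With this shape, supporting a new host property is one more line in Host.properties; the loop in __init__ is untouched.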
>
>
>
> Another piece of "dynamic magic" is used for modules: you just do not
> care about the type of the object you manage. It's duck typing (if it
> quacks like a duck, it must be a duck). There is no compile-time
> limitation like a fixed structure. A module can read the address
> property of an object. It just doesn't care about the 30 other
> properties. Why should a module that only uses the address fail to load
> because you added a new property to the object? The module just doesn't
> care. In C that is a linking problem (the structure changed). A dynamic
> programming language has no such problem: it just doesn't care. The
> module wants to load and you changed the structure? No problem. If a
> property was removed, the module will get an exception that it can
> catch if it wants (I use this in the broker code: if a module raises an
> exception, I just unload the module).
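
The behaviour described above (modules that only touch the attributes they need, and a broker that unloads a module when it raises) might look roughly like this. All class and method names here are invented for the illustration and do not come from the Shinken broker.

```python
class AddressLogger:
    """Hypothetical module: only needs objects to have an .address."""

    def __init__(self):
        self.seen = []

    def manage(self, obj):
        self.seen.append(obj.address)


class NeedsOldProperty:
    """Hypothetical module that relies on a property that was removed."""

    def manage(self, obj):
        return obj.removed_property  # raises AttributeError


class Broker:
    def __init__(self, modules):
        self.modules = list(modules)

    def dispatch(self, obj):
        # Iterate over a copy so a failing module can be removed safely.
        for module in list(self.modules):
            try:
                module.manage(obj)
            except Exception:
                # A failing module is unloaded; the others keep running.
                self.modules.remove(module)


class Host:
    def __init__(self, address):
        self.address = address
        self.some_new_property = "added later"  # modules need not know it
```

AddressLogger keeps working whatever else is added to Host, while NeedsOldProperty is dropped on its first failure instead of blocking the whole broker.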
>
> Remember what we did for the parent hosts patch: I needed a new
> temporary property to mark whether the host had already been checked. To
> keep module loading working, we put this property in the higher bits of
> a bool (in fact an int32). In Python, you just add it with:
> hst.dfs_loop_check = 'OK'
> And you remove it with:
> del hst.dfs_loop_check
>
> In this dynamic view, an object is seen as a dictionary: you can add
> or remove properties at run time. In fact, the properties of an object
> ARE in a dictionary (object.__dict__) :). If you add introspection
> features, you can really move from "hard code", in some way, to
> something like "data driven" development (you describe your objects and
> that's all). All of these features come on top of classic object
> oriented programming (all objects share a lot of common code, like
> inheritance or default filling; even hosts and services share a lot of
> code), which just makes the code smaller.
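
A short runnable check of the claim that instance properties live in a dictionary. The Host class and its name attribute are placeholders for the example; dfs_loop_check is the temporary marker mentioned earlier.

```python
class Host:
    def __init__(self, name):
        self.name = name


hst = Host("srv1")

# Add a temporary marker at run time; the class definition is unchanged.
hst.dfs_loop_check = "OK"
assert "dfs_loop_check" in hst.__dict__  # attributes are stored in a dict
assert vars(hst) == {"name": "srv1", "dfs_loop_check": "OK"}

# Remove it again once the traversal is done.
del hst.dfs_loop_check
assert not hasattr(hst, "dfs_loop_check")
```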
>
> That is not perfect: no compilation means no checks on property names
> (you can try to add 60 to every retri_interval... no, it's
> retry_interval. Python will create retri_interval on all hosts :( )
> but there are ways to avoid it.
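
The mail does not say which ways it has in mind. One standard Python option, shown here purely as an assumption, is __slots__: it restricts the attribute names an instance accepts, so a misspelled assignment fails immediately instead of silently creating a new attribute.

```python
class Host:
    # Only these attribute names may ever be set on an instance.
    __slots__ = ("host_name", "retry_interval")

    def __init__(self, host_name, retry_interval=0):
        self.host_name = host_name
        self.retry_interval = retry_interval


def try_typo(host):
    """Return True if the misspelled assignment was rejected."""
    try:
        host.retri_interval = host.retry_interval + 60  # typo on purpose
    except AttributeError:
        return True
    return False
```

The trade-off is that __slots__ also forbids the useful kind of run-time addition, such as a temporary marker attribute, so it suits stable core classes better than objects that plugins are expected to decorate.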
>
> You do not have to code like this in Python. In fact you can code like
> you already do in C. But once you begin to use these features,
> believe me, you will never go back :)
>
> As an example of this easy programming, yesterday I added a new
> property for hosts and services: hot_period (a service can be tagged as
> "low priority": critical -> warning, but during its "hot_period" it
> stays critical (like the end of the month for a financial service ;) )).
> It took me 2 minutes to add it! (1 min to launch emacs, 1 min to
> code ... ;) ).
>
>>
>>> I propose 2 things:
>>> *we list the ideas in Shinken that are absent from Nagios (process pool,
>>> return in socket/memory, new options for services like
>>> inverse_ok_critical or critical_is_warning) or from Merlin (the
>>> "automatic cutting" function/dispatching function) and we look at how
>>> to put them into the current code for v4 of Nagios (next year? :) ) and
>>> v1 of Merlin.
>>
>> "process pool" and "return in memory" are not features. They're
>> implementation details. What we need to do is to decide on a few
>> problems in Nagios and work on them.
> Yes.
>
>>
>> One such problem is the rather monolithic functions that have far
>> too many side-effects, without any clear API's that modules can use
>> to safely modify objects while Nagios is running. Refactoring that
>> into manageable (and testable) pieces would be a worthwhile goal in
>> and of itself.
>>
>>> *we open a "lab" or "long-term-dev" branch where we test things
>>> without fear of breaking the current modules. With such a branch,
>>> everyone can begin to test and hack the code, see how it works, and
>>> slowly redo everything that is done in the current code. It will attract
>>> new developers who are afraid of C (yes, there are some :) ), so it
>>> will not divide efforts on the main code. If this branch is a
>>> success, we can bring ideas from it into the main code, and try to make
>>> a mix of these branches like you propose just above.
>>>
>>
>> I'd actually prefer if new features are created on their own topic-
>> branches so that each individual topic can be merged on its own rather
>> than as a mass of co-dependent topics. Of course, some topics will be
>> co-dependent no matter what we try. In particular those that rely on
>> APIs introduced in some other topic, of course.
> Yes, we can change the current implementation with small topics. But we
> must also talk about the long-term future. I still think we must
> reorganise the Nagios code to be modular. It will break module
> compatibility (at least requiring a recompilation). But why not try to
> see if languages other than C can be used? I am not saying we must use
> Python, but C is not the best language for a scheduler. Scheduling is
> a high-level problem. Let's use all the tools we can to solve it.
>
>
>>
>>> With this solution, the community will not be divided in two; we will
>>> have a "pool of ideas" branch and, if it stabilizes in the long term,
>>> maybe a good mix of the two worlds. It also gives everyone time to peek
>>> into it, see how it works, and see whether it can be used in some
>>> situations (like on Windows for small environments) for testing.
>>>
>>> The main difficulty will be to keep the lab not too far from the main
>>> branch, but with a common git, it must be easier than a fork or
>>> something like that.
>>>
>>
>> Yes, probably. Although I'm still sceptical about implementing parts
>> of it in Python.
> And now that I have shown you how dynamic programming can be useful? :)
>
>>
>>>>
>>>> Would it for example be possible to use Shinken as the checking
>>>> engine that supplies check-results back to a C-based scheduler
>>>> that retains config parsing and module compatibility? If that's
>>>> the case, we might be on to something. Otherwise, we'd better get
>>>> busy re-writing parts of the Nagios core to implement a process
>>>> pool.
>>> The "orders" for pollers are sent with Pyro, a pure Python module. I
>>> know we can load C code into Python, and it must also be possible to
>>> load Python into C. But this part of Shinken is not the most important.
>>> For a C pool, we can look at DNX (it uses threads, but if we remove XML
>>> from it, it can be fast, can't it? ).
>>>
>>
>> It's definitely possible to load Python into C. That's what the Python
>> interpreter does, after all.
>>
>> Imitating DNX is one plan, of course. Or we simply introduce a short
>> binary protocol for the checking daemons to report their check-results
>> back to Nagios. It's immensely simple and super-efficient. Especially
>> since plugins only report one chunk of data as its output so only one
>> pointer has to be recalculated with some really simple arithmetic.
> That can be quite easy :)
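
To make the "short binary protocol" idea concrete, here is a sketch of a length-prefixed record: a fixed-size header followed by the single variable-length chunk of plugin output. The field layout (return code, start and finish timestamps, output length) is an assumption chosen for the example; the thread does not define one. It is in Python for consistency with the other examples here, though the same layout maps directly onto a packed C struct.

```python
import struct

# Hypothetical fixed header, network byte order, no padding:
# return code (int16), start and finish times (float64 seconds),
# length of the plugin output in bytes (uint32).
HEADER = struct.Struct("!hddI")


def pack_result(return_code, start, finish, output):
    """Serialize one check result as header + raw output bytes."""
    payload = output.encode("utf-8")
    return HEADER.pack(return_code, start, finish, len(payload)) + payload


def unpack_result(buffer):
    """Parse one record; the only variable part sits right after the header."""
    return_code, start, finish, length = HEADER.unpack_from(buffer, 0)
    payload = buffer[HEADER.size:HEADER.size + length]
    return return_code, start, finish, payload.decode("utf-8")
```

Because the output is the only variable-length field, the receiver needs just the header size and one length to locate it, which is the "one pointer" arithmetic mentioned above.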
>
>
> Jean
>
>>
>> --
>> Andreas Ericsson                   andreas.ericsson at op5.se
>> OP5 AB                             www.op5.se
>> Tel: +46 8-230225                  Fax: +46 8-230231
>>
>> Considering the successes of the wars on alcohol, poverty, drugs and
>> terror, I think we should give some serious thought to declaring war
>> on peace.
>>
>> _______________________________________________
>> Nagios-devel mailing list
>> Nagios-devel at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-devel
>>
>
