New Nagios implementation proposal

nap naparuba at gmail.com
Sun Dec 6 13:26:03 CET 2009


Hi,

My goal is clear: keep Nagios/Icinga compatibility while providing a
faster, more modular implementation (still fully open source, of course).

Instead of going off-list, maybe we can continue on the Icinga devel list?


Gabès Jean


On Sat, Dec 5, 2009 at 3:14 AM, Michael Friedrich <
michael.friedrich at univie.ac.at> wrote:

>  Hi,
>
> very interesting approach :-)
>
> Maybe we can talk offlist and in private about your goals and maybe joining
> forces with Icinga. How about that? :)
>
> Kind regards,
> Michael
>
> nap wrote:
>
> Hi list,
>
> I would like to have your feedback on an (unfinished)
> reimplementation of Nagios named "Shinken" that I wrote in Python. It is
> faster and more modular than the current Nagios implementation in C
> (yes, faster, you read that correctly; I was the first to be surprised).
>
> == Shinken's history ==
> A few months ago, I started working on a proof of concept for Nagios
> focused on distributed environments and performance. The main goal was
> to find a distributed, highly available architecture. I also thought
> that Nagios' performance was quite good, but that we could get more.
>
> For quick testing and development, I used Python. I thought a process
> pool could make Nagios quicker than forking a new process for each
> check, only to kill it a few seconds later. I also bypass Nagios'
> reaping mechanism: reading flat files is just too slow. Instead, each
> result is a structure that is sent directly to the scheduler. No
> files, more performance. To be on par with Nagios, I added the same
> monitoring logic to the scheduler: HARD/SOFT states, dependencies
> (parents, servicedep, hostdep, etc.) and database export (Merlin).
> Shinken uses the standard Nagios configuration files.
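To make the HARD/SOFT logic mentioned above concrete, here is a minimal sketch in Python. It is not Shinken's actual code: the class name, the field names and the rule that any OK result goes straight back to a HARD OK state are simplifying assumptions made for illustration.

```python
OK = 0  # plugin exit code for OK; any other code counts as a problem here


class CheckedItem:
    """Hypothetical host/service record; names are illustrative only."""

    def __init__(self, max_check_attempts=3):
        self.max_check_attempts = max_check_attempts
        self.state = OK
        self.state_type = "HARD"
        self.attempt = 1

    def consume_result(self, exit_code):
        """Update the state from one check result (simplified rules)."""
        if exit_code == OK:
            # Assumption: any OK result resets to a HARD OK state.
            self.state, self.state_type, self.attempt = OK, "HARD", 1
            return
        if self.state == OK:
            # First failure after OK starts a new retry cycle.
            self.attempt = 1
        elif self.state_type == "SOFT":
            # Still retrying: count one more failed attempt.
            self.attempt += 1
        self.state = exit_code
        if self.attempt >= self.max_check_attempts:
            self.state_type = "HARD"
        else:
            self.state_type = "SOFT"
```

With max_check_attempts set to 3, two consecutive failures leave the item in a SOFT state and the third one makes it HARD, which is the point where notifications would normally be considered.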
>
> And the performance is quite good: with Nagios 3, a small check (an
> echo + exit) and a mid-range server, I run 10,000 checks in
> 5 minutes (latency near 1s), and 30K with full tuning. With my tool,
> I run 150K!!
>
>
> == The global architecture ==
> For the architecture, I think we should follow the Unix way of doing
> things: one tool per job. For now, Nagios does nearly everything: it
> reads the conf, schedules, launches checks and raises notifications. I
> am trying an architecture where the administrator can have as many
> hosts/services as he wants, and the daemons are just resources to
> manage them. The architecture I propose is the following:
> *Arbiter: a daemon that reads the configuration and automatically cuts
> it into N confs (keeping relations such as parents in the same conf),
> where N is the number of schedulers we have. It dispatches the confs,
> and it also reads the orders from nagios.cmd and dispatches them to the
> schedulers.
> *Schedulers: do the scheduling by looking at the states of
> hosts/services. They only build queues of checks/notifications/event
> handlers for the other daemons. Same thing for event broker
> information: it's just a queue.
> *Pollers: use a process pool, get the checks to launch from the
> schedulers and return the results to the schedulers.
> *Reactionners: same as pollers, but for notifications and event handlers.
> *Brokers: get event broker information from the schedulers and "do
> things" with it (like creating the service-perfdata file, or filling
> databases).
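The Arbiter's cutting step can be sketched as follows. This is only an illustration under stated assumptions, not the code Shinken uses: split_hosts and its "parents" mapping are hypothetical names, hosts linked by parent relations are grouped with a small union-find, and the groups are then handed out greedily to the least loaded scheduler.

```python
def split_hosts(parents, n_schedulers):
    """Split hosts into n_schedulers packs, keeping parent links together.

    parents maps each host name to the list of its parent host names
    (hypothetical input format, chosen for this sketch).
    """
    root = {host: host for host in parents}

    def find(host):
        # Follow links up to the group representative, compressing the path.
        while root[host] != host:
            root[host] = root[root[host]]
            host = root[host]
        return host

    for host, host_parents in parents.items():
        for parent in host_parents:
            root.setdefault(parent, parent)  # a parent may lack its own entry
            root[find(host)] = find(parent)  # merge the two groups

    groups = {}
    for host in root:
        groups.setdefault(find(host), []).append(host)

    packs = [[] for _ in range(n_schedulers)]
    # Biggest groups first, each one into the currently smallest pack.
    for group in sorted(groups.values(), key=len, reverse=True):
        min(packs, key=len).extend(sorted(group))
    return packs
```

For example, with a router that is the parent of two web servers, a database with its backup host, and one isolated host, splitting for two schedulers keeps the router with its children in one pack and puts the remaining hosts in the other.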
>
> The pollers work like DNX, nothing new here. The reactionners let
> administrators have a single daemon that sends all notifications for
> all their schedulers (useful for SMTP authorizations or for filling a
> single RSS file with all notifications). The schedulers do not launch
> checks, so they do not suffer latency when notifications or event
> handlers are launched.
>
> The load balancing is automatic: the Arbiter cuts the conf and
> dispatches the parts. For high availability, there can be spare
> daemons: if a daemon dies, another takes over its configuration (the
> Arbiter "pings" the daemons, and if one fails, it just sends the
> configuration to a spare). The daemons are reached over the network, so
> they can all be on different servers (and for high availability it is
> better not to put all daemons on the same server :) ). For now, the
> Arbiter does not have a spare, but one will be added in the future.
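The spare mechanism described above could look roughly like this. It is a sketch under assumptions: dispatch_with_spares is a hypothetical helper, the "ping" is replaced by an is_alive callable supplied by the caller, and confs are plain values rather than real configuration objects.

```python
def dispatch_with_spares(assignments, spares, is_alive):
    """Return a new {daemon: conf} mapping after one supervision round.

    assignments: {daemon_name: conf} for the active daemons.
    spares: idle spare daemon names, tried in order.
    is_alive: callable(daemon_name) -> bool, standing in for the "ping".
    """
    spares = list(spares)  # work on a copy, the caller's list is untouched
    result = {}
    for daemon, conf in assignments.items():
        if is_alive(daemon):
            result[daemon] = conf
            continue
        # The daemon is dead: hand its conf to the first spare that answers.
        while spares:
            spare = spares.pop(0)
            if is_alive(spare):
                result[spare] = conf
                break
        else:
            # The spare list ran out without finding a live one.
            raise RuntimeError("no live spare for %s" % daemon)
    return result
```

A real Arbiter would run such a round periodically and would also have to decide what happens when a failed daemon comes back; that is left out here.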
>
> You can see this Architecture in the file shinken-architecture.png.
>
> If the user configuration does not define such daemons, Shinken
> automatically creates default ones (on localhost with default ports).
>
> == Advanced architecture ==
> In the architecture we just saw, all reactionners/pollers/brokers take
> orders from ALL schedulers. That can be a problem for reactionners
> (with 3 SMTP servers (USA, Europe, Asia), it's hard to force Asia
> notifications to go through the Asia SMTP server). Same for pollers: a
> poller pulls the checks to run, and getting checks from a very distant
> scheduler can be very slow.
> To manage this, Shinken has a way of partitioning the architecture: Realms.
>
> A realm is a pool of daemons that work together. A host is tagged with
> a realm (and only one), so it will be managed by this realm's
> schedulers/pollers/reactionners/brokers. A realm can have sub-realms,
> so you can put a reactionner in the higher realm and it will manage
> all schedulers of the sub-realms. A picture is worth a thousand words:
> you can get a better idea of what a realm is from the file
> shinken-architecture-global-realm.png.
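The realm hierarchy can be pictured as a small tree. The sketch below is illustrative only: the Realm class, its attributes and the realm and scheduler names are assumptions, not Shinken's real objects. It shows how a daemon placed in a higher realm would see the schedulers of every sub-realm.

```python
class Realm:
    """Hypothetical realm node: a pool of schedulers plus sub-realms."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.schedulers = []
        if parent is not None:
            parent.children.append(self)

    def all_schedulers(self):
        """Schedulers of this realm and of every sub-realm below it."""
        found = list(self.schedulers)
        for child in self.children:
            found.extend(child.all_schedulers())
        return found


# Example layout (names are made up for the illustration).
world = Realm("World")
europe = Realm("Europe", parent=world)
asia = Realm("Asia", parent=world)
europe.schedulers.append("scheduler-europe")
asia.schedulers.append("scheduler-asia")
```

A reactionner attached to "World" would serve world.all_schedulers(), that is both schedulers, while one attached to "Asia" would only serve the Asia scheduler.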
>
> Same as for daemons: if the user configuration does not define a
> realm, a default one is created by Shinken.
>
> == What is not managed by Shinken? ==
> A lot of stuff! But the most important are regexp configurations,
> inherits_parents for host/service dependencies (always 1 in Shinken)
> and notification escalations. It also does not support timeperiod
> excludes (like Nagios, in fact ;) )
>
> The current implementation doc is at
> http://wiki.nagios-fr.org/nagios/shinken/start (in French). I am
> writing the English documentation, and English will be its primary
> language in the future.
>
> == What is managed? ==
> All the classic stuff is managed (SOFT/HARD, complex inheritance,
> volatile services, freshness, timeperiods without exclude, flapping
> states...). It also has NDO and Merlin database support for MySQL, as
> well as NDO support for Oracle (yes, like Icinga)!! The NDO support
> is not complete: some objects are not managed (like notifications), but
> it's not difficult to add them. It also supports UTF-8 names.
>
>
> == How do I test this freaking tool? ==
> Just get the VirtualBox VM at http://www.megaupload.com/?d=57BGSL09
> (yes, there can be a legal file on megaupload :) ). It's in OVF format,
> so you need to import it into VirtualBox.
> It's an Ubuntu server with a DHCP NIC; the account is shinken/shinken.
> You can launch all daemons with:
> ./launch_all.sh
> and kill them all with:
> ./stop_all.sh
>
> Look at the small README file to see how to watch the daemons' output
> (tail -f on the debug files). The current configuration is quite small
> (1500 services), so it will run with no problem. You have a Ninja
> interface at http://IP_OF_THE_VM/ninja with monitor/monitor to watch
> it work.
> Warning: Ninja does not seem to see more than one instance_id in the
> database, so you will see only half of the hosts/services. You can
> remove one of the schedulers in etc/conf.cfg: all hosts will then be
> added automatically to the last active scheduler :)
>
> You can test your current Nagios conf with Shinken. It will create
> the daemons' configuration if needed.
>
>
> If you want to install it from scratch, it's not so difficult.
> Shinken just needs:
> *python-2.6
> *pyro (a Python module like CORBA)
> *python-graph-core (on Ubuntu: sudo apt-get install python-setuptools
> && sudo easy_install python-graph-core). I will drop this dependency
> soon (I only use it for a loop check, so a whole module is too much...)
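For reference, a loop check over parent relations needs no external module. The sketch below is a standard depth-first search written for illustration; it is not the python-graph-core API and not Shinken's code, and find_parent_loop with its "parents" mapping is a hypothetical name and input format.

```python
def find_parent_loop(parents):
    """Return one list of hosts forming a parent loop, or None.

    parents maps each host name to the list of its parent host names.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, done
    color = {}

    def visit(host, path):
        color[host] = GREY
        path.append(host)
        for parent in parents.get(host, []):
            state = color.get(parent, WHITE)
            if state == GREY:
                # The parent is already on the current path: that is a loop.
                return path[path.index(parent):]
            if state == WHITE:
                loop = visit(parent, path)
                if loop:
                    return loop
        path.pop()
        color[host] = BLACK
        return None

    for host in parents:
        if color.get(host, WHITE) == WHITE:
            loop = visit(host, [])
            if loop:
                return loop
    return None
```

The search is recursive, so a very deep parent chain could hit Python's recursion limit; an explicit stack would avoid that at the cost of a few more lines.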
>
> You can get the code with :
> git clone git://shinken.git.sourceforge.net/gitroot/shinken/shinken
>
> Remember to change etc/nagios.cfg and etc/conf.cfg to match your
> directory and, optionally, the "plugin" object in conf.cfg to set your
> NDO or Merlin database user/pass/database. Then you just need to launch,
> from shinken/src (here in 5 shells, not daemonized, for easy testing):
> python shinken-scheduler.py
> python shinken-poller.py
> python shinken-reactionner.py
> python shinken-broker.py
> python shinken-arbiter -c etc/nagios.cfg
>
> == And now? ==
> The proof of concept has become a new implementation: it's now easier
> to add the missing Nagios features to Shinken than to port Shinken's
> features into the current Nagios.
>
> I tried to talk about this new implementation directly with some
> people on this list, but they did not seem very keen on it. I can
> easily understand: just the process pool is hard work in C (and we
> cannot take the Apache code for it, the licence does not fit :( ) and
> it would change a lot of Nagios internals. Replacing the reaping
> process with a socket is quite hard too.
>
> Yes, it breaks nearly everything, I know. It's not binary compatible
> with event broker modules (Merlin, NDO, Livestatus), but I think
> Nagios must evolve quicker than it currently does. Zenoss's evolution
> is very impressive. The current Nagios implementation in C is good (it
> has done the job for the last 10 years!!). But just as the old CGI
> interface is being dropped in favour of PHP (Ninja in fact, because the
> new Nagios XI interface is just not open source at all), we must keep
> all the ideas behind what Nagios is (hosts, services, configuration
> with inheritance, timeperiods) and put them into a new tool written in
> a high-level language.
>
> I think C is not always the right language for tools. If we are afraid
> of building a new architecture just because managing sockets/IPC is
> too hard, we must change the language.
> If the idea of replacing the old fork/fork/reaper way with a new one
> based on a process pool and results returned directly in memory gives
> you nightmares, we must change the language.
> If the idea of Zenoss becoming the new reference among OSS monitoring
> tools gives you even worse nightmares, we must evolve quicker, so we
> must change the language.
>
> An example: to add a new property to a Nagios object in the
> current C code, we must add it in numerous files (config file reading,
> object creation and so on). With a higher-level language like Python,
> it needs just ONE line and everything else is handled afterwards
> (inheritance, object creation, default value, conversion from string
> to the real value, such as an int or a list of values).
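To illustrate the "one line per property" idea, here is a minimal sketch of a declarative property table. It is an assumption about how such a design could look, not Shinken's actual code: the Service class, the converter helpers and the default values are made up for the example.

```python
def to_int(value):
    """Convert a configuration string to an int."""
    return int(value)


def to_list(value):
    """Convert a comma-separated configuration string to a list."""
    return [item.strip() for item in value.split(",") if item.strip()]


class Service:
    """Hypothetical object type driven by a single property table."""

    # One entry per property: name -> (default as string, converter).
    properties = {
        "max_check_attempts": ("3", to_int),
        "contacts": ("", to_list),
        "check_interval": ("5", to_int),  # adding a property is this one line
    }

    def __init__(self, raw, template=None):
        # raw holds the strings read from the configuration file.
        for name, (default, convert) in self.properties.items():
            if name in raw:
                value = convert(raw[name])        # explicit value
            elif template is not None:
                value = getattr(template, name)   # inherited from template
            else:
                value = convert(default)          # default value
            setattr(self, name, value)
```

A service defined as Service({"max_check_attempts": "5"}, template=generic) would get its own attempt count, inherit the contacts of the generic template, and fall back on whatever the template resolved for check_interval, all without any per-property code.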
>
> == What do I propose? ==
> It's just a Big Bang proposal: I propose that Shinken become the
> development branch for Nagios core 4.
>
> I think that, with help and testing, we can add everything Nagios does
> that Shinken does not yet do, and even more: we have a highly
> available, distributed and flexible architecture. We can think of a
> new way of getting information: the daemons have a built-in HTTP
> server (thanks, Python), and we could add a REST interface for getting
> information and sending orders (easier than nagios.cmd, especially on
> OSes where there are no named pipes :)).
>
> I know some people will not be happy with it, and I am not asking
> anyone to forget the current C implementation and put the new one into
> production within a week. I do not want to fork Nagios. But I will
> make Shinken a reality. I would prefer its name to be Nagios4. I will
> not let these freaking good ideas of hosts, services, timeperiods,
> checks and configuration inheritance become history just because we
> cannot evolve like the others.
>
> Darwin's law is against us; let's get it on our side.
>
> == One last killer feature ==
> One other good thing about this implementation: it runs everywhere
> Python runs, including Windows!! I run Shinken in a Windows 7 VM with
> no problem. It can be very useful for SMEs: they are afraid of
> installing Linux because they do not have an IT administrator who
> knows it. Windows support will allow Nagios to get into such
> companies.
>
> Nagios usually does mid-range monitoring: it manages IT environments
> of 20 to 300 hosts. With this new implementation, it will also easily
> manage very small ones up to truly huge ones (10000+ hosts on one node).
>
> So, what now?
>
>
> Gabès Jean
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20091206/700ec2a1/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 133926 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20091206/700ec2a1/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 112205 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20091206/700ec2a1/attachment-0001.png>
-------------- next part --------------
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel

