Nagios is dead! Long live Icinga!

Andreas Ericsson ae at op5.se
Fri May 8 09:19:54 CEST 2009


Hendrik Baecker wrote:
> Andreas Ericsson wrote:
>> Hendrik Baecker wrote:
>>> Andreas Ericsson wrote:
>>> I hope so, I really don't want to lose all my commits on icinga.
>> Ah, so you're using git already?
>>
> Yes, we are on git and I have to say I love it more every day. An
> SCM that can really handle branches and merges. Wonderful, and it fits
> perfectly for bigger developer bases.
> 
>>>> I hope Icinga forks from that DSCM, and uses the same DSCM to
>>>> maintain their own code so that a future merge becomes as painless
>>>> as possible.
>>> Icinga forked from a cvs2git copy, so future merging might be possible.
>>>
>> It would even be trivial if that cvs2git repository is available for
>> cloning :-)
>>
> Call me insane, but I'm afraid of losing face by publishing a half-ready
> code base, even if I know that I can lose it when you read my 1527
> commit messages *just kidding*.
>>> As I mentioned before, the code base will be opened in the next few days
>>> as a pre-alpha, mega-super-testing phase ;)
>>> No, it's not only a political stunt to get Ethan to wake up again.
>> mega-super-testing is fine, so long as it's code :p
>>
> Be patient with my face ;)
> 
>> Including object sizes and layout? If so, you're doing something good
>> and something bad at the same time.
>> There's room for improvement in the objects in Nagios today. A lot of
>> integer flags can (and should) be removed and reworked into bitmasks.
>> That would make matching them against their dependencies and contacts'
>> options trivial
>>
>>   if ((1 << host->state) & contact->notify_options & host->notification_options)
>>     send_host_notification(host);
>>
> I know you are some kind of magician, but the primary goal is to be more
> open, so that the work can be spread across more shoulders.

Well, "magic" like the above is usually tucked away in a macro, but whatever.

>>> The next steps should be open development of the blocking ndoutils
>>> and better db abstraction.
>> I know you've taken a look at merlin (though I don't know if you saw the
>> figures of the NDOUtils scalability testing we did). Any reason you chose
>> to move forward with NDOUtils? Or is the database schema chopped into a
>> more scalable form as well?
>>
> I am a huge number of commits behind the current version and I love your
> idea behind it.
> Correct me if I'm wrong, but merlin wasn't designed as a better ndo2db,
> was it?

No, that's true. It was designed as an event-transport mechanism. However,
it's really trivial to add support for *doing* something with the events
while they're in transit, so it made sense to add database support to it.

The way I see it, this makes perfect sense from a unix standpoint, since
* it keeps each component small and to the point
* each component can be ripped out and replaced with something else
* small components can easily be chained together, and the result is almost
  always that the whole ends up being greater than the sum of its parts

> Up to now it looks more like a benefit for monitoring
> performance improvements like DNX (in the sense of delegating the checking
> work), so I don't understand why I should have to use
> merlin to fill a database?

Well, take a look at what NDOUtils does.
1. An event is generated in Nagios.
2. The NDO module takes that event and sends it to the NDO daemon.
3. The NDO daemon picks up the event, parses it and inserts it into
   the database.


Now look at what merlin does.
1. An event is generated in Nagios.
2. The merlin module sends the event to the merlin daemon.
3. The merlin daemon sends the event to other merlin daemons.
4. The merlin daemon in the other end sends the event to a merlin module,
   which updates status inside Nagios accordingly.

Now, if you add database capabilities to this chain of events, you'll see
two very interesting things becoming possible.
The first is that we no longer need to use a separate module to insert
stuff into the database.
The second is that events will transparently filter up to several instances
of GUI databases, so that in a chain like this:


           brazil
           /   \
         peru  chile

peru and chile will both have their own GUIs (if they like), and brazil
will have the combined status of peru, chile and brazil in a single
database. That makes for ridiculously simple UI programming for huge
networks, and we already have that for free simply because Merlin is
such a good event transport layer. 

So that means we get:
1. An event is generated in Nagios.
2. The merlin module sends that event to the merlin daemon
3. The merlin daemon inserts the event into a database
4. The merlin daemon transmits the event to "brazil"
5. The brazil merlin daemon inserts the event into a database
6. The brazil merlin daemon sends the event on to the merlin module
7. The brazil merlin module updates the nagios status

If you want something like this to happen *without* having database
write access in the merlin daemon (admittedly, database stuff should
have been demand-loadable code instead of compiled in, but we were in
a hurry and it's configurable anyway), you'll have to pass the event
back into Nagios in such a way that it once again gets passed on to
event-broker modules, but then you end up with a big fat nastiness
in terms of cross-host infinite loops.

Let's say you instead have this setup:

     peru1 <-> peru2

Both peru1 and peru2 monitor the exact same things. The chain without
db capabilities in merlin goes like this:

1. peru1 nagios generates an event
2. The event is sent to the broker modules.
3. The merlin module on peru1 sends the event to the merlin daemon on
   peru1.
4. The merlin daemon on peru1 sends the event to the merlin daemon on
   peru2.
5. The merlin daemon on peru2 sends the event to the merlin module on
   peru2.
6. The merlin module on peru2 updates Nagios' status in such a way
   that the event is sent to the broker modules.
7. Go to step 2, but replace peru1 with peru2 and vice versa.

See?

> Even if the ndoutils db scheme is a complex one I believe that it is
> possible to tweak it for better performance.

For "better" performance, yes, but never for acceptable performance.
The bad thing about it is that you *have* to join a lot of tables, and
so query response times do not increase linearly. In order to get
acceptable performance throughout the range of small to large to huge
networks, you need to make it scale linearly. What we saw was that some
query response times escalated quadratically with the number of objects
and events inside the database. Getting the status of 500 hosts with
1000 events in the database would take 5 seconds, perhaps. Getting the
status of 1000 hosts with 1000 events would take 20 seconds (not actual
timings, but you get the idea; 5x4=20, but it *should* have been 5x2=10).

> NDOUtils is old enough to have an established userbase already (NagVis
> and some other more or less good reporting attempts, for example).
> 
> In short, ndoutils is something like an already given solution for those
> who decide that using a database is mandatory.
> If you know someone who has a simpler but faster solution to get
> nagios data into a database without a detour through another component
> which already has another mission - let's talk about it.

But NDOUtils already has the event transport mechanism. It just doesn't
do it across host boundaries. Think smaller components and you'll get the
bigger picture much better.

> NDO DB Scheme isn't a must have for me, but for the moment it's the best
> thing (ignoring the bottlenecks) to bring nagios data directly to a
> database.
> 

You can't ignore the bottlenecks, because those are what's making it not
work. It's like saying "Oh, this car is really, really great, but since
it doesn't have wheels we have to carry it along."

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Register now for Nordic Meet on Nagios, June 3-4 in Stockholm
 http://nordicmeetonnagios.op5.org/

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.




