Nagios 2.0 performance

Marc Powell marc at ena.com
Sun Sep 12 18:40:41 CEST 2004

> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net
> [mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of
> Andreas Ericsson
> Sent: Sunday, September 12, 2004 9:25 AM
> To: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Nagios 2.0 performance


> > Unless I'm being dense, this is my understanding of just what the
> > event broker is designed for.
> 
> Problems with the eventbroker;
> * It allows a module to schedule events, but not to receive them (if I
> read the example code correctly). This allows for the crude sort of
> SQL used earlier which deletes and recreates an entire table in one
> go, but not for anything more clever than that (like using a
> persistent db connection and just executing REPLACE statements for
> updated statuses every 10 seconds).

My understanding, from what used to be on upcoming.php and from several
emails, is that a module registers to receive, for example, host and
service data. When a host or service check completes, Nagios uses one
copy of that information and sends another copy to the registered
module, which can do whatever it wants with it. Whether the module
could maintain a persistent DB connection is unclear to me, simply
because nothing about the event broker daemon has been documented yet.
Unless the devil is in the details, looking at module/helloworld.c,
include/nebcallbacks.h, include/nebstructs.h, base/broker.c, nebmods.c
and, most importantly, checks.c seems to show clearly that the event
broker sends status info to the modules as checks are completed, for
that check only; it does not schedule them. I also don't see (with a
non-programmer's eyes) why you couldn't create the DB connection in the
nebmodule_init function and tear it down in the nebmodule_deinit
function, which are called at Nagios start and Nagios stop respectively.
Search for broker_service_check in checks.c and follow it from there if
you like.
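
To make the init/deinit idea concrete, here is a minimal,
self-contained sketch of that lifecycle. It does not use the real
event broker headers, since the API is undocumented and may change:
every name in it (struct db_conn, sketch_module_init,
sketch_on_service_check, sketch_module_deinit) is hypothetical, and
the "database" is just a counter standing in for a real client
library.

```c
#include <stdio.h>

/* Hypothetical stand-in for a database handle. A real module would
 * keep whatever its DB client library returns instead. */
struct db_conn {
    int open;          /* 1 while the connection is usable */
    int rows_written;  /* number of status updates stored so far */
};

/* Hypothetical shape of the data handed over after a service check. */
struct service_check_data {
    const char *host_name;
    const char *service_description;
    int state;  /* 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN */
};

/* One connection kept for the whole lifetime of the module. */
static struct db_conn g_conn;

/* Called once at start-up: open the connection and keep it. */
int sketch_module_init(void)
{
    g_conn.open = 1;
    g_conn.rows_written = 0;
    return 0;
}

/* Called for each completed check: reuse the already-open connection.
 * Returns 0 on success, -1 if there is no usable connection. */
int sketch_on_service_check(const struct service_check_data *d)
{
    if (!g_conn.open || d == NULL)
        return -1;
    /* A real module might run a REPLACE statement here instead. */
    printf("REPLACE status: %s/%s -> %d\n",
           d->host_name, d->service_description, d->state);
    g_conn.rows_written++;
    return 0;
}

/* Called once at shutdown: tear the connection down. */
int sketch_module_deinit(void)
{
    g_conn.open = 0;
    return 0;
}

/* Accessor so callers can see how many updates were stored. */
int sketch_rows_written(void)
{
    return g_conn.rows_written;
}
```

The point of the sketch is only that the connection is opened once and
reused by every callback, rather than being opened per check.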

> * Sloppy code in the eventbroker may damage stability of the core
> nagios.

Are you speaking about the event broker itself or a registered module?
If it's in the event broker, that would be considered a bug in Nagios
and should be treated as such. This is no different than sloppy code in
the check logic doing the same. If a module, evolutionary theory would
state that such a module wouldn't be around long or see widespread use
unless the author cleaned it up.
 
> * Messages still have to be formatted using snprintf (or the
> crash-prone sprintf, since snprintf isn't available on all systems).
> The function I submitted took care of this 'in-house', using va_arg
> (works with std=c89).

I'm not a programmer by trade so I can't address this. 

> * The eventbroker is extremely poorly documented, and won't even build
> without patching on anything less than glibc 2.3 (in fact, making
> Nagios 2.0 build at all on glibc 2.1 requires some patching).

I would expect that's because the documentation isn't complete yet, as
has been clearly stated a number of times. There are many things about
2.0 that are poorly documented at this point in time. That is one of the
reasons it hasn't been released as beta yet. As far as building on glibc
< 2.3, I can't speak to that either, other than to say that a few months
ago I successfully built 2.0 on a Red Hat 7.3 box with glibc 2.2.5 with
no problems.

> * There's no guarantee an eventbroker anybody writes will work with
> future versions of Nagios. The entire system seems to be designed to
> let everybody plug in their own version without ever letting any of
> them in to the core-tree.

I'm not entirely clear on what you mean here. Sure, Nagios checks the
API version of the module to make sure that it matches its own API
version, but if the API is set and doesn't change between versions of
Nagios then there's no problem. If the API does change, you make
whatever changes are necessary to be compatible, which may be as simple
as updating your API number to match if the changes to the API didn't
affect your module. Of course, the hope is that once it is set, the API
_doesn't_ change unless new status reporting features are added. I
believe that the status formats have been stable for a very long time. I
see no reason to believe that they would change between minor or even
major versions.
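
As a rough illustration of that version check, the sketch below
compares the API number a module was built for against the one the
core provides. It is not taken from the Nagios source: the names
SKETCH_CORE_API_VERSION, struct module_info and
sketch_api_version_matches are hypothetical, and the version number 1
is made up.

```c
#include <stdio.h>

/* Hypothetical API version the core was built with. */
#define SKETCH_CORE_API_VERSION 1

/* Hypothetical description a module hands to the loader. */
struct module_info {
    const char *name;
    int api_version;   /* API version the module was compiled for */
};

/* Returns 1 if the module may be loaded, 0 if it must be refused. */
int sketch_api_version_matches(const struct module_info *m)
{
    if (m == NULL)
        return 0;
    if (m->api_version != SKETCH_CORE_API_VERSION) {
        fprintf(stderr, "module '%s' wants API %d, core provides %d\n",
                m->name, m->api_version, SKETCH_CORE_API_VERSION);
        return 0;
    }
    return 1;
}
```

Under this scheme a module whose code is unaffected by an API change
only needs its api_version value updated and a rebuild.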

> * After about 4 months, not a single eventbroker has been written that
> I'm aware of. This suggests people don't like it all too much. A sure
> enough sign it probably won't get very big.

Ummmm.. Where does your 4 months come from? a) The API isn't documented
yet and presumably is still subject to change, b) nobody knows exactly
what the event broker is capable of, as evidenced by this conversation,
because it's not documented yet, and c) 2.0 is *alpha* and its general
use hasn't been encouraged yet. I suspect that's more the problem
than lack of interest. That's like saying nobody likes the Avalon
interface in Longhorn because nobody has written any programs for it
yet. I personally have been waiting for the documentation to be complete
and be at least at the beta stage before digging in.

> * It interacts poorly with other languages. Most of the Nagios
> community seem to be perl/shell-scripters rather than C-programmers,
> so development is left to the precious few who know their way
> properly around C.

It seems logical to me for speed, scalability and consistency. Nagios
and the core plugins are written in C, why should this be different? I
can think of several simple ways of tying into perl or other languages,
including the pipe/socket you've mentioned previously that even I could
probably write as a module. Two examples --

   a)
	On init, register for service data, host data, create/attach to
	socket, pipe, whatever...
	On data receipt, format and send data to socket/pipe IFF
	socket/pipe is not full else skip or alert or whatever.
	On exit, close socket/pipe

      You now have a pipe that a perl daemon can watch and process the
      data from.

   b)
	On init, register for service data, host data, etc
	On data receipt, execute perl script passing args
	On exit, exit.

Obviously (a) is a much better way but both are logically simple.
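
Option (a) could look roughly like the sketch below, which uses only
standard POSIX calls (pipe, fcntl, write). It is an illustration under
assumptions, not event broker code: the sketch_pipe_* names and the
"host;service;state" line format are made up, and an anonymous pipe is
used so the example is self-contained, where a real module would more
likely attach to a named pipe or socket that a separate daemon reads.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static int g_read_fd = -1;   /* end the external daemon would read */
static int g_write_fd = -1;  /* end the module writes to */

/* "On init": create the pipe and make the write end non-blocking, so
 * a slow or stuck reader can never stall the caller. */
int sketch_pipe_open(void)
{
    int fds[2];
    int flags;

    if (pipe(fds) != 0)
        return -1;
    flags = fcntl(fds[1], F_GETFL, 0);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    g_read_fd = fds[0];
    g_write_fd = fds[1];
    return 0;
}

/* "On data receipt": format one line and send it.
 * Returns 1 if sent, 0 if skipped because the pipe is full, -1 on
 * any other error. */
int sketch_pipe_send(const char *host, const char *service, int state)
{
    char line[256];
    int len;
    ssize_t n;

    if (g_write_fd < 0)
        return -1;
    len = snprintf(line, sizeof line, "%s;%s;%d\n", host, service, state);
    if (len < 0 || (size_t)len >= sizeof line)
        return -1;  /* refuse to send a truncated line */
    /* Writes this small are atomic on a pipe: all or nothing. */
    n = write(g_write_fd, line, (size_t)len);
    if (n == (ssize_t)len)
        return 1;
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;   /* pipe full: skip this update */
    return -1;
}

/* Read end, exposed here only so the sketch can be exercised. */
int sketch_pipe_read_fd(void)
{
    return g_read_fd;
}

/* "On exit": close both ends. */
void sketch_pipe_close(void)
{
    if (g_read_fd >= 0)
        close(g_read_fd);
    if (g_write_fd >= 0)
        close(g_write_fd);
    g_read_fd = g_write_fd = -1;
}
```

Returning 0 on a full pipe leaves the "skip or alert or whatever"
decision to the caller. A module writing to a named pipe or socket
would also want to ignore SIGPIPE in case the reader goes away.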


> * Debugging a module is pure hell, since it loads into another
> programs PTE.
> 
> So what could be gained by adding support for logging to a socket?
> * Easy integration with a plethora of other languages.
> * Logging to remote servers is made extremely easy.
> * Very simple code allows for redundant monitoring systems with
> heartbeat failover.
> * Nagios core remains untroubled if the listening end of the socket is
> completely bug infested.
> * The listening end won't have to fork each time a message arrives.
> Nagios' parent process can maintain a persistent connection throughout
> its entire lifetime.
> * Debugging the listener is simple, since it runs its own code in its
> own process.

Make it a module. See above. 

> 
> > The logging module (or db module or whatever
> > someone writes) registers with the event broker to receive certain
> > types of status data or all status data in a well documented format,
> > when documentation is completed of course,
> 
> Has anybody seen any indication of this popping up somewhere?

What? The documentation or the module? I would think that the
documentation would come first. See above.
 
> > and the rest flows naturally as
> > you described above. No messing with the core,
> 
> A module shares PTE with its loader, so messing with the core can't be
> avoided as it is today.
> 
> > no worries about
> > backporting for upgrades,
> 
> As long as all the functions are still available and accept the same
> amount and types of variables, and as long as the data structures
> doesn't change at all. Hmmm... somehow, I don't think Ethan will
> choose to humour a wide variety of eventbrokers before adding new
> functionality to Nagios.

See above. New functionality additions that change the format of the
status data have been few and very far between.

> 
> > and people are free to do whatever they want
> > with the data, including storing it in whatever format they want
> 
> Yes, but they have to schedule an event to fetch it first. The
> log-message won't arrive at their doorstep when it's available (which
> would keep system load at a minimum). Instead, a (possibly)
> CPU-consuming function needs to run in the Nagios parent process.

No, it arrives immediately after the check completes. Look at the code
in checks.c. Check completes, nagios updates internal status, nagios
sends to modules registered for that type of data in order of requested
priority. There is no scheduling of events. On initialization of the
module it tells nagios 'hey, I want to see service check and host check
data when you do'. It's as simple as that.
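
The register-then-receive flow described above can be sketched as a
small callback table. This is a self-contained illustration, not the
code in checks.c or broker.c: sketch_register, sketch_dispatch and the
two example callbacks are hypothetical names, and treating a lower
priority number as "runs first" is an assumption.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_CALLBACKS 16

typedef int (*check_callback)(const char *host, int state);

struct registration {
    int priority;       /* assumption: lower value runs first */
    check_callback fn;
};

static struct registration g_regs[SKETCH_MAX_CALLBACKS];
static size_t g_nregs = 0;
static char g_trace[64];  /* records the order callbacks ran in */

static void trace(char c)
{
    size_t n = strlen(g_trace);
    if (n + 1 < sizeof g_trace) {
        g_trace[n] = c;
        g_trace[n + 1] = '\0';
    }
}

/* Two example modules: one stores results, one sends notifications. */
int example_db_writer(const char *host, int state)
{
    (void)host; (void)state;
    trace('D');
    return 0;
}

int example_notifier(const char *host, int state)
{
    (void)host; (void)state;
    trace('N');
    return 0;
}

/* "Hey, I want to see check data": insert the callback so the table
 * stays in ascending priority order (ties keep registration order). */
int sketch_register(int priority, check_callback fn)
{
    size_t i;

    if (fn == NULL || g_nregs >= SKETCH_MAX_CALLBACKS)
        return -1;
    i = g_nregs;
    while (i > 0 && g_regs[i - 1].priority > priority) {
        g_regs[i] = g_regs[i - 1];
        i--;
    }
    g_regs[i].priority = priority;
    g_regs[i].fn = fn;
    g_nregs++;
    return 0;
}

/* Called right after a check result has been processed: hand the
 * result to every registered callback. Returns how many were called. */
int sketch_dispatch(const char *host, int state)
{
    size_t i;
    int called = 0;

    for (i = 0; i < g_nregs; i++) {
        g_regs[i].fn(host, state);
        called++;
    }
    return called;
}

const char *sketch_trace(void)
{
    return g_trace;
}
```

Nothing here is scheduled: the callbacks run synchronously, inside the
dispatch call, as soon as the result is available.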

--
Marc


