Nagios 2.0 performance

Andreas Ericsson ae at op5.se
Sun Sep 12 16:24:54 CEST 2004


Marc Powell wrote:
>>>Status should be stored within nagios, and the cgi's should query
>>>nagios (not the log file) for status.
>>>OR
>>>Status should be stored in a database.
>>
>>I've mentioned this about a hundred times. If logging to a socket were
>>supported in a straightforward and well documented way, people wouldn't
>>have to fiddle with the core to develop clever ways of logging status.
>>Once proper database-logging has been implemented, I'm sure some
>>SQL-guru can hack up a couple of superfast queries and donate them to
>>the local PHP-freak (php has the fastest hashes and best web-coding
>>features around, period), and web frontends should start popping up all
>>over the place.
> 
> Unless I'm being dense, this is my understanding of just what the event
> broker is designed for.

Problems with the eventbroker:
* It allows a module to schedule events, but not to receive them (if I 
read the example code correctly). This allows for the crude sort of SQL 
used earlier which deletes and recreates an entire table in one go, but 
not for anything more clever than that (like using a persistent db 
connection and just executing REPLACE statements for updated statuses 
every 10 seconds).
* Sloppy code in the eventbroker may damage stability of the core nagios.
* Messages still have to be formatted using snprintf (or the crash-prone 
sprintf, since snprintf isn't available on all systems). The function I 
submitted took care of this 'in-house', using va_arg (works with -std=c89).
* The eventbroker is extremely poorly documented, and won't even build 
without patching on anything less than glibc 2.3 (in fact, making Nagios 
2.0 build at all on glibc 2.1 requires some patching).
* There's no guarantee an eventbroker anybody writes will work with 
future versions of Nagios. The entire system seems to be designed to let 
everybody plug in their own version without ever letting any of them in 
to the core-tree.
* After about 4 months, not a single eventbroker module has been written 
that I'm aware of. This suggests people don't like it much, and is a 
fair sign that it won't get very big.
* It interacts poorly with other languages. Most of the Nagios community 
seem to be perl/shell-scripters rather than C-programmers, so 
development is left to the precious few who know their way properly 
around C.
* Debugging a module is pure hell, since it loads into another program's 
address space.

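A minimal sketch of the "persistent db connection plus REPLACE" approach 
from the first point, as opposed to deleting and recreating the whole 
table. It uses Python's sqlite3 module as a stand-in database; the table 
layout and the names open_status_db and store_updates are assumptions 
made for illustration, not anything Nagios ships.

```python
import sqlite3

def open_status_db(path=":memory:"):
    """Open one long-lived connection and make sure the table exists.

    The schema is hypothetical: one row per (host, service) pair.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS status ("
        " host TEXT NOT NULL, service TEXT NOT NULL,"
        " state INTEGER NOT NULL, output TEXT,"
        " PRIMARY KEY (host, service))"
    )
    return conn

def store_updates(conn, updates):
    """Write only the statuses that changed, reusing the open connection."""
    with conn:  # commits the batch as a single transaction
        conn.executemany(
            "REPLACE INTO status (host, service, state, output)"
            " VALUES (?, ?, ?, ?)",
            updates,
        )

conn = open_status_db()
# First batch: two services report in.
store_updates(conn, [("web1", "HTTP", 0, "HTTP OK"), ("web1", "PING", 0, "PING OK")])
# Later batch: only the changed status is written, the other row is untouched.
store_updates(conn, [("web1", "HTTP", 2, "Connection refused")])
rows = conn.execute(
    "SELECT host, service, state FROM status ORDER BY service"
).fetchall()
```

Each batch touches only the rows it names, so the table never has to be 
dropped and rebuilt between updates.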
So what could be gained by adding support for logging to a socket?
* Easy integration with a plethora of other languages.
* Logging to remote servers is made extremely easy.
* Very simple code allows for redundant monitoring systems with 
heartbeat failover.
* Nagios core remains untroubled if the listening end of the socket is 
completely bug infested.
* The listening end won't have to fork each time a message arrives. 
Nagios' parent process can maintain a persistent connection throughout 
its entire lifetime.
* Debugging the listener is simple, since it runs its own code in its 
own process.
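To illustrate how little code the listening end might need, here is a 
rough sketch of a listener reading status messages from one persistent 
connection. The line format "host;service;state;output" is invented for 
this example, and socket.socketpair() merely stands in for whatever 
socket Nagios would actually write to.

```python
import socket

def parse_status_line(line):
    """Split a 'host;service;state;output' line (a made-up format)."""
    host, service, state, output = line.split(";", 3)
    return {"host": host, "service": service,
            "state": int(state), "output": output}

def read_messages(sock):
    """Yield one parsed message per newline-terminated line until EOF."""
    buf = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # sender closed the connection
            break
        buf += chunk
        while b"\n" in buf:
            raw, buf = buf.split(b"\n", 1)
            if raw:
                yield parse_status_line(raw.decode("utf-8"))

# A connected pair of sockets plays the role of sender and listener here.
sender, listener = socket.socketpair()
sender.sendall(b"web1;HTTP;0;HTTP OK\nweb1;PING;2;100% packet loss\n")
sender.close()
messages = list(read_messages(listener))
listener.close()
```

The listener runs in its own process in a real setup, so a bug in it 
cannot take the sender down with it, and it never has to fork per message.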

> The logging module (or db module or whatever
> someone writes) registers with the event broker to receive certain types
> of status data or all status data in a well documented format, when
> documentation is completed of course,

Has anybody seen any indication of this popping up somewhere?

> and the rest flows naturally as
> you described above. No messing with the core,

A module shares its address space with its loader, so messing with the 
core can't be avoided as it is today.

> no worries about
> backporting for upgrades,

As long as all the functions are still available and accept the same 
number and types of arguments, and as long as the data structures 
don't change at all. Hmmm... somehow, I don't think Ethan will choose 
to humour a wide variety of eventbrokers before adding new functionality 
to Nagios.

> and people are free to do whatever they want
> with the data, including storing it in whatever format they want

Yes, but they have to schedule an event to fetch it first. The 
log-message won't arrive at their doorstep when it's available (which 
would keep system load at a minimum). Instead, a (possibly) 
CPU-consuming function needs to run in the Nagios parent process.

> and
> writing entirely new CGI's, not that the last couldn't be done now. 
> 
> --
> Marc
> 

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer

