novel idea

Andreas Ericsson ae at op5.se
Mon May 9 09:57:32 CEST 2005


sean finney wrote:
> hi andreas,
> 
> On Sun, May 08, 2005 at 08:12:59PM -0700, nagios-devel-request at lists.sourceforge.net wrote:
> 
>>From: Andreas Ericsson <ae at op5.se>
>>Subject: Re: [Nagios-devel] novel idea for performance optimization
> 
> 
>>Good idea, except that ld linker voodoo (symbol resolution et al) 
>>induces the same or more overhead on systems with copy-on-write fork 
>>(linux, bsd, solaris) and reasonably quick context-switching (linux, 
>>bsd). So the suffering people are those running Nagios on HP and Cygwin. 
>>Not a great many, I presume.
> 
> 
> no, i think you're not understanding what i'm suggesting.  let me
> try and be more clear.  let's use check_tcp as an example.  when
> compiling check_tcp, in addition to the standalone binary, a shared
> object something like libnagios-check_tcp.so would be created.
> this shared object would have the symbol "main" renamed "check_tcp"
> 
> when nagios starts up, the first time it goes to execute check_tcp (or
> even earlier, when it first reads about the check_commands), it looks
> for such a library via dlopen().  if successful, it fetches the address
> of the check_tcp function via dlsym().  from that point forward, there
> is ZERO overhead, because there's no fork/exec, nor is there any symbol
> resolution, it's just calling a function.  make sense?
> 

Zero overhead is just not going to happen. Nagios MUST be able to 
execute checks in parallel. It can't do that if it simply calls a 
function without forking, threading or multiplexing, as that would 
imply serialized execution. (Actually it can't do that without forking 
or threading at all; popen() forks, so multiplexing the results from it 
would be a sort of mix of both worlds.)
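
To make the parallelism point concrete, here is a minimal sketch, not 
Nagios code: each check runs in its own forked child, and the parent 
reaps them afterwards. The names check_fn, check_ok, check_warning and 
run_checks_parallel are made up for illustration, and a "check" is 
assumed to be a plain function returning a plugin-style exit code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHECKS 64

/* Hypothetical in-process check: returns a plugin-style exit code. */
typedef int (*check_fn)(void);

int check_ok(void)      { return 0; }   /* OK */
int check_warning(void) { return 1; }   /* WARNING */

/* Start every check in its own child before waiting on any of them,
 * so the checks run concurrently instead of one after another.
 * statuses[i] receives the exit code of checks[i], or -1 if that child
 * could not be started or did not exit normally.
 * Returns 0 on success, -1 if any fork() failed. */
int run_checks_parallel(const check_fn *checks, int n, int *statuses)
{
    pid_t pids[MAX_CHECKS];
    int rc = 0;

    assert(checks != NULL && statuses != NULL && n >= 0 && n <= MAX_CHECKS);

    for (int i = 0; i < n; i++) {
        pids[i] = fork();
        if (pids[i] == 0)
            _exit(checks[i]());   /* child: run one check, report via exit code */
        if (pids[i] < 0)
            rc = -1;              /* fork failed; still try the remaining checks */
    }

    for (int i = 0; i < n; i++) {
        int st;

        statuses[i] = -1;
        if (pids[i] > 0 && waitpid(pids[i], &st, 0) == pids[i] && WIFEXITED(st))
            statuses[i] = WEXITSTATUS(st);
    }
    return rc;
}
```

Calling checks[i]() directly in the first loop, without the fork(), 
would give the serialized behaviour described above: check i+1 could 
not start until check i had returned.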

> this could further be enhanced by adding multi-threading capabilities
> to such a scheme (you could have a separate thread for each
> check_command, or perhaps some other scheme).  but what's best is
> that it would involve minimal changes to the pre-existing plugins,
> and wouldn't require any significant re-designing of the nagios
> architecture.
> 

It would require a huge re-design of the current architecture. It would 
also require a huge re-design of most plugins, since they don't clean up 
after themselves as it is today. They also use very shoddy function 
calls. Not to mention that a plugin that crashes would take nagios down 
with it. This just isn't good enough.
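
The crash argument can be sketched as follows. This is an illustration 
under assumptions, not existing Nagios or plugin code: plugin_main_fn, 
check_buggy, check_critical and run_isolated are hypothetical names, 
and the "plugin" is just a function that calls abort(). Run behind a 
fork(), the crash kills only the child; called directly, it would kill 
the calling process.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RUN_FAILED (-1000)   /* fork() or waitpid() itself failed */

/* Hypothetical plugin entry point, e.g. a renamed main(). */
typedef int (*plugin_main_fn)(void);

int check_buggy(void)    { abort(); }    /* stands in for a crashing plugin */
int check_critical(void) { return 2; }   /* CRITICAL */

/* Run fn in a child process so a crash cannot take the caller down.
 * Returns the exit code (>= 0), the negated signal number if the child
 * was killed by a signal, or RUN_FAILED. */
int run_isolated(plugin_main_fn fn)
{
    int st;
    pid_t pid;

    assert(fn != NULL);

    pid = fork();
    if (pid < 0)
        return RUN_FAILED;
    if (pid == 0)
        _exit(fn());              /* child: the crash, if any, happens here */

    if (waitpid(pid, &st, 0) != pid)
        return RUN_FAILED;
    if (WIFEXITED(st))
        return WEXITSTATUS(st);
    if (WIFSIGNALED(st))
        return -WTERMSIG(st);     /* parent survives and can report the crash */
    return RUN_FAILED;
}
```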

> 
>>This is moot. All operating systems worth their salt cache frequently 
>>accessed programs so the code is already in memory anyway.
> 
> 
> systems cache frequently accessed pages in memory, but there's still
> unavoidable overhead in creating a new process, as well as the
> context switching between the various processes.
> 

This would still be unavoidable, so the point is still moot (see above 
on parallelism).

> 
>>They would also have to add some code that splits arguments the way they 
>>are supposed to, including some other additional stuff.
> 
> 
> isn't that already done?  hmm... looking in the nagios code, it looks like
> all the plugins are called with popen[1].  so that means *two* fork/execs
> (one for /bin/sh, one for the command /bin/sh executes).  
> 

Three fork()s and two execve()s, as nagios itself forks once prior to 
running popen(). execve() replaces the running process, so there's no 
context-switching. It would be possible to get rid of one of the 
fork()s, but not the other two (see above on parallelism). The popen() 
must be there, or nagios would have to fork() explicitly and then run 
the dlopen()'ed code.

> anyway, this wouldn't be very hard to do, just split the arguments on
> whitespace and call check_tcp() with what ought to have been passed to
> exec.
> 

Arguments can contain whitespace if escaped or enclosed in quotes. Do 
you feel like writing a function that does that and that's fast enough 
to run as often as is required, while still being rock-solid safe? The 
functions that do this in glibc and bash are asm-enhanced and 
fine-tuned for each architecture they run on. You'd increase load 
drastically, not reduce it.
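
For a sense of what such a function has to handle, here is a rough 
sketch of an in-place splitter. It is a simplification, not what the 
shell does: split_args is a made-up name, a backslash escapes any 
character (except inside single quotes), and there is no variable 
expansion or globbing. It says nothing about speed either way.

```c
#include <assert.h>
#include <stddef.h>

/* Split `line` in place into at most max_args tokens.
 * Whitespace separates tokens, a backslash escapes the next character
 * (except inside single quotes), and single or double quotes group text.
 * Returns the token count, or -1 on an unterminated quote, a trailing
 * backslash, or too many tokens. */
int split_args(char *line, char **argv, int max_args)
{
    char *r = line;   /* read position */
    char *w = line;   /* write position, never ahead of r */
    int argc = 0;

    assert(line != NULL && argv != NULL && max_args > 0);

    for (;;) {
        while (*r == ' ' || *r == '\t')
            r++;                          /* skip separators */
        if (*r == '\0')
            break;
        if (argc == max_args)
            return -1;
        argv[argc++] = w;

        char quote = 0;                   /* 0, '"' or '\'' */
        while (*r != '\0' && (quote || (*r != ' ' && *r != '\t'))) {
            if (*r == '\\' && quote != '\'') {
                if (*++r == '\0')
                    return -1;            /* trailing backslash */
                *w++ = *r++;
            } else if (quote && *r == quote) {
                quote = 0;                /* closing quote */
                r++;
            } else if (!quote && (*r == '"' || *r == '\'')) {
                quote = *r++;             /* opening quote */
            } else {
                *w++ = *r++;
            }
        }
        if (quote)
            return -1;                    /* unterminated quote */

        int at_end = (*r == '\0');        /* check before w may overwrite *r */
        *w++ = '\0';
        if (at_end)
            break;
        r++;                              /* step past the separator */
    }
    return argc;
}
```

Given the line check_tcp -H "my host" -p 80\ 81, this yields five 
tokens, with "my host" and "80 81" each kept as one argument.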

A way around this would be to rewrite the plugins more or less from 
scratch, and possibly make them simpler as well, while tagging them for 
nagios to KNOW which ones are expected to have modules installed. For 
instance, the check_command could look something like
:PING 5 40%,100.0 60%,500.0
Having the identifier (after the :) be 32 bits has some very obvious 
performance benefits. Come to think of it, arguments could be separated 
with ; instead of whitespace. That leaves only the exception of one 
escaped char, which is a good thing.
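
One possible reading of that scheme, sketched under assumptions: the 
32-bit identifier is taken to be four ASCII characters packed into a 
uint32_t, so looking up the module is an integer compare rather than a 
string compare, and \; is taken to be the single escape for the 
separator. MAKE_TAG, parse_tag and split_fields are hypothetical names, 
and none of this is an existing Nagios format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack four characters into one 32-bit value (assumed tag encoding). */
#define MAKE_TAG(a, b, c, d)                          \
    (((uint32_t)(unsigned char)(a) << 24) |           \
     ((uint32_t)(unsigned char)(b) << 16) |           \
     ((uint32_t)(unsigned char)(c) << 8)  |           \
      (uint32_t)(unsigned char)(d))

/* Read the four-character tag that follows a leading ':'.
 * Returns 0 and stores the tag, or -1 if cmd is not a tagged command. */
int parse_tag(const char *cmd, uint32_t *tag)
{
    assert(cmd != NULL && tag != NULL);

    if (cmd[0] != ':' || strlen(cmd) < 5)
        return -1;
    *tag = MAKE_TAG(cmd[1], cmd[2], cmd[3], cmd[4]);
    return 0;
}

/* Split `args` in place on ';'. The only escape is "\;", which yields a
 * literal ';'. Returns the field count, or -1 if there are too many. */
int split_fields(char *args, char **fields, int max_fields)
{
    char *r = args;   /* read position */
    char *w = args;   /* write position, never ahead of r */
    int n = 0;

    assert(args != NULL && fields != NULL && max_fields > 0);

    fields[n++] = w;
    for (; *r != '\0'; r++) {
        if (r[0] == '\\' && r[1] == ';') {
            *w++ = ';';               /* escaped separator */
            r++;
        } else if (*r == ';') {
            if (n == max_fields)
                return -1;
            *w++ = '\0';              /* end the current field */
            fields[n++] = w;
        } else {
            *w++ = *r;
        }
    }
    *w = '\0';
    return n;
}
```

With this encoding, dispatching on the tag can be a switch over 
MAKE_TAG('P','I','N','G') and friends, and the splitter needs no quote 
handling at all.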

> 
> 	sean
> 
> [1] any reason it's being done this way and not with fork/exec/dup?

popen() is fork() + dup() + execve(), more or less. Read glibc-2.3.5 
libio/iopopen.c, especially the _IO_new_proc_open function (popen, but 
glibc internal).
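
As a rough illustration of that decomposition (a sketch, not the glibc 
code, which uses stdio streams and tracks its children differently), 
the following reads a command's stdout using pipe(), fork(), dup2() 
and execve() of /bin/sh -c. run_command is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run `cmd` through /bin/sh -c, copy up to outlen-1 bytes of its stdout
 * into `out` (always NUL-terminated) and return its exit code.
 * Returns -1 if setup failed or the child did not exit normally. */
int run_command(const char *cmd, char *out, size_t outlen)
{
    int fds[2], st;
    size_t used = 0;
    ssize_t n;
    pid_t pid;

    assert(cmd != NULL && out != NULL && outlen > 0);

    if (pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: point stdout at the pipe, then become the shell. */
        char *const argv[] = { "sh", "-c", (char *)cmd, NULL };

        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execve("/bin/sh", argv, environ);
        _exit(127);               /* only reached if execve() failed */
    }

    /* Parent: read until EOF, an error, or a full buffer. */
    close(fds[1]);
    while (used < outlen - 1 &&
           (n = read(fds[0], out + used, outlen - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(fds[0]);

    if (waitpid(pid, &st, 0) != pid || !WIFEXITED(st))
        return -1;
    return WEXITSTATUS(st);
}
```

The shell then forks again to run the actual plugin, which is where the 
second fork/exec pair in the count above comes from.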

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer

