Nagios-devel digest, Vol 1 #807 - 8 msgs

sean finney seanius at debian.org
Mon May 9 20:34:43 CEST 2005


hey,

On Mon, May 09, 2005 at 08:47:42AM -0700, nagios-devel-request at lists.sourceforge.net wrote:
> From: Andreas Ericsson <ae at op5.se>

> Zero overhead is just not going to happen. Nagios MUST be able to 
> execute checks in parallel. It can't do that if it just enters a 
> function instead without forking, threading or multiplexing (actually it 
> can't do that without forking or threading, but popen() forks, so to 
> multiplex the results from it would be a sort of mix of both worlds), as 
> that would imply a serialized execution.

you have a point that there will need to be some kind of forking
or multi-threading capability.  but calling a function in a forked
process or thread would still perform much better than the
multiple fork and exec calls in the current implementation.


> It would require a huge re-design of current arch. It would also require 
> a huge re-design of most plugins, since they don't clean up after 
> themselves as it is today. They also use very shoddy function-calls. Not 

that wouldn't be as much of a "redesign" as it would be a code-cleanup,
which is never a bad thing to do anyway.  plus, what i'm suggesting
isn't an all-or-nothing switchover, but a conditional switch.  plugins
could be audited for poor memory management etc., and as each one is
approved it could be added to the list of plugins built as shared objects.
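
a minimal sketch of what such a conditional switch could look like, assuming
a static table of approved plugins.  the names plugin_fn, check_always_ok,
approved and find_builtin are hypothetical and not taken from nagios; a real
version would presumably fill the table from the shared objects that were
built.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry-point type: returns a Nagios-style status
 * (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). */
typedef int (*plugin_fn)(int argc, char **argv);

/* Hypothetical in-process plugin, for illustration only. */
int check_always_ok(int argc, char **argv)
{
    (void)argc;
    (void)argv;
    return 0;
}

struct builtin_plugin {
    const char *name;
    plugin_fn fn;
};

/* Plugins that have passed the audit. Anything not listed here keeps
 * using the existing external-program path. */
static const struct builtin_plugin approved[] = {
    { "check_always_ok", check_always_ok },
};

/* Return the in-process entry point for name, or NULL if the plugin has
 * not been approved and must still be run as an external program. */
plugin_fn find_builtin(const char *name)
{
    for (size_t i = 0; i < sizeof approved / sizeof approved[0]; i++)
        if (strcmp(approved[i].name, name) == 0)
            return approved[i].fn;
    return NULL;
}
```

a caller would try find_builtin() first and fall back to the current
fork/exec path whenever it returns NULL, so unaudited plugins keep working
unchanged.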

> to mention; plugins that crash would cause nagios to crash. This just 
> isn't good enough.

even forked children?

> > systems cache frequently accessed pages in memory, but there's still
> > unavoidable overhead in creating a new process, as well as the
> > context switching between the various processes.
> 
> This would still be unavoidable, so point is still moot (see above on 
> parallelism).

well, somewhat moot.  see below:

> Three fork()'s and two execve()'s, as nagios itself forks once prior to 
> running popen(). execve() replaces the running process, so there's no 

that's the count that i got:

- nagios forks
- nagios child calls popen
- popen forks 
- popen child calls execve(/bin/sh)
- /bin/sh forks
- /bin/sh child calls execve(cmd)
- /bin/sh child (now cmd) exits with status

and i'm suggesting

- nagios forks
- nagios child calls plugin_function
- nagios child exits with the return status of plugin_function
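
the three steps above can be sketched roughly as follows.  this is an
illustration only, not nagios code: plugin_fn, check_dummy, check_crash and
run_check_in_child are hypothetical names.  it also shows that a plugin
function crashing inside the forked child does not take the parent down
with it; the parent just sees an abnormal exit.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical entry-point type: returns a Nagios-style status
 * (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). */
typedef int (*plugin_fn)(int argc, char **argv);

/* Stand-in plugin: CRITICAL if given any argument, OK otherwise. */
int check_dummy(int argc, char **argv)
{
    (void)argv;
    return argc > 1 ? 2 : 0;
}

/* Stand-in for a buggy plugin that crashes. */
int check_crash(int argc, char **argv)
{
    (void)argc;
    (void)argv;
    abort();
}

/* Fork once, call the plugin function in the child, and hand its return
 * value back to the parent as the child's exit status. Returns -1 if
 * fork/wait fails or if the child was killed by a signal. */
int run_check_in_child(plugin_fn fn, int argc, char **argv)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(fn(argc, argv));      /* child: no exec, just the call */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

there is no exec and no /bin/sh anywhere in this path; the only process
created is the one nagios already forks today.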

note that if this were in a multi-threaded arch, or if the child
processes were pre-allocated, even this fork would have a negligible
effect.

> running popen(). execve() replaces the running process, so there's no 
> context-switching. It would be possible to get rid of one of the 

assuming that one fork() isn't avoidable, you still have three processes
between which you have to switch in the popen approach (nagios child,
popen child, /bin/sh child).

> Arguments can contain whitespace if escaped or enclosed in strings. Do 
> you feel like writing a function that does that and that's fast enough 
> to run as often as is required, while still being rock-solid safe? The 
> functions that do this in glibc and bash are asm-enhanced and 
> finetuned per architecture they're run at. You'd increase load 
> drastically, not reduce it.

okay, so a little trickier than splitting on whitespace.  however, i
don't see where your concerns about speed/efficiency are coming from.
why would we need to do this every time the command is executed?  why
not parse the cmd into arguments when the command is first read in from
the conffile?  plus, if we did that regardless of this dlopen suggestion,
we could also cut out the popen call and just do fork/exec/dup on the
actual command using the same argument list.
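
a rough sketch of that idea, under stated assumptions: split_command is a
simplified, hypothetical tokenizer (quotes and backslash escapes only, no
variables, globs or redirection) that would run once when the config is
read, and run_argv is a hypothetical helper that then does
fork/dup2/execvp on the stored argument list without popen or /bin/sh.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ARGS 64

/* Split cmd into tokens, honouring '...', "..." and backslash escapes.
 * Returns the argument count, or -1 on an unterminated quote, too many
 * arguments or allocation failure. On success with at least one token,
 * argv[0] points at the single buffer backing all tokens; free(argv[0])
 * releases it. */
int split_command(const char *cmd, char **argv)
{
    char *buf = malloc(strlen(cmd) + 1);
    if (!buf)
        return -1;
    char *out = buf;
    const char *p = cmd;
    int argc = 0;

    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0')
            break;
        if (argc >= MAX_ARGS - 1) { free(buf); return -1; }
        argv[argc++] = out;
        char quote = 0;
        while (*p && (quote || (*p != ' ' && *p != '\t'))) {
            if (quote) {
                if (*p == quote) { quote = 0; p++; }
                else if (quote == '"' && *p == '\\' && p[1]) { *out++ = p[1]; p += 2; }
                else *out++ = *p++;
            } else if (*p == '\'' || *p == '"') {
                quote = *p++;
            } else if (*p == '\\' && p[1]) {
                *out++ = p[1]; p += 2;
            } else {
                *out++ = *p++;
            }
        }
        if (quote) { free(buf); return -1; }
        *out++ = '\0';
    }
    argv[argc] = NULL;
    if (argc == 0)
        free(buf);
    return argc;
}

/* Run argv directly (no shell), capture up to len-1 bytes of its stdout
 * in out, and return its exit status, or -1 on failure. Output beyond
 * the buffer is not drained in this sketch. */
int run_argv(char *const argv[], char *out, size_t len)
{
    int fds[2];
    if (len == 0 || pipe(fds) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);    /* child stdout -> pipe */
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }
    close(fds[1]);
    size_t used = 0;
    ssize_t n;
    while (used < len - 1 && (n = read(fds[0], out + used, len - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

the parsing cost is paid once per command definition, and each execution
then costs one fork and one exec instead of going through the shell.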

> A way around this would be to rewrite the plugins more or less from 
> scratch, and possibly make them simpler as well, while tagging them for 
> nagios to KNOW which ones are expected to have modules installed. For 
> instance, the check_command could look something like

i don't see what this gets anyone, apart from more work to accomplish
effectively the same task.

> popen() is fork() + dup() + execve(), more or less. Read glibc-2.3.5 

popen is fork + dup + exec (/bin/sh -c) + fork + dup + exec (your command).
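
for illustration, the first half of that chain looks roughly like the
sketch below.  run_via_shell is a hypothetical name, and the pipe/dup step
is left out to keep it short.  whether the shell then forks again before
exec'ing the command depends on the shell; some exec a simple command
directly.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start cmd the way popen() does: fork, then exec /bin/sh -c cmd in the
 * child. The shell is an extra process image that has to be loaded
 * before the real command even starts. Returns the exit status, or -1
 * on failure. */
int run_via_shell(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                 /* exec of the shell failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```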

(and later, in another mail)

> Sean, it'd be interesting to compare this test with the dlopen() idea of 
> yours. Make it time itself so that timing starts after dlopen() and each 
> command just requires a table lookup, symbol lookup, fork, execution and 
> collection.

this wouldn't be the best example of a test, because executing /bin/ls
is doing a fork/exec, which is exactly what the idea is trying to avoid.
granted, many plugins do this internally too.  however, i would be
interested to see how the dlopen approach handled itself within a
multithreaded environment.  

i'll see about grabbing a smaller plugin (such as the check_rand you
mention) and testing that.  i'll also try and make it more realistic
to what goes on inside nagios (calling the function from a child).
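
a test along those lines could be sketched like this.  it is only a timing
harness under stated assumptions, not the test actually used in this
thread: noop_check and time_checks are hypothetical names, and the external
program is simply `true` looked up on PATH.

```c
#define _POSIX_C_SOURCE 200809L     /* for clock_gettime() */
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical no-op check standing in for a real plugin function. */
int noop_check(void)
{
    return 0;
}

/* Time `rounds` iterations of fork + wait. The child either calls
 * noop_check() directly (use_exec == 0) or execs the external `true`
 * program (use_exec != 0). Returns elapsed seconds, or -1.0 on error. */
double time_checks(int rounds, int use_exec)
{
    struct timespec t0, t1;
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1.0;

    for (int i = 0; i < rounds; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1.0;
        if (pid == 0) {
            if (use_exec) {
                execlp("true", "true", (char *)NULL);
                _exit(127);         /* exec failed */
            }
            _exit(noop_check());    /* function call in the child */
        }
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1.0;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

comparing time_checks(n, 0) with time_checks(n, 1) for the same n isolates
the cost of the exec step, since both variants pay for the same fork and
wait.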

(and yet later)

> You really need to use a real plugin while doing this test since it's 
> quite obvious that calling a function that does nothing is a lot faster 
> as an in-core function than as an external program. My experiment showed 
> benchmarks between two different ways of executing external programs, so 
> it's ok for that to use nonsense data that returns quickly, while yours 
> focus on the entire execution cycle from execution-start to 
> execution-end. You really need to do something a bit more real to 
> investigate the time gained for that (hint, the most time is spent in 
> the plugin).

actually, last night i started with check_tcp and saw a similar trend
(though not the 4 orders of magnitude seen here).  the reason i didn't
post that was that the instructions for properly building check_tcp
as a shared object were slightly more complicated.

> to fire up new checks as old ones complete either (err, that's what it 
> does, but in a serial manner), while both mthread and mplex can be 
> modified only slightly to do just that and thus scales far better.

i think you and i are barking up two slightly different trees here.

what i've been trying to argue is that checks via functions will prove
to be much better performing than executing plugins via fork/exec
or popen.  sure, a multithreaded architecture will also yield better
results (even more so, but also more work to overhaul), but that's kind of
orthogonal to what i'm getting at.  


	sean
