Nagios-devel digest, Vol 1 #807 - 8 msgs

Andreas Ericsson ae at op5.se
Wed May 11 10:10:01 CEST 2005


sean finney wrote:
> hey andreas,
> 
> On Tue, May 10, 2005 at 10:29:16AM +0200, Andreas Ericsson wrote:
> 
>>Nopes, it won't. The forked children just hang around and wait for the 
>>popen call to finish (pclose and fgets both block), which means they 
>>don't claim a CPU and thus aren't eligible for activation in the kernel 
>>scheduling loop.
> 
> 
> it's true that when the process is in a select/poll/wait or other sleeping
> state it won't take up time, but there's still the initial startup cost
> of fork/exec'ing the process in the first place.  
> 

Which is small and mostly unavoidable, as at least one such fork/exec is 
required for parallelism. As the rest of the forks happen in the child, 
they impact scheduling and parallelism very little.
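
To illustrate the point about blocked children, here is a minimal sketch (not taken from the Nagios source or from dltest) that measures how much wall-clock time versus CPU time a caller spends while stuck in fgets() and pclose(). The names time_blocked_popen and self_cpu_seconds are made up for this example, and it assumes a POSIX system providing popen(), getrusage() and clock_gettime().

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Seconds of user+system CPU time consumed so far by this process. */
static double self_cpu_seconds(void)
{
	struct rusage ru;

	getrusage(RUSAGE_SELF, &ru);
	return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
	     + ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

/*
 * Hypothetical helper: run cmd through popen(), drain its output and
 * report how much wall-clock and CPU time the caller spent waiting.
 * Returns 0 on success, -1 if popen() fails.
 */
int time_blocked_popen(const char *cmd, double *wall, double *cpu)
{
	char buf[256];
	struct timespec t0, t1;
	double c0 = self_cpu_seconds();
	FILE *fp;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	fp = popen(cmd, "r");
	if (!fp)
		return -1;
	while (fgets(buf, sizeof(buf), fp))
		;	/* blocks until the command writes or exits */
	pclose(fp);	/* blocks until the command has been reaped */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*wall = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	*cpu = self_cpu_seconds() - c0;
	return 0;
}
```

Called with a command such as "sleep 1", the wall figure should come out at about one second while the cpu figure stays close to zero: the waiting process is asleep and not competing for the CPU.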

> 
> anyway, here's the goods:
> 
> http://www.seanius.net/tmp/dltest.tgz
> 
> given my recent hijinks, a quick sanity check over my code is probably
> in order (in fact i think there is some kind of race in there, maybe).
> after that, "make test" will get you the numbers.  
> 
> i found that the overall "real" time was not significantly impacted,
> but there does seem to be a modest improvement in user/sys time--and
> even more so if you run the test on a slower/busier machine.
> 

Just as I told you. Modest improvements per execution aren't important, 
particularly when they don't show up in wallclock time. As for the 
busier machine, I ran the tests alongside a little program I wrote to 
eat CPU. The busy-loop goes like this:

while(1) {
	i++;
	x = i % ~x;
}

which hogs as much free CPU as possible. i and x are registers to make 
context switches really expensive. With this one running, I got the 
following results:

---%<---%<---%<---
running tests....
dltest/popen with 10 kids:
real    0m11.127s
user    0m0.000s
sys     0m0.000s

dltest/dlopen with 10 kids:
real    0m9.075s
user    0m0.000s
sys     0m0.000s

dltest/popen with 100 kids:
real    0m12.445s
user    0m0.000s
sys     0m0.010s

dltest/dlopen with 100 kids:
real    0m12.605s
user    0m0.000s
sys     0m0.000s

dltest/popen with 1000 kids:
real    0m12.895s
user    0m0.000s
sys     0m0.010s

dltest/dlopen with 1000 kids:
real    0m31.265s
user    0m0.000s
sys     0m0.090s

---%<--%<--%<---

This is consistent over several runs, with only minor changes to the 
timings. As you can see, dlopen is ahead only at 10 kids (9.1s vs 11.1s 
real). The gain of not forking is gone at 100 instances (12.6s vs 
12.4s), and for 1000 kids the dlopen approach offers clearly abysmal 
performance (31.3s) while the popen approach stays roughly the same 
(12.9s).
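
For anyone wanting to reproduce the loaded-machine condition: the busy-loop fragment quoted earlier leaves out the declarations, so the following is only a compilable sketch of it. The unsigned int type, the zero starting values, the iteration bound and the name burn_cpu are all assumptions added here, not part of the original program.

```c
/*
 * Bounded variant of the busy loop (hypothetical helper): run
 * `iterations` rounds and return x so the compiler has to keep
 * the arithmetic.
 */
unsigned int burn_cpu(unsigned long iterations)
{
	/* assumed type and start values; the original does not show them */
	register unsigned int i = 0, x = 0;

	while (iterations--) {
		i++;
		/* x is always smaller than the previous ~x, so it never
		 * becomes all ones and ~x is never zero here */
		x = i % ~x;
	}
	return x;
}
```

Calling burn_cpu() in an endless loop from main() gives the CPU hog; the bound only exists so that the function can return and be checked.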

You would do better to abandon this and instead focus on enhancing 
multithreading/multiplexing support or fixing the plugins. There's a lot 
of work to do there to improve Nagios' performance.
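
As a rough idea of what multiplexing could mean here, the sketch below starts several commands with popen() and waits on all of their pipes from a single poll() loop. It is not how Nagios does things; run_multiplexed and MAX_CMDS are invented for the example, and it assumes a POSIX system.

```c
#define _XOPEN_SOURCE 700
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_CMDS 16	/* arbitrary limit for this sketch */

/*
 * Hypothetical helper: start every command with popen() and service
 * all pipes from one poll() loop, so a single process waits on many
 * commands at once. Returns the total bytes read, or -1 if n is too big.
 */
long run_multiplexed(const char *const cmds[], int n)
{
	FILE *fp[MAX_CMDS];
	struct pollfd pfd[MAX_CMDS];
	char buf[512];
	long total = 0;
	int i, open_fds = 0;

	if (n < 0 || n > MAX_CMDS)
		return -1;
	for (i = 0; i < n; i++) {
		fp[i] = popen(cmds[i], "r");
		/* poll() ignores entries with a negative fd */
		pfd[i].fd = fp[i] ? fileno(fp[i]) : -1;
		pfd[i].events = POLLIN;
		pfd[i].revents = 0;
		if (fp[i])
			open_fds++;
	}
	while (open_fds > 0) {
		if (poll(pfd, n, -1) < 0)
			break;
		for (i = 0; i < n; i++) {
			ssize_t got;

			if (pfd[i].fd < 0 ||
			    !(pfd[i].revents & (POLLIN | POLLHUP | POLLERR)))
				continue;
			got = read(pfd[i].fd, buf, sizeof(buf));
			if (got > 0) {
				total += got;
				continue;
			}
			pclose(fp[i]);	/* EOF or error: reap this command */
			pfd[i].fd = -1;
			open_fds--;
		}
	}
	/* after a poll() failure, close whatever is still open */
	for (i = 0; i < n; i++)
		if (pfd[i].fd >= 0)
			pclose(fp[i]);
	return total;
}
```

The commands run concurrently, so the total wait is governed by the slowest command rather than the sum of all of them, and the waiting process sleeps in poll() in the meantime.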

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer

