Event Profiler Timers?

Andreas Ericsson ae at op5.se
Wed Jun 10 10:47:24 CEST 2009


Steven D. Morrey wrote:
>>> Would it be better (i.e. worth the overhead), to use clock_t =
>>> clock() to get the entry and exit?
>> Could be. But then we might as well use gettimeofday() from the
>> start so the wrap-around case at least gets exercised a lot instead
>> of every 72 minutes.
> 
> I don't follow you on that, can you elaborate on what you mean by the
> wrap-around case and it being exercised, because I am not aware of it
> and therefore have not taken it into consideration. My primary
> reasons for wanting to use clock instead of gettimeofday were speed
> and precision.
> 

Righto. clock(3) returns a value that represents seconds in 1/1000000
"scale". This means that, for 32-bit architectures, it will wrap around
every 71m 34.96s. Consider this:

start = clock();
do_stuff();
end = clock();

if (end < start) {
	/* wrapped around */
	delta = (end + (1 << (sizeof(end) * CHAR_BIT)) - start;
} else {
	delta = end - start;
}

This might *look* right, but it really isn't. Some compilers can't
store the intermittent value from the first case (which *will* wrap
around once again), so it will get the result wrong, causing delta
to end up as a huge number (clock_t can't be negative) and severely
skewing the performance data. Since this will happen only for some
compilers on some archs, and only every ~72 minutes, it will be
pure hell to track down.

Otoh, it might be worth doing something like this:

clock_t start;
double delta;

start = clock();
do_stuff();
delta = clock_delta(start, clock());

static double clock_delta(clock_t start, clock_t end)
{
	if (end > start) {
		/* normal case */
		delta = (double)(end - start);
	}
	if (end < start) {
		/* wrap-around */
		delta = (double)(end + (1 << (sizeof(end) * CHAR_BIT)));
		delta -= (float)start;
	}

	return delta / CLOCKS_PER_SEC;
}

which should always provide the value as a fraction of seconds,
which is almost certainly how you want to display it anyways.

On a side-note, try to use arrays of doubles to store the total
delta. It's much neater than having multiple global variables.

The clock_delta function is also quite easily testable which
alleviates the worries about the wrap-around case not being tested
much.

> With gettimeofday we are talking about how much calendar time has
> been spent, whereas with clock we are talking about how many ticks
> were used by the program and then dividing that by clocks_per_second
> to get a process time. I believe that clock would be more accurate
> since it filters out the cost of other processes running on the
> system.
> 

True, but the meaning of it would be hard to correlate against latency,
since some events spend an inordinate amount of time switching contexts
while others are fairly cpu-intensive but achieve a lot in wallclock
time. Parsing passive check-results is CPU-intensive, but in wallclock
time we can easily parse 50 or 100 of them in the same time it takes to
fork and fire up a plugin. I'm not sure which of the values would be
preferrable here, and I'm also not sure which of the processes gets the
CPU-time attributed to it.

> Another consideration would be the amount of overhead involved. It
> appears to me that
> 
> elapsed_time = (double) (end - start) / CLOCKS_PER_SECOND; total_time
> += elapsed_time
> 

But this is in-correct once every 72 minutes when end < start, as
explained above.

> Would involve significantly less over head than
> 
> elapsed_time = (end.tv_usec - start.tv_usec) + ((end.tv_sec*1000)
> -(start.tv_sec*1000)); total_time += (double) (elapsed_time / 1000);
> 

This is potentially incorrect for every check, but errors in this
algorithm would quite quickly be spotted, and tv_usec doesn't really
wrap; It just resets, which means we can do all the math with ints
and longs instead of off-loading it to the mcpu. Otoh, Nagios
really makes no use of the mcpu at all, so we could actually gain
performance by *not* using integer math. Some testing would be nice.
1 billion iterations should show some measurable delta, I think.

Note that this will be different per architecture, but I think it's
safe to assume that x86 is by far the most common architecture, so
whatever runs fastest on that one should probably be the one we use.

> Am I wrong or I missing something?

Only the wrap-around, but that sort of makes or breaks the algo when
using clock(3) since the wrap-around makes such an enormous difference
there.

> Thanks for the information by the
> way, it's extremely helpful.
> 

Yw. Thanks for sending patches :-)

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

------------------------------------------------------------------------------
Crystal Reports - New Free Runtime and 30 Day Trial
Check out the new simplified licensing option that enables unlimited
royalty-free distribution of the report engine for externally facing 
server and web deployment.
http://p.sf.net/sfu/businessobjects




More information about the Developers mailing list