Nagios Profiler Changes

Hiren Patel hir3npatel at gmail.com
Tue Jun 16 20:41:11 CEST 2009


Steven D. Morrey wrote:
> Hi Everyone,
> 
> As you know I've been hard at work creating a profiler for nagios that is simple, flexible, extensible, fast and above all accurate.
> 
> My initial design was to create a collection of global timer (gt_*) and global odometer (go_*) variables that could then be written out to status.dat one by one.
> This worked OK but quickly became unwieldy, since every new statistic needed its own variables and its own output code.
> 
> My next design was a linked list of objects, each containing a timer, a counter, and a name or event type. This made extensibility a snap, but it would have hurt speed significantly: every time we wanted to update a stat we would have to walk the list at best and, at worst, do a strcmp() on every single object. So this idea was discarded for the time being.
> 
> Finally, I had a better idea. Each event type is an integer, and even though the values aren't necessarily close together, they are still usable as array indices, even if the resulting array is sparse.
> So this is the new profiler design:
> 
> We have an object containing the elapsed time, a counter, and an enabled flag.
> 
> We have an array of these objects, indexed by event type:
>     profiler[event].counter++;
> 
> Then, when we write it out to status.dat, a very simple loop checks whether each event type is enabled for profiling and outputs it if it is.
> The output looks like:
>     PROFILE_COUNTER_EVENT_SERVICE_CHECK=100
> 
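A minimal C sketch of the array-indexed design described above, under stated assumptions: the event IDs, the PROFILER_MAX_EVENTS bound, the name table, the PROFILE_ELAPSED_ key and the microsecond unit are all hypothetical choices for illustration, not the actual patch. It shows the two properties claimed for the design: an update is a single array access with no strcmp(), and the write-out loop only emits enabled slots in the KEY=VALUE form shown above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical event type IDs; real Nagios event types are sparse integers,
 * so the array is sized by an assumed upper bound, not by the event count. */
#define EVENT_SERVICE_CHECK 0
#define EVENT_HOST_CHECK    12
#define PROFILER_MAX_EVENTS 16

/* One profiler slot: elapsed time, counter, enabled flag. */
typedef struct {
    unsigned long long elapsed_usec; /* accumulated time in microseconds (assumed unit) */
    unsigned long long counter;      /* number of times the event was seen */
    int enabled;                     /* nonzero if this event type is written out */
} profiler_item;

/* Indexed directly by event type: an update is one array access, no strcmp(). */
static profiler_item profiler[PROFILER_MAX_EVENTS];

/* Names are only needed at output time (hypothetical table). */
static const char *profiler_names[PROFILER_MAX_EVENTS] = {
    [EVENT_SERVICE_CHECK] = "EVENT_SERVICE_CHECK",
    [EVENT_HOST_CHECK]    = "EVENT_HOST_CHECK",
};

/* Record one occurrence of an event and the time it took. */
static void profiler_update(int event, unsigned long long elapsed_usec) {
    assert(event >= 0 && event < PROFILER_MAX_EVENTS);
    profiler[event].counter++;
    profiler[event].elapsed_usec += elapsed_usec;
}

/* Emit enabled slots as KEY=VALUE lines, in the style used in status.dat. */
static void profiler_write(FILE *fp) {
    for (int i = 0; i < PROFILER_MAX_EVENTS; i++) {
        if (!profiler[i].enabled || profiler_names[i] == NULL)
            continue; /* disabled or unnamed slots are skipped */
        fprintf(fp, "PROFILE_COUNTER_%s=%llu\n", profiler_names[i], profiler[i].counter);
        fprintf(fp, "PROFILE_ELAPSED_%s=%llu\n", profiler_names[i], profiler[i].elapsed_usec);
    }
}
```

With only EVENT_SERVICE_CHECK enabled, 100 calls of profiler_update(EVENT_SERVICE_CHECK, 2000) make profiler_write() print PROFILE_COUNTER_EVENT_SERVICE_CHECK=100 followed by PROFILE_ELAPSED_EVENT_SERVICE_CHECK=200000; an event type that is counted but not enabled, such as EVENT_HOST_CHECK, produces no output.
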
> Nagiostats then looks for the word PROFILE, then for COUNTER or ELAPSED, adds each entry to a linked list à la my second design, and outputs it via MRTG or the normal nagiostats output.
> 
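On the nagiostats side, the parsing step might look like the following sketch. The profile_entry list, the function names, the 64-byte name limit and the skipping of leading whitespace are hypothetical; only the PROFILE, COUNTER and ELAPSED keywords and the linked-list storage come from the description above. Because nagiostats reads status.dat once per run, the linear list lookup here is not on Nagios's hot path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PROFILE_NAME_LEN 64 /* assumed maximum event-name length */

/* Hypothetical record for one profiled event, as nagiostats might keep it. */
typedef struct profile_entry {
    char name[PROFILE_NAME_LEN];  /* e.g. "EVENT_SERVICE_CHECK" */
    unsigned long long counter;   /* from PROFILE_COUNTER_* */
    unsigned long long elapsed;   /* from PROFILE_ELAPSED_* */
    struct profile_entry *next;
} profile_entry;

/* Find an entry by name, creating it at the head of the list if missing. */
static profile_entry *profile_lookup(profile_entry **head, const char *name, size_t len) {
    for (profile_entry *p = *head; p != NULL; p = p->next)
        if (strlen(p->name) == len && memcmp(p->name, name, len) == 0)
            return p;
    if (len >= PROFILE_NAME_LEN)
        return NULL;                    /* name too long for this sketch */
    profile_entry *p = calloc(1, sizeof *p);
    if (p == NULL)
        return NULL;
    memcpy(p->name, name, len);         /* calloc already zeroed the terminator */
    p->next = *head;
    *head = p;
    return p;
}

/* Parse one line such as "PROFILE_COUNTER_EVENT_SERVICE_CHECK=100".
 * Returns 1 if it was a profile line and was stored, 0 otherwise. */
static int profile_parse_line(profile_entry **head, const char *line) {
    static const char prefix[] = "PROFILE_";
    assert(head != NULL);
    while (*line == ' ' || *line == '\t')
        line++;                         /* tolerate indentation */
    if (strncmp(line, prefix, sizeof prefix - 1) != 0)
        return 0;
    line += sizeof prefix - 1;

    int is_counter;
    if (strncmp(line, "COUNTER_", 8) == 0)
        is_counter = 1;
    else if (strncmp(line, "ELAPSED_", 8) == 0)
        is_counter = 0;
    else
        return 0;
    line += 8;

    const char *eq = strchr(line, '=');
    if (eq == NULL || eq == line)
        return 0;                       /* missing '=' or empty name */

    char *end;
    unsigned long long value = strtoull(eq + 1, &end, 10);
    if (end == eq + 1)
        return 0;                       /* no digits after '=' */

    profile_entry *p = profile_lookup(head, line, (size_t)(eq - line));
    if (p == NULL)
        return 0;
    if (is_counter)
        p->counter = value;
    else
        p->elapsed = value;
    return 1;
}
```

A COUNTER line and an ELAPSED line for the same event name end up in a single list entry, while unrelated status.dat keys are ignored.
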
> The other major difference is what we are using to calculate time.
> In the original design we just used time(), but later we decided we needed more resolution, so we switched to clock(). Finally, it was discovered that clock() would introduce a wraparound bug every 72 minutes, so now we just use gettimeofday().
> In the next version I may include clock() time as well, but I thought this would be sufficient for our needs.
> Let me know what you think and I'll try to get a patch out ASAP.
> 
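The 72-minute figure is consistent with clock() wrapping around. Assuming a 32-bit clock_t and CLOCKS_PER_SEC equal to 1,000,000 (the usual value on Linux), the tick count repeats after:

```latex
T_{\text{wrap}} = \frac{2^{32}\ \text{ticks}}{10^{6}\ \text{ticks/s}} = 4294.967296\ \text{s} \approx 71.6\ \text{min}
```

So a profiler in a long-running daemon would see clock() values repeat a little under every 72 minutes.
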
I would definitely like this feature; I think it could help me when
diagnosing issues.
So we're going to use tv_usec with gettimeofday()? Otherwise, using
tv_sec alone would be the same as time()?
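
Combining both fields is the usual way to get sub-second intervals from gettimeofday(): tv_sec alone has the one-second resolution of time(), and tv_usec alone resets to 0 every second. A minimal sketch, assuming elapsed time is kept in microseconds; elapsed_usec() and time_call() are hypothetical helper names, not functions from the Nagios source:

```c
#define _POSIX_C_SOURCE 200809L /* expose gettimeofday() under strict C modes */
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Microseconds between two gettimeofday() readings, using both fields:
 * whole seconds scaled to microseconds plus the (possibly negative)
 * difference of the sub-second parts. */
long long elapsed_usec(const struct timeval *start, const struct timeval *end) {
    return (long long)(end->tv_sec - start->tv_sec) * 1000000LL
         + (long long)(end->tv_usec - start->tv_usec);
}

/* Hypothetical helper: time one call of fn() in microseconds. */
long long time_call(void (*fn)(void)) {
    struct timeval start, end;
    assert(fn != NULL);
    gettimeofday(&start, NULL);
    fn();
    gettimeofday(&end, NULL);
    return elapsed_usec(&start, &end);
}
```

For example, readings of 10.900000 s and 12.100000 s give 1,200,000 microseconds. Because gettimeofday() reports wall-clock time, a system clock adjustment between the two readings can distort the result; clock_gettime() with CLOCK_MONOTONIC avoids that where it is available.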

More information about the Developers mailing list