Runaway service check latency in 2.12

Steven D. Morrey smorrey at ldschurch.org
Fri Apr 3 22:27:15 CEST 2009


OK folks, so I've finally found how latency is calculated in Nagios 2.12.

It looks like in events.c, at around line 1002, we have these lines:

gettimeofday(&tv,NULL);
temp_service->latency=(double)((double)(tv.tv_sec - event_list_low->run_time)+(double)(tv.tv_usec/1000)/1000.0);

As you can see, latency is literally the difference between now and when the check should have run, i.e. event_list_low->run_time.
This all seems great until you look a little further down and see that there are at least 5 conditions that would prevent the check from being run. So even though its latency is updated, its run time does not get updated.
This means that as time goes on and those checks get older, their latency will continue to increase ad infinitum.
This isn't an issue unless average service check latency is an important stat for you, which it is around here.
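Basically, if we have 100 services, all OK with 0 latency, and then one service that doesn't play nice and ends up with 10000 seconds of latency, well, now we have an average latency of about 99 seconds (10000 / 101) across all 101 services, even though only one check is misbehaving.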

The best solution, of course, is simply to remove the offending service check.
However, if, like me, you're in a situation where that cannot be done, I have come up with two other possibilities.

Either move the latency update down to where the check ACTUALLY executes, or always reschedule checks, even when they are not run, by moving the rescheduling code out of the if(run_event==TRUE) block.

I would like to get some feedback on this, since it has seriously been throwing off my stats.

Thanks in advance!

Sincerely,
Steven Morrey
 


More information about the Developers mailing list