service latency troubles

Andreas Ericsson ae at op5.se
Wed Oct 15 16:20:43 CEST 2008


Antoine Musso wrote:
> Andreas Ericsson wrote:
>> Turn off OCHP and OCSP and then reload Nagios. If that doesn't help,
>> unload NDOUtils and then restart Nagios. If that helps, re-enable the
>> OCSP/OCHP commands. If it's still working then, it was NDOUtils' fault.
>> If not, it's the combined load of NDOUtils and the OC?P commands.
>>
>> OC?P commands add a rather extraordinary amount of load to the system
>> irrespective of how simple they are. Usually, you'd be better off
>> replacing them with an extremely simple NEB-module.
> 
> Hello Andreas,
> 
> Thanks for answering :) Turning off OCHP/OCSP did not help, but I found
> two causes for our high latencies:
> 
> 
> The first one is a long timeout on a synchronous check. When a service
> is non-OK, the main Nagios thread triggers a synchronous host check:
> 
> [1223978320.237542] [016.1] [pid=10613] Service is in a non-OK state!
> [1223978320.237547] [016.1] [pid=10613] Host is currently UP, so we'll 
> recheck its state to make sure...
> [1223978320.237694] [256.1] [pid=10613] Running command check_ping...
> [1223978330.251788] [256.1] [pid=10613] Execution time=10.009 sec
> 
> While this plugin is executing, Nagios just sits idle and the latency
> rises really fast. So I modified our check_host_alive command to time
> out after 3 seconds; I still have to find the optimal parameters.
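A minimal Python sketch of the same idea: bound a synchronous check so it can never stall the caller for longer than a fixed budget. It assumes nothing about Nagios internals. run_check() is a hypothetical wrapper, and reporting a timeout as CRITICAL is an assumption (pick whatever state a timed-out host check should produce). In practice the simplest fix is the plugin's own timeout option, as done above with check_host_alive; check_ping, for example, accepts -t.

```python
import subprocess
import sys

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_check(argv, timeout_s):
    """Run a plugin and return (code, first_output_line), never waiting
    longer than timeout_s seconds. Hypothetical helper, not Nagios code."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired.
        # Mapping a timeout to CRITICAL is an assumption, not Nagios behaviour.
        return CRITICAL, f"CHECK TIMED OUT after {timeout_s}s"
    lines = proc.stdout.strip().splitlines()
    output = lines[0] if lines else ""
    # Anything outside 0..3 is not a valid plugin state; treat it as UNKNOWN.
    valid = (OK, WARNING, CRITICAL, UNKNOWN)
    code = proc.returncode if proc.returncode in valid else UNKNOWN
    return code, output


if __name__ == "__main__":
    # A fake "plugin" that hangs for 10 s, like the check_ping run above.
    hung = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_check(hung, timeout_s=3))
```

With a 3-second budget the caller gets an answer after about 3 seconds instead of 10, at the cost of occasionally misreporting a host that answers slowly.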
> 
> 
> 
> The second issue is NDO. We have ndo2db listening on a database server
> on the same switch. ndomod sends everything (data_processing_options
> parameter set to -1) over a TCP connection.
> 
> I analyzed the callback debugging messages (debug 64, verbosity 2) over
> a period of 938 seconds:
> 
> AGGREGATED_STATUS_DATA (#25)   124 calls, avg: 0.00s (total 0.00s)
>               LOG_DATA (# 9)   366 calls, avg: 0.00s (total 0.56s)
>    SERVICE_STATUS_DATA (#20)  8321 calls, avg: 0.02s (total 149.48s)
>        HOST_CHECK_DATA (#14)  4933 calls, avg: 0.01s (total 50.23s)
>       TIMED_EVENT_DATA (# 8) 12282 calls, avg: 0.00s (total 26.26s)
>    PROGRAM_STATUS_DATA (#18)   177 calls, avg: 0.00s (total 0.76s)
>      STATE_CHANGE_DATA (#30)   195 calls, avg: 0.00s (total 0.45s)
>     SERVICE_CHECK_DATA (#13) 12189 calls, avg: 0.01s (total 82.74s)
>    SYSTEM_COMMAND_DATA (#10)  8584 calls, avg: 0.01s (total 102.84s)
>       HOST_STATUS_DATA (#19)  2544 calls, avg: 0.03s (total 64.67s)
> 
> 49715 calls, 477 seconds: 50.85% of the time was spent sending NDO messages.
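The totals above can be re-derived from the per-callback lines with a short Python script, which is also handy for repeating the analysis after changing data_processing_options. summarize() is a hypothetical helper, and the regular expression only assumes the line layout shown above. Summing the rounded per-callback totals gives about 478 s (just under 51% of the 938 s window), close to the 477 s / 50.85% quoted; the small gap is presumably rounding. SERVICE_STATUS_DATA and SYSTEM_COMMAND_DATA account for the largest share.

```python
import re

# Callback summary from the ndomod debug output above (938 s window).
STATS = """\
AGGREGATED_STATUS_DATA (#25)   124 calls, avg: 0.00s (total 0.00s)
              LOG_DATA (# 9)   366 calls, avg: 0.00s (total 0.56s)
   SERVICE_STATUS_DATA (#20)  8321 calls, avg: 0.02s (total 149.48s)
       HOST_CHECK_DATA (#14)  4933 calls, avg: 0.01s (total 50.23s)
      TIMED_EVENT_DATA (# 8) 12282 calls, avg: 0.00s (total 26.26s)
   PROGRAM_STATUS_DATA (#18)   177 calls, avg: 0.00s (total 0.76s)
     STATE_CHANGE_DATA (#30)   195 calls, avg: 0.00s (total 0.45s)
    SERVICE_CHECK_DATA (#13) 12189 calls, avg: 0.01s (total 82.74s)
   SYSTEM_COMMAND_DATA (#10)  8584 calls, avg: 0.01s (total 102.84s)
      HOST_STATUS_DATA (#19)  2544 calls, avg: 0.03s (total 64.67s)
"""

# Groups: 1 = callback name, 2 = callback id, 3 = calls, 4 = total seconds.
ROW = re.compile(
    r"^\s*(\w+) \(#\s*(\d+)\)\s+(\d+) calls, avg: [\d.]+s \(total ([\d.]+)s\)")


def summarize(text, window_s):
    """Return (calls, seconds, percent_of_window, rows sorted by cost)."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group(1), int(m.group(3)), float(m.group(4))))
    calls = sum(r[1] for r in rows)
    seconds = sum(r[2] for r in rows)
    # Heaviest callbacks first: these are the candidates for filtering.
    rows.sort(key=lambda r: r[2], reverse=True)
    return calls, seconds, 100.0 * seconds / window_s, rows


if __name__ == "__main__":
    calls, seconds, pct, rows = summarize(STATS, 938)
    print(f"{calls} calls, {seconds:.2f} s, {pct:.2f}% of the window")
    for name, n, total in rows[:3]:
        print(f"  {name:<22} {n:>6} calls {total:>7.2f} s")
```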
> 
> The impact on latency is really bad. One of my colleagues filtered out
> some of those callbacks (data_processing_options set to 276673), which
> seems to help :)
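data_processing_options is a bitmask: -1 turns every bit on, while a value such as 276673 enables only some callback types. The Python sketch below (set_bits() and compare() are hypothetical helpers) lists which power-of-two flags a value contains, so two settings can be compared. It deliberately does not hard-code the flag names; look those up in the NDOUtils sources. The 32-bit width used to expand -1 is an assumption.

```python
def set_bits(mask, width=32):
    """Return the power-of-two flags set in mask, lowest first.

    Negative values such as -1 are read as two's complement within
    `width` bits (the width is an assumption)."""
    mask &= (1 << width) - 1
    return [1 << i for i in range(width) if mask & (1 << i)]


def compare(old, new, width=32):
    """Flags that are set in `old` but filtered out in `new`."""
    kept = set(set_bits(new, width))
    return [bit for bit in set_bits(old, width) if bit not in kept]


if __name__ == "__main__":
    print(set_bits(276673))  # [1, 64, 128, 2048, 4096, 8192, 262144]
    print(len(compare(-1, 276673)), "flags disabled compared with -1")
```

Each value printed by set_bits() corresponds to one option flag in ndomod; matching them against the NDOUtils definitions shows exactly which callbacks the colleague's setting keeps.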
> 
> 
> This raises two new questions:
> 
>    1/ Is there any recommended setting for checking the liveness of a
> host? Since this check is synchronous, we want Nagios to complete it
> as fast as possible.
> 

Upgrade to Nagios 3. It sports parallel host checks.
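To illustrate why parallel host checks help, here is a small Python sketch that runs the same set of slow, fake host checks one after another and then through a thread pool. fake_host_check() is a hypothetical stand-in that only sleeps; this models neither Nagios 2 nor Nagios 3 scheduling, it only shows that overlapping checks cost roughly the slowest check rather than the sum of all of them.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_host_check(host, duration_s):
    """Hypothetical stand-in for a ping-style host check."""
    time.sleep(duration_s)
    return host, "UP"


def run_serial(checks):
    """Run checks one at a time; total time is the sum of durations."""
    start = time.monotonic()
    results = [fake_host_check(host, d) for host, d in checks]
    return results, time.monotonic() - start


def run_parallel(checks, workers=8):
    """Run checks concurrently; total time is roughly the slowest one."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as `checks`.
        results = list(pool.map(lambda c: fake_host_check(*c), checks))
    return results, time.monotonic() - start


if __name__ == "__main__":
    checks = [(f"host{i}", 0.5) for i in range(6)]
    _, serial = run_serial(checks)
    _, parallel = run_parallel(checks)
    print(f"serial: {serial:.2f}s  parallel: {parallel:.2f}s")
```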

>    2/ Is ndomod waiting for ndo2db to insert the data into the database?


That I don't know, but Nagios waits for the NEB module to finish its callback
before proceeding, so if ndomod does some uninterruptible I/O, you might be
in for a long wait. Nagios has to wait for the NEB module to finish, though,
so the only solution is to make sure the module returns control to Nagios
as soon as possible.
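One common way to make a callback return control quickly is to have it only enqueue the event and let a separate thread do the slow I/O. The Python sketch below shows that pattern under stated assumptions: AsyncForwarder and its send function are hypothetical, the queue is bounded so memory cannot grow without limit, and events are dropped (and counted) when it is full. This is not how ndomod works, and a real NEB module would be written in C; it only illustrates the principle.

```python
import queue
import threading
import time


class AsyncForwarder:
    """Hypothetical: the callback only enqueues; a worker thread sends."""

    def __init__(self, send, maxsize=10000):
        self._send = send                      # slow function, e.g. a socket write
        self._q = queue.Queue(maxsize=maxsize)
        self.dropped = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def callback(self, event):
        """Runs on the caller's (scheduler's) thread and never blocks."""
        try:
            self._q.put_nowait(event)
        except queue.Full:
            self.dropped += 1                  # trade completeness for latency
        return 0

    def _run(self):
        while True:
            event = self._q.get()
            if event is None:                  # shutdown sentinel
                break
            self._send(event)

    def close(self):
        """Let the worker finish everything already queued, then stop it."""
        self._q.put(None)
        self._worker.join()


if __name__ == "__main__":
    def slow_send(event):
        time.sleep(0.1)                        # pretend the remote side is slow
        print("sent", event)

    fwd = AsyncForwarder(slow_send)
    start = time.monotonic()
    for i in range(5):
        fwd.callback(i)
    print(f"callbacks returned after {time.monotonic() - start:.4f}s")
    fwd.close()
```

The design choice is explicit: when the far end cannot keep up, this sketch loses events rather than delaying the scheduler. Blocking on a full queue instead would preserve every event but bring the latency problem back.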

> If not, I am going to check why it takes so long to send a packet to the 
> remote ndo2db instance.
> 

That's a good idea. Please let us know what you find.
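As a rough first measurement for that investigation, one can time how long it takes to connect to the ndo2db host and push a payload over TCP. The Python sketch below is hypothetical (time_send() and start_local_sink() are made up for illustration); point it at the ndo2db host and port from your own configuration. Keep in mind that sendall() returning only means the kernel has accepted the bytes into the socket's send buffer, not that ndo2db has processed them. If the receiver stops reading, for instance while it waits on the database, that buffer fills and any blocking writer on the sending side stalls, which is one way slow inserts could surface as latency in Nagios.

```python
import socket
import threading
import time


def time_send(host, port, payload, timeout_s=5.0):
    """Return (connect_s, send_s) for one TCP connection to host:port."""
    t0 = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        t1 = time.monotonic()
        sock.sendall(payload)  # returns once the kernel has buffered the data
        t2 = time.monotonic()
    return t1 - t0, t2 - t1


def start_local_sink():
    """Start a throwaway TCP listener on localhost (demo only)."""
    srv = socket.create_server(("127.0.0.1", 0))
    port = srv.getsockname()[1]
    received = bytearray()

    def serve():
        conn, _ = srv.accept()
        with conn, srv:
            while chunk := conn.recv(4096):
                received.extend(chunk)

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return port, received, thread


if __name__ == "__main__":
    port, received, thread = start_local_sink()
    connect_s, send_s = time_send("127.0.0.1", port, b"x" * 100_000)
    thread.join(timeout=5)
    print(f"connect {connect_s * 1000:.2f} ms, send {send_s * 1000:.2f} ms")
```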

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




