service latency troubles

Andreas Ericsson ae at op5.se
Sun Oct 12 01:10:06 CEST 2008


Antoine Musso wrote:
> Hello,
> 
>  For testing purposes, we are monitoring 1750 services on 580 hosts. We 
> would like to check each service every 300 seconds. Unfortunately, Nagios 
> reports a service check latency of 150 to 250 seconds!
> 
> I noticed that Nagios launches roughly 200 checks and then idles for 
> roughly a minute. I checked that using a lame script:
> 
>   while true; do ps -u nagios | wc -l; sleep 1; done;
> 
> 
> Our service_inter_check_delay_method and service_interleave_factor are 
> both set to smart:
> 
>     SERVICE SCHEDULING INFORMATION
>     -------------------------------
>     Total services:                     1750
>     Total scheduled services:           1750
>     Service inter-check delay method:   SMART
>     Average service check interval:     300.00 sec
>                                         ^^^
> This is our aim -----------------------///
> 
>     Inter-check delay:                  0.17 sec
>     Interleave factor method:           SMART
>     Average services per host:          3.01
>     Service interleave factor:          4
>     Max service check spread:           5 min
>     First scheduled check:              Fri Oct 10 17:00:15 2008
>     Last scheduled check:               Fri Oct 10 17:05:15 2008
> 
> 
> The maximum concurrent service checks is set to 200 :
> 
>      CHECK PROCESSING INFORMATION
>      ----------------------------
>      Check result reaper interval:       10 sec
>      Max concurrent service checks:      200
> 
> 
> And here is an overview of the nagiostats output (we use neither passive 
> checks nor flap detection).
> 
>   Total Services:                        1750
>   Services Checked:                      1750
>   Services Scheduled:                    1750
>   Services Actively Checked:             1750
> 
>   Total Service State Change:            0.000 / 55.720 / 0.530 %
>   Active Service Latency:                146.819 / 214.765 / 177.500 sec
>   Active Service Execution Time:         0.083 / 12.984 / 1.074 sec
>   Active Service State Change:           0.000 / 55.720 / 0.530 %
>   Active Services Last 1/5/15/60 min:    92 / 1098 / 1750 / 1750
>   Services Ok/Warn/Unk/Crit:             1671 / 60 / 7 / 12
> 
> Active Service Checks Last 1/5/15 min:  176 / 1159 / 3432
>    Scheduled:                           176 / 1159 / 3432
>    On-demand:                           0 / 0 / 0
>    Cached:                              0 / 0 / 0
> 
> 
> I have not found a way to make Nagios launch service checks more often 
> than once a minute. Does anyone have any idea? :)
> 
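
As a rough sanity check on the figures quoted above, the arithmetic can be
sketched in shell. This is only a back-of-the-envelope calculation, not a
diagnosis. The function names are made up for illustration, the inputs are
taken straight from the report, and the concurrency estimate assumes
Little's law (average checks in flight = check rate x average execution
time) is a fair approximation here.

```shell
#!/bin/sh
# Hypothetical helpers; all input numbers come from the quoted report.

# checks_per_sec COUNT SECONDS -> average checks per second
checks_per_sec() {
    LC_ALL=C awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", n / t }'
}

# inter_check_delay SERVICES INTERVAL_SEC -> seconds between check starts
inter_check_delay() {
    LC_ALL=C awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", t / n }'
}

# avg_concurrency SERVICES INTERVAL_SEC AVG_EXEC_SEC -> average checks in flight
avg_concurrency() {
    LC_ALL=C awk -v n="$1" -v t="$2" -v e="$3" 'BEGIN { printf "%.1f\n", n * e / t }'
}

checks_per_sec 1750 300          # rate needed for the 300 s target: 5.83
inter_check_delay 1750 300       # matches the reported 0.17 sec delay: 0.17
checks_per_sec 1159 300          # rate seen over the last 5 minutes: 3.86
avg_concurrency 1750 300 1.074   # average checks in flight at target: 6.3
```

If those numbers are read correctly, the scheduler is starting about 3.9
checks per second where about 5.8 are needed, and the average concurrency
the target requires (around 6) is far below the limit of 200. That would
suggest the limit itself is not the bottleneck on average, although bursts
could still hit it.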

As detailed as your report is, it doesn't mention what OS (with version)
you're using, nor does it mention what version of Nagios you're using.

Besides that though, what other "extras" are you using? Typical latency-
raisers are:
* OCHP/OCSP commands
* Older versions of NDOUtils
* Home-written eventbroker modules
* Clumsily written plugins that don't time out in a timely manner and
  don't complete quickly enough.
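
For the last item, one way to keep a slow plugin from holding up the
scheduler is to wrap it so it is killed after a fixed limit. Below is a
minimal sketch, assuming the `timeout` utility from GNU coreutils is
installed (it exits with status 124 when the limit is hit). The function
name and the plugin path in the usage comment are hypothetical. Exit
status 3 is the standard plugin return code for UNKNOWN.

```shell
#!/bin/sh
# run_with_timeout LIMIT_SEC COMMAND [ARGS...]
# Runs COMMAND under GNU coreutils `timeout`. If the limit is exceeded,
# prints an UNKNOWN line and returns 3; otherwise passes the command's
# own exit status through unchanged.
run_with_timeout() {
    limit=$1
    shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "UNKNOWN: plugin did not finish within ${limit}s"
        return 3
    fi
    return "$rc"
}

# Usage (hypothetical plugin path and arguments):
#   run_with_timeout 10 /usr/local/nagios/libexec/check_example -H somehost
```

Nagios also enforces its own limit through service_check_timeout in
nagios.cfg, so a wrapper like this only helps when a single plugin needs
a tighter limit than the global one.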

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




