latency problem

Hendrik Bäcker andurin at process-zero.de
Thu Sep 25 13:06:18 CEST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
 
Olivier JAN wrote:
> Hi list,
>
> I have some latency problems I can't explain. Here's the story.
>
> Nagios 3.0.3 on Ubuntu 8.04. Hardware is an Intel quad core with
> 10 GB RAM and fast disks. I have 24524 services on 1654 hosts to
> check. Services are mostly active-passive with a check interval of
> 6 hours. check_interval for hosts is 0, so Nagios runs them only on
> demand. No broker module is activated. ocsp and ochp are activated
> because this server is part of a distributed system. Nagios debug
> is activated. Some configuration options I have.
OC[S/H]P in such a big environment might be a bottleneck. The given
commands are executed after every single check. An alternative might
be to build an event broker module that does the same job a little
bit faster.
There was a first attempt at that on www.op5.org, but I can't find it
right now. Please have a look at the nagios-devel archive for a post
from Andreas Ericsson within the last 3 or 4 weeks.
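
For reference, these are the nagios.cfg directives I mean when I talk
about OCSP/OCHP. This is only a sketch: the command names
submit_service_result and submit_host_result are placeholders I made
up, not taken from your setup, and the timeout values are just
illustrative.

```cfg
# Run ocsp_command after every service check result (OCSP)
obsess_over_services=1
# Placeholder command name; it must match a command definition
ocsp_command=submit_service_result
# Seconds before Nagios gives up on the OCSP command (illustrative)
ocsp_timeout=5

# Run ochp_command after every host check result (OCHP)
obsess_over_hosts=1
# Placeholder command name; it must match a command definition
ochp_command=submit_host_result
# Seconds before Nagios gives up on the OCHP command (illustrative)
ochp_timeout=5
```

Every check result triggers one of these commands, so a slow command
or a long timeout adds up quickly across 24524 services.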
>
> service_inter_check_delay_method=s
> service_interleave_factor=s
> host_inter_check_delay_method=s
> max_concurrent_checks=0
> max_service_check_spread=240
> check_result_reaper_frequency=2
> max_check_result_reaper_time=30
>
What about "use_large_installation_tweaks"?
Have you disabled environment macro processing?
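
In nagios.cfg that would look roughly like this. Treat it as a
starting point under the assumption that none of your check or
notification commands rely on the environment macros; please verify
that before switching them off.

```cfg
# Turn on the built-in shortcuts for large installations
use_large_installation_tweaks=1
# Do not export macros as environment variables to executed commands
enable_environment_macros=0
```
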
> I tried to "play" with those options without success. Latency keeps
> growing whatever I tried. What is strange is that in the
> performance screen I see this:
>
> Metric                  Min.       Max.           Average
> Check Execution Time:   0.00 sec   15.01 sec      0.888 sec
> Check Latency:          0.00 sec   10191.24 sec   4924.060 sec
> Percent State Change:   0.00%      18.36%         0.24%
>
Ouch! An average latency of 82 minutes? That means a check runs, on
average, 82 minutes later than it was scheduled to run. Your
monitoring is seriously impaired.
> and that in the scheduling queue, i see that at 9:45.
>
> Host    Service      Last Check           Next Check           Type    Active Checks
> SERVER  PRINT_ERROR  25-09-2008 03:45:32  25-09-2008 09:45:32  Normal  ENABLED
>
> It seems that services are correctly scheduled despite the latency
> I see in the performance screen. /usr/local/nagios/bin/nagios -s
> /usr/local/nagios/etc/nagios.cfg tells me that everything is fine
> and has no suggestions for me.
>
>
<snip>
> So I'm a bit lost. Which screen is right? The performance one that
> indicates the 4924 sec latency, or the scheduling one that tells me
> the checks are made in time? What do you think of that? How can
> the latency be so high when Nagios needs to make only 1 or 2
> checks per second? Is there anything wrong in my setup?
>
Have a look at "/path/to/nagios/bin/nagiostats | grep Latency" and you
will know the ugly truth about the runtime latency.
Indeed, with such an average service check interval (17333 seconds)
there are not many checks running in parallel. But the first thing
Nagios wants to do is check everything that is configured, and only
once it has a current status does it re-schedule the checks as they
should be.
What about nagios.cfg directives like:

retain_state_information
use_retained_scheduling_info
auto_reschedule_checks
auto_rescheduling_interval
auto_rescheduling_window
(The auto-rescheduling window should give a better balance of the
scheduling queue)
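
As a sketch, this is how those directives could be set. The interval
and window values are the ones I remember from the sample nagios.cfg
(both in seconds), not values tuned for your installation. Also note
that the documentation marks auto_reschedule_checks as experimental
and warns that it can degrade performance, so test it carefully.

```cfg
# Keep state across restarts
retain_state_information=1
# Reuse the retained next-check times instead of rescheduling everything
use_retained_scheduling_info=1

# Let Nagios try to even out the scheduling queue (experimental)
auto_reschedule_checks=1
# How often (seconds) Nagios attempts to reschedule checks
auto_rescheduling_interval=30
# How far ahead (seconds) Nagios looks for checks to reschedule
auto_rescheduling_window=180
```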

Why do you have such a large average check interval (17333 seconds
is close to 5 hours)?

Regards,
Hendrik

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (MingW32)
 
iD8DBQFI23CqlI0PwfxLQjkRAsLhAJ47AIovBh3BJegfu5wV/M5lYUrxzACdGJ7T
l9ZlJE1Cd7Q2jPFlGknp5R4=
=8vfO
-----END PGP SIGNATURE-----


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




