latency problem

Olivier JAN ojan at gfi.fr
Thu Sep 25 14:27:39 CEST 2008


Thanks for your response, Hendrik.

use_large_installation_tweaks is on and the environment macros are
disabled. auto_reschedule_checks is off. State retention and retained
scheduling info are on.

The average check_interval is about 4 hours because the services are
both active and passive. They used to be passive only, but freshness
checking was a problem because freshness checks are not balanced over
time the way active checks are. So I now use active checks to do the
freshness checking for services that are mostly passive (they can
receive events at any time). Not too confusing, I hope ;-)

But what's really strange is the average latency value shown on the
performance screen. In the debug log, the latency reported for each
checked service is only one to ten seconds. And as I said in my first
message, the checks appear to run on time according to the scheduling
queue screen.
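
One way to narrow down this kind of mismatch is to compute the latency statistics per check type instead of looking at a single average. The following is only a minimal sketch: it assumes the status file contains `servicestatus { ... }` blocks with `check_type` (taken here as 0 = active, 1 = passive) and `check_latency` fields, and the sample text and its values are made up for illustration.

```python
import re
from statistics import mean

# Illustrative excerpt in the assumed style of a Nagios status file.
# Host/service names and latency values are invented for this sketch.
SAMPLE_STATUS = """\
servicestatus {
	host_name=SERVER
	service_description=PRINT_ERROR
	check_type=0
	check_latency=4.120
	}
servicestatus {
	host_name=SERVER
	service_description=DISK_EVENT
	check_type=1
	check_latency=9800.500
	}
servicestatus {
	host_name=OTHER
	service_description=PING
	check_type=0
	check_latency=1.880
	}
"""


def latency_by_check_type(status_text):
    """Return {check_type: (min, max, average)} of check_latency values."""
    groups = {}
    # Each servicestatus block is parsed into a key=value dictionary.
    for block in re.findall(r"servicestatus\s*\{(.*?)\}", status_text, re.S):
        fields = dict(
            line.strip().split("=", 1)
            for line in block.splitlines()
            if "=" in line
        )
        latency = float(fields["check_latency"])
        groups.setdefault(fields["check_type"], []).append(latency)
    return {k: (min(v), max(v), mean(v)) for k, v in groups.items()}


if __name__ == "__main__":
    for check_type, (lo, hi, avg) in sorted(latency_by_check_type(SAMPLE_STATUS).items()):
        print(f"check_type={check_type}: min={lo:.2f}s max={hi:.2f}s avg={avg:.2f}s")
```

If one group turns out to carry nearly all of the latency, that would at least show which results dominate the average on the performance screen. It does not by itself explain why.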

I'm going to test the things you mentioned in your reply.


Olivier Jan


Hendrik Bäcker <andurin at process-zero.de> wrote:

>
> Olivier JAN wrote:
>> Hi list,
>>
>> I have some latency problems I can't explain. Here's the story.
>>
>> Nagios 3.0.3 on Ubuntu 8.04. Hardware is an Intel quad core with
>> 10 GB RAM and fast disks. I have 24524 services on 1654 hosts to
>> check. Services are mostly active-passive with a check interval of
>> 6 hours. check_interval for hosts is 0, so Nagios checks them only
>> on demand. No broker module is activated. ocsp and ochp are
>> activated because this server is part of a distributed system.
>> Nagios debug is activated. Some of my configuration options:
> OC[S/H]P in such a big environment might be a bottleneck. The given
> commands are executed after every single check. An alternative might
> be to build an event broker module that does the same job a little
> bit faster.
> There was a first step towards that on www.op5.org, but I can't find
> it at the moment. Please have a look at the nagios-devel archive for
> a post from Andreas Ericsson from the last 3 or 4 weeks.
>>
>> service_inter_check_delay_method=s
>> service_interleave_factor=s
>> host_inter_check_delay_method=s
>> max_concurrent_checks=0
>> max_service_check_spread=240
>> check_result_reaper_frequency=2
>> max_check_result_reaper_time=30
>>
> What about "use_large_installation_tweaks"?
> Have you disabled env_macro processing?
>> I tried to "play" with those options without success. Latency keeps
>> growing whatever I tried. What is strange is the fact that in the
>> performance screen I see that
>>
>> Metric                 Min.      Max.          Average
>> Check Execution Time:  0.00 sec  15.01 sec     0.888 sec
>> Check Latency:         0.00 sec  10191.24 sec  4924.060 sec
>> Percent State Change:  0.00%     18.36%        0.24%
>>
> Ouch! An average latency of 82 minutes? That means a check runs, on
> average, 82 minutes later than it should. Your monitoring is badly
> affected.
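
For reference, the 82-minute figure follows directly from the performance screen values quoted above. A short sketch of the arithmetic, using only the numbers given in this thread (the 6-hour interval is the configured check interval from the original post):

```python
def seconds_to_minutes(seconds):
    """Convert a duration in seconds to minutes."""
    return seconds / 60.0


avg_latency_s = 4924.060   # "Average" Check Latency from the performance screen
max_latency_s = 10191.24   # "Max." Check Latency from the performance screen
interval_s = 6 * 3600      # configured 6-hour service check interval

if __name__ == "__main__":
    print(f"average latency: {seconds_to_minutes(avg_latency_s):.1f} min")
    print(f"maximum latency: {seconds_to_minutes(max_latency_s):.1f} min")
    # How large the average delay is relative to one full check interval.
    print(f"average latency / interval: {avg_latency_s / interval_s:.0%}")
```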
>> and that in the scheduling queue, at 9:45, I see:
>>
>> SERVER  PRINT_ERROR  25-09-2008 03:45:32  25-09-2008 09:45:32  Normal  ENABLED
>>
>> It seems that services are correctly scheduled despite the latency
>> I see on the performance screen. Running
>> /usr/local/nagios/bin/nagios -s /usr/local/nagios/etc/nagios.cfg
>> tells me that everything is fine and has no suggestions for me.
>>
>>
> <snip>
>> So I'm a bit lost. Which screen is right? The performance one that
>> indicates a latency of 4924 sec, or the scheduling one that tells
>> me the checks are made on time? What do you think of that? How can
>> the latency be so high when Nagios only needs to run 1 or 2 checks
>> per second? Is there anything wrong in my setup?
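
The "1 or 2 checks per second" estimate can be checked with simple arithmetic. This is a rough sketch that assumes the checks are spread evenly over the interval. The service count and the 6-hour interval come from the original post, and 17333 seconds is the average check interval mentioned in the reply.

```python
def checks_per_second(service_count, check_interval_s):
    """Average check rate if checks are spread evenly over the interval."""
    return service_count / check_interval_s


SERVICES = 24524

if __name__ == "__main__":
    for label, interval_s in (
        ("configured 6 h interval", 6 * 3600),
        ("average interval of 17333 s", 17333),
    ):
        rate = checks_per_second(SERVICES, interval_s)
        print(f"{label}: {rate:.2f} checks/s")
```

Both rates fall between 1 and 2 checks per second, so the raw check volume alone does not look like enough to explain a latency of more than an hour.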
>>
> Have a look at "/path/to/nagios/bin/nagiostats | grep Latency" and you
> will know the ugly truth about the runtime latency.
> But indeed, with that average service check interval (17333 seconds)
> there are not many checks running in parallel. Still, the first thing
> Nagios wants to do is check everything that is configured and, once
> it has a current status, re-schedule the checks as they should be.
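
To track the latency lines over time rather than reading them by eye, the nagiostats output can be parsed with a few lines of Python. This is a sketch under an assumption: it expects lines of the form `<label> Latency: min / max / avg sec`, and the sample text below is hypothetical (it reuses the figures from the performance screen, not real nagiostats output from this system).

```python
import re

# Assumed line format: "<label> Latency:   <min> / <max> / <avg> sec"
LATENCY_LINE = re.compile(
    r"^(?P<label>[^:\n]*Latency):\s*"
    r"(?P<min>[\d.]+)\s*/\s*(?P<max>[\d.]+)\s*/\s*(?P<avg>[\d.]+)\s*sec",
    re.M,
)

# Hypothetical sample in the assumed format, for illustration only.
SAMPLE_OUTPUT = """\
Active Service Latency:                 0.000 / 10191.240 / 4924.060 sec
Active Service Execution Time:          0.000 / 15.010 / 0.888 sec
"""


def parse_latency(text):
    """Return {label: (min, max, avg)} for every line whose label ends in 'Latency'."""
    return {
        m.group("label").strip(): (
            float(m.group("min")),
            float(m.group("max")),
            float(m.group("avg")),
        )
        for m in LATENCY_LINE.finditer(text)
    }


if __name__ == "__main__":
    for label, (lo, hi, avg) in parse_latency(SAMPLE_OUTPUT).items():
        print(f"{label}: min={lo}s max={hi}s avg={avg}s")
```

In practice the text would come from running nagiostats (for example via `subprocess.run(..., capture_output=True, text=True).stdout`) instead of the sample string.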
> What about nagios.cfg directives like:
>
> retain_state_information
> use_retained_scheduling_info
> auto_reschedule_checks
> auto_rescheduling_interval
> auto_rescheduling_window
> (The auto rescheduling window should give a better balance of the
> scheduling queue)
>
> Why do you have such a big average check interval (~4 hours)?
>
> Regards,
> Hendrik
>
>
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when   
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>
>





