try lowering max_check_result_reaper value.... I had good luck playing with that value. Thanks <div class="gmail_quote">On Tue, May 4, 2010 at 8:13 PM, Trisha Hoang <<a href="mailto:trisha@rockyou.com">trisha@rockyou.com</a>> wrote: <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hi, The nagios master got really high host latency and I'm not sure how to tweak it. I ran the check_ping plugin on a handful of hosts and the rta averaged at 0.2 second so it's not the network. Environment: - 565 hosts - 6790 passive checks from the slaves - not using event broker - master server actively executes the hosts checks every 5 minutes and passively processes checks every 1 minute - not doing performance data Nagiostats Nagios Stats 3.2.1 Copyright (c) 2003-2008 Ethan Galstad (<a href="http://www.nagios.org" target="_blank">www.nagios.org</a>) Last Modified: 03-09-2010 License: GPL CURRENT STATUS DATA ------------------------------------------------------ Status File: /var/log/nagios/status.dat Status File Age: 0d 0h 0m 23s Status File Version: 3.2.1 Program Running Time: 0d 1h 32m 19s Nagios PID: 28282 Used/High/Total Command Buffers: 1316 / 3066 / 4096 Total Services: 7745 Services Checked: 7745 Services Scheduled: 1381 Services Actively Checked: 955 Services Passively Checked: 6790 Total Service State Change: 0.000 / 9.740 / 0.007 % Active Service Latency: 18.948 / 205.144 / 165.751 sec Active Service Execution Time: 0.007 / 9.051 / 0.055 sec Active Service State Change: 0.000 / 5.460 / 0.006 % Active Services Last 1/5/15/60 min: 0 / 0 / 0 / 0 Passive Service Latency: 34.359 / 190.247 / 76.739 sec Passive Service State Change: 0.000 / 9.740 / 0.008 % Passive Services Last 1/5/15/60 min: 0 / 3054 / 6774 / 6784 Services Ok/Warn/Unk/Crit: 7720 / 1 / 0 / 24 Services Flapping: 27 Services In Downtime: 0 Total Hosts: 566 Hosts Checked: 566 Hosts Scheduled: 566 Hosts Actively Checked: 566 Host Passively Checked: 0 Total Host State Change: 0.000 / 0.000 / 0.000 % Active Host Latency: 0.000 / 3410.087 / 2413.051 sec Active Host Execution Time: 0.007 / 10.010 / 0.063 sec Active Host State Change: 0.000 / 0.000 / 0.000 % Active Hosts Last 1/5/15/60 min: 0 / 8 / 10 / 565 Passive Host Latency: 0.000 / 0.000 / 0.000 sec Passive Host State Change: 0.000 / 0.000 / 0.000 % Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0 Hosts Up/Down/Unreach: 563 / 3 / 0 Hosts Flapping: 1 Hosts In Downtime: 0 Active Host Checks Last 1/5/15 min: 5 / 32 / 75 Scheduled: 0 / 0 / 0 On-demand: 5 / 32 / 75 Parallel: 1 / 11 / 23 Serial: 0 / 0 / 0 Cached: 4 / 21 / 52 Passive Host Checks Last 1/5/15 min: 0 / 0 / 0 Active Service Checks Last 1/5/15 min: 0 / 0 / 0 Scheduled: 0 / 0 / 0 On-demand: 0 / 0 / 0 Cached: 0 / 0 / 0 Passive Service Checks Last 1/5/15 min: 2 / 1455 / 1455 External Commands Last 1/5/15 min: 1302 / 6063 / 20253 Nagios.cfg # EXTERNAL COMMAND CHECK INTERVAL # This is the interval at which Nagios should check for external commands. # This value works of the interval_length you specify later. If you leave # that at its default value of 60 (seconds), a value of 1 here will cause # Nagios to check for external commands every minute. If you specify a # number followed by an "s" (i.e. 15s), this will be interpreted to mean # actual seconds rather than a multiple of the interval_length variable. # Note: In addition to reading the external command file at regularly # scheduled intervals, Nagios will also check for external commands after # event handlers are executed. # NOTE: Setting this value to -1 causes Nagios to check the external # command file as often as possible. #command_check_interval=15s command_check_interval=-1 # SERVICE INTER-CHECK DELAY METHOD # This is the method that Nagios should use when initially # "spreading out" service checks when it starts monitoring. The # default is to use smart delay calculation, which will try to # space all service checks out evenly to minimize CPU load. # Using the dumb setting will cause all checks to be scheduled # at the same time (with no delay between them)! This is not a # good thing for production, but is useful when testing the # parallelization functionality. # n = None - don't use any delay between checks # d = Use a "dumb" delay of 1 second between checks # s = Use "smart" inter-check delay calculation # x.xx = Use an inter-check delay of x.xx seconds service_inter_check_delay_method=s # MAXIMUM SERVICE CHECK SPREAD # This variable determines the timeframe (in minutes) from the # program start time that an initial check of all services should # be completed. Default is 30 minutes. max_service_check_spread=30 # SERVICE CHECK INTERLEAVE FACTOR # This variable determines how service checks are interleaved. # Interleaving the service checks allows for a more even # distribution of service checks and reduced load on remote # hosts. Setting this value to 1 is equivalent to how versions # of Nagios previous to 0.0.5 did service checks. Set this # value to s (smart) for automatic calculation of the interleave # factor unless you have a specific reason to change it. # s = Use "smart" interleave factor calculation # x = Use an interleave factor of x, where x is a # number greater than or equal to 1. service_interleave_factor=s # HOST INTER-CHECK DELAY METHOD # This is the method that Nagios should use when initially # "spreading out" host checks when it starts monitoring. The # default is to use smart delay calculation, which will try to # space all host checks out evenly to minimize CPU load. # Using the dumb setting will cause all checks to be scheduled # at the same time (with no delay between them)! # n = None - don't use any delay between checks # d = Use a "dumb" delay of 1 second between checks # s = Use "smart" inter-check delay calculation # x.xx = Use an inter-check delay of x.xx seconds host_inter_check_delay_method=s # MAXIMUM HOST CHECK SPREAD # This variable determines the timeframe (in minutes) from the # program start time that an initial check of all hosts should # be completed. Default is 30 minutes. max_host_check_spread=30 # MAXIMUM CONCURRENT SERVICE CHECKS # This option allows you to specify the maximum number of # service checks that can be run in parallel at any given time. # Specifying a value of 1 for this variable essentially prevents # any service checks from being parallelized. A value of 0 # will not restrict the number of concurrent checks that are # being executed. max_concurrent_checks=0 # HOST AND SERVICE CHECK REAPER FREQUENCY # This is the frequency (in seconds!) that Nagios will process # the results of host and service checks. check_result_reaper_frequency=10 # MAX CHECK RESULT REAPER TIME # This is the max amount of time (in seconds) that a single # check result reaper event will be allowed to run before # returning control back to Nagios so it can perform other # duties. max_check_result_reaper_time=30 # CHECK RESULT PATH # This is directory where Nagios stores the results of host and # service checks that have not yet been processed. # # Note: Make sure that only one instance of Nagios has access # to this directory! check_result_path=/var/log/nagios/spool/checkresults # MAX CHECK RESULT FILE AGE # This option determines the maximum age (in seconds) which check # result files are considered to be valid. Files older than this # threshold will be mercilessly deleted without further processing. max_check_result_file_age=3600 # CACHED HOST CHECK HORIZON # This option determines the maximum amount of time (in seconds) # that the state of a previous host check is considered current. # Cached host states (from host checks that were performed more # recently that the timeframe specified by this value) can immensely # improve performance in regards to the host check logic. # Too high of a value for this option may result in inaccurate host # states being used by Nagios, while a lower value may result in a # performance hit for host checks. Use a value of 0 to disable host # check caching. #cached_host_check_horizon=15 cached_host_check_horizon=60 # CACHED SERVICE CHECK HORIZON # This option determines the maximum amount of time (in seconds) # that the state of a previous service check is considered current. # Cached service states (from service checks that were performed more # recently that the timeframe specified by this value) can immensely # improve performance in regards to predictive dependency checks. # Use a value of 0 to disable service check caching. cached_service_check_horizon=15 # ENABLE PREDICTIVE HOST DEPENDENCY CHECKS # This option determines whether or not Nagios will attempt to execute # checks of hosts when it predicts that future dependency logic test # may be needed. These predictive checks can help ensure that your # host dependency logic works well. # Values: # 0 = Disable predictive checks # 1 = Enable predictive checks (default) enable_predictive_host_dependency_checks=1 # ENABLE PREDICTIVE SERVICE DEPENDENCY CHECKS # This option determines whether or not Nagios will attempt to execute # checks of service when it predicts that future dependency logic test # may be needed. These predictive checks can help ensure that your # service dependency logic works well. # Values: # 0 = Disable predictive checks # 1 = Enable predictive checks (default) enable_predictive_service_dependency_checks=1 # AUTO-RESCHEDULING OPTION # This option determines whether or not Nagios will attempt to # automatically reschedule active host and service checks to # "smooth" them out over time. This can help balance the load on # the monitoring server. # WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE # PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY auto_reschedule_checks=0 # AUTO-RESCHEDULING INTERVAL # This option determines how often (in seconds) Nagios will # attempt to automatically reschedule checks. This option only # has an effect if the auto_reschedule_checks option is enabled. # Default is 30 seconds. # WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE # PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY auto_rescheduling_interval=30 # AUTO-RESCHEDULING WINDOW # This option determines the "window" of time (in seconds) that # Nagios will look at when automatically rescheduling checks. # Only host and service checks that occur in the next X seconds # (determined by this variable) will be rescheduled. This option # only has an effect if the auto_reschedule_checks option is # enabled. Default is 180 seconds (3 minutes). # WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE # PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY auto_rescheduling_window=180 # SLEEP TIME # This is the number of seconds to sleep between checking for system # events and service checks that need to be run. sleep_time=0.25 # TIMEOUT VALUES # These options control how much time Nagios will allow various # types of commands to execute before killing them off. Options # are available for controlling maximum time allotted for # service checks, host checks, event handlers, notifications, the # ocsp command, and performance data commands. All values are in # seconds. service_check_timeout=60 host_check_timeout=30 event_handler_timeout=30 notification_timeout=30 ocsp_timeout=5 perfdata_timeout=5 # AGGRESSIVE HOST CHECKING OPTION # If you don't want to turn on aggressive host checking features, set # this value to 0 (the default). Otherwise set this value to 1 to # enable the aggressive check option. Read the docs for more info # on what aggressive host check is or check out the source code in # base/checks.c use_aggressive_host_checking=0 # SERVICE CHECK EXECUTION OPTION # This determines whether or not Nagios will actively execute # service checks when it initially starts. If this option is # disabled, checks are not actively made, but Nagios can still # receive and process passive check results that come in. Unless # you're implementing redundant hosts or have a special need for # disabling the execution of service checks, leave this enabled! # Values: 1 = enable checks, 0 = disable checks execute_service_checks=0 # PASSIVE SERVICE CHECK ACCEPTANCE OPTION # This determines whether or not Nagios will accept passive # service checks results when it initially (re)starts. # Values: 1 = accept passive checks, 0 = reject passive checks accept_passive_service_checks=1 # HOST CHECK EXECUTION OPTION # This determines whether or not Nagios will actively execute # host checks when it initially starts. If this option is # disabled, checks are not actively made, but Nagios can still # receive and process passive check results that come in. Unless # you're implementing redundant hosts or have a special need for # disabling the execution of host checks, leave this enabled! # Values: 1 = enable checks, 0 = disable checks execute_host_checks=1 # PASSIVE HOST CHECK ACCEPTANCE OPTION # This determines whether or not Nagios will accept passive # host checks results when it initially (re)starts. # Values: 1 = accept passive checks, 0 = reject passive checks accept_passive_host_checks=0 # OBSESS OVER SERVICE CHECKS OPTION # This determines whether or not Nagios will obsess over service # checks and run the ocsp_command defined below. Unless you're # planning on implementing distributed monitoring, do not enable # this option. Read the HTML docs for more information on # implementing distributed monitoring. # Values: 1 = obsess over services, 0 = do not obsess (default) obsess_over_services=0 # OBSESSIVE COMPULSIVE SERVICE PROCESSOR COMMAND # This is the command that is run for every service check that is # processed by Nagios. This command is executed only if the # obsess_over_services option (above) is set to 1. The command # argument is the short name of a command definition that you # define in your host configuration file. Read the HTML docs for # more information on implementing distributed monitoring. #ocsp_command=somecommand # OBSESS OVER HOST CHECKS OPTION # This determines whether or not Nagios will obsess over host # checks and run the ochp_command defined below. Unless you're # planning on implementing distributed monitoring, do not enable # this option. Read the HTML docs for more information on # implementing distributed monitoring. # Values: 1 = obsess over hosts, 0 = do not obsess (default) obsess_over_hosts=0 # OBSESSIVE COMPULSIVE HOST PROCESSOR COMMAND # This is the command that is run for every host check that is # processed by Nagios. This command is executed only if the # obsess_over_hosts option (above) is set to 1. The command # argument is the short name of a command definition that you # define in your host configuration file. Read the HTML docs for # more information on implementing distributed monitoring. #ochp_command=somecommand # SERVICE FRESHNESS CHECK OPTION # This option determines whether or not Nagios will periodically # check the "freshness" of service results. Enabling this option # is useful for ensuring passive checks are received in a timely # manner. # Values: 1 = enabled freshness checking, 0 = disable freshness checking check_service_freshness=1 # SERVICE FRESHNESS CHECK INTERVAL # This setting determines how often (in seconds) Nagios will # check the "freshness" of service check results. If you have # disabled service freshness checking, this option has no effect. #service_freshness_check_interval=60 service_freshness_check_interval=420 # HOST FRESHNESS CHECK OPTION # This option determines whether or not Nagios will periodically # check the "freshness" of host results. Enabling this option # is useful for ensuring passive checks are received in a timely # manner. # Values: 1 = enabled freshness checking, 0 = disable freshness checking check_host_freshness=0 #check_host_freshness=1 # HOST FRESHNESS CHECK INTERVAL # This setting determines how often (in seconds) Nagios will # check the "freshness" of host check results. If you have # disabled host freshness checking, this option has no effect. #host_freshness_check_interval=60 host_freshness_check_interval=420 # ADDITIONAL FRESHNESS THRESHOLD LATENCY # This setting determines the number of seconds that Nagios # will add to any host and service freshness thresholds that # it calculates (those not explicitly specified by the user). #additional_freshness_latency=15 additional_freshness_latency=180 # LARGE INSTALLATION TWEAKS OPTION # This option determines whether or not Nagios will take some shortcuts # which can save on memory and CPU usage in large Nagios installations. # Read the documentation for more information on the benefits/tradeoffs # of enabling this option. # Values: 1 - Enabled tweaks # 0 - Disable tweaks (default) use_large_installation_tweaks=1 # CHILD PROCESS MEMORY OPTION # This option determines whether or not Nagios will free memory in # child processes (processed used to execute system commands and host/ # service checks). If you specify a value here, it will override # program defaults. # Value: 1 - Free memory in child processes # 0 - Do not free memory in child processes #free_child_process_memory=1 # CHILD PROCESS FORKING BEHAVIOR # This option determines how Nagios will fork child processes # (used to execute system commands and host/service checks). Normally # child processes are fork()ed twice, which provides a very high level # of isolation from problems. Fork()ing once is probably enough and will # save a great deal on CPU usage (in large installs), so you might # want to consider using this. If you specify a value here, it will # program defaults. # Value: 1 - Child processes fork() twice # 0 - Child processes fork() just once #child_processes_fork_twice=1 child_processes_fork_twice=0 # DEBUG LEVEL # This option determines how much (if any) debugging information will # be written to the debug file. OR values together to log multiple # types of information. # Values: # -1 = Everything # 0 = Nothing # 1 = Functions # 2 = Configuration # 4 = Process information # 8 = Scheduled events # 16 = Host/service checks # 32 = Notifications # 64 = Event broker # 128 = External commands # 256 = Commands # 512 = Scheduled downtime # 1024 = Comments # 2048 = Macros debug_level=16 # DEBUG VERBOSITY # This option determines how verbose the debug log out will be. # Values: 0 = Brief output # 1 = More detailed # 2 = Very detailed debug_verbosity=1 Thanks in advance for your help. Trisha ------------------------------------------------------------------------------ _______________________________________________ Nagios-users mailing list <a href="mailto:Nagios-users@lists.sourceforge.net">Nagios-users@lists.sourceforge.net</a> <a href="https://lists.sourceforge.net/lists/listinfo/nagios-users" target="_blank">https://lists.sourceforge.net/lists/listinfo/nagios-users</a> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null </blockquote></div> -- Cordially, Shadhin Rahman