high host latency on nagios master

Shadhin Rahman shadhin71 at gmail.com
Thu May 6 17:22:40 CEST 2010


Try lowering the max_check_result_reaper_time value. I had good luck playing
with that value. Thanks
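For reference, the directive in the quoted config below is max_check_result_reaper_time (currently 30). It works together with check_result_reaper_frequency (currently 10). Before changing either one, it helps to confirm which values Nagios will actually read. Below is a minimal Python sketch. It assumes a plain key=value nagios.cfg in which a later duplicate overrides an earlier one. The parser is for illustration only and is not part of Nagios.

```python
from __future__ import annotations

from pathlib import Path

# The two directives that control how Nagios reaps check results.
REAPER_KEYS = ("check_result_reaper_frequency", "max_check_result_reaper_time")


def parse_nagios_cfg(text: str) -> dict[str, str]:
    """Return key=value directives, skipping blank lines and '#' comments.

    Assumption for this sketch: if a key repeats, the last occurrence wins.
    """
    settings: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def reaper_settings(cfg_path: str) -> dict[str, str | None]:
    """Report the reaper directives from a config file (None if unset)."""
    settings = parse_nagios_cfg(Path(cfg_path).read_text())
    return {key: settings.get(key) for key in REAPER_KEYS}
```

Run reaper_settings("/etc/nagios/nagios.cfg") against the config quoted below and it should report 10 and 30. That path is hypothetical; point it at wherever your nagios.cfg lives.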

On Tue, May 4, 2010 at 8:13 PM, Trisha Hoang <trisha at rockyou.com> wrote:

> Hi,
> The Nagios *master* has really high host latency and I'm not sure how to
> tune it. I ran the check_ping plugin against a handful of hosts and the RTA
> averaged 0.2 seconds, so it's not the network.
>
> *Environment:*
> - 565 hosts
> - 6790 passive checks from the slaves
> - not using event broker
> - the master server *actively* executes the host checks every 5 minutes and
> *passively* processes checks every 1 minute
> - not doing performance data
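The figures above imply a steady-state load that is easy to estimate. The Python sketch below assumes that every host is checked exactly once per 5-minute interval and that every passive result arrives once per minute. Real schedules will jitter around these averages.

```python
# Back-of-the-envelope check rates for the environment described above.
# Assumption: perfectly even schedules (every host once per 5 min,
# every passive service result once per minute).
HOSTS = 565
HOST_INTERVAL_S = 5 * 60
PASSIVE_SERVICES = 6790
PASSIVE_INTERVAL_S = 60


def rate_per_second(count: int, interval_s: float) -> float:
    """Average checks per second when `count` items each run every `interval_s` seconds."""
    return count / interval_s


host_rate = rate_per_second(HOSTS, HOST_INTERVAL_S)  # about 1.88 host checks/s
passive_rate = rate_per_second(PASSIVE_SERVICES, PASSIVE_INTERVAL_S)  # about 113.2 results/s
expected_host_checks_per_min = host_rate * 60  # about 113 host checks/min

print(f"{host_rate:.2f} host checks/s, {passive_rate:.1f} passive results/s, "
      f"{expected_host_checks_per_min:.0f} host checks/min expected")
```

Compare this with the nagiostats output below. It reports "Active Host Checks Last 1/5/15 min: 5 / 32 / 75", with 0 of them scheduled. One reading is that host checks are running far below their nominal rate of roughly 113 per minute. That would be consistent with the reported host latency.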
>
> *Nagiostats*
>
> Nagios Stats 3.2.1
> Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
> Last Modified: 03-09-2010
> License: GPL
>
> CURRENT STATUS DATA
> ------------------------------------------------------
> Status File:                            /var/log/nagios/status.dat
> Status File Age:                        0d 0h 0m 23s
> Status File Version:                    3.2.1
>
> Program Running Time:                   0d 1h 32m 19s
> Nagios PID:                             28282
> Used/High/Total Command Buffers:        1316 / 3066 / 4096
>
> Total Services:                         7745
> Services Checked:                       7745
> Services Scheduled:                     1381
> Services Actively Checked:              955
> Services Passively Checked:             6790
> Total Service State Change:             0.000 / 9.740 / 0.007 %
> Active Service Latency:                 18.948 / 205.144 / 165.751 sec
> Active Service Execution Time:          0.007 / 9.051 / 0.055 sec
> Active Service State Change:            0.000 / 5.460 / 0.006 %
> Active Services Last 1/5/15/60 min:     0 / 0 / 0 / 0
> Passive Service Latency:                34.359 / 190.247 / 76.739 sec
> Passive Service State Change:           0.000 / 9.740 / 0.008 %
> Passive Services Last 1/5/15/60 min:    0 / 3054 / 6774 / 6784
> Services Ok/Warn/Unk/Crit:              7720 / 1 / 0 / 24
> Services Flapping:                      27
> Services In Downtime:                   0
>
> Total Hosts:                            566
> Hosts Checked:                          566
> Hosts Scheduled:                        566
> Hosts Actively Checked:                 566
> Host Passively Checked:                 0
> Total Host State Change:                0.000 / 0.000 / 0.000 %
> Active Host Latency:                    0.000 / 3410.087 / 2413.051 sec
> Active Host Execution Time:             0.007 / 10.010 / 0.063 sec
> Active Host State Change:               0.000 / 0.000 / 0.000 %
> Active Hosts Last 1/5/15/60 min:        0 / 8 / 10 / 565
> Passive Host Latency:                   0.000 / 0.000 / 0.000 sec
> Passive Host State Change:              0.000 / 0.000 / 0.000 %
> Passive Hosts Last 1/5/15/60 min:       0 / 0 / 0 / 0
> Hosts Up/Down/Unreach:                  563 / 3 / 0
> Hosts Flapping:                         1
> Hosts In Downtime:                      0
>
> Active Host Checks Last 1/5/15 min:     5 / 32 / 75
>    Scheduled:                           0 / 0 / 0
>    On-demand:                           5 / 32 / 75
>    Parallel:                            1 / 11 / 23
>    Serial:                              0 / 0 / 0
>    Cached:                              4 / 21 / 52
> Passive Host Checks Last 1/5/15 min:    0 / 0 / 0
> Active Service Checks Last 1/5/15 min:  0 / 0 / 0
>    Scheduled:                           0 / 0 / 0
>    On-demand:                           0 / 0 / 0
>    Cached:                              0 / 0 / 0
> Passive Service Checks Last 1/5/15 min: 2 / 1455 / 1455
>
> External Commands Last 1/5/15 min:      1302 / 6063 / 20253
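In the nagiostats output above, each latency, execution time, and state change line reads Min / Max / Avg. To track these values over time, a small parser helps. The regular expression below is a sketch written against the lines shown in this message. It is not a complete grammar of nagiostats output. nagiostats also has an MRTG-style output mode (see nagiostats --help), which may be easier to consume.

```python
from __future__ import annotations

import re

# One nagiostats line of the form "<Label>:  <min> / <max> / <avg> sec" (or "%").
_MIN_MAX_AVG = re.compile(
    r"^(?P<label>[A-Za-z ]+):\s+"
    r"(?P<min>\d+\.\d+)\s*/\s*(?P<max>\d+\.\d+)\s*/\s*(?P<avg>\d+\.\d+)\s*(?:sec|%)$"
)


def parse_min_max_avg(stats_text: str) -> dict[str, dict[str, float]]:
    """Map each Min / Max / Avg label to its three values as floats."""
    parsed: dict[str, dict[str, float]] = {}
    for line in stats_text.splitlines():
        match = _MIN_MAX_AVG.match(line.strip())
        if match:
            parsed[match["label"]] = {
                key: float(match[key]) for key in ("min", "max", "avg")
            }
    return parsed
```

On the output above, the "avg" entry for "Active Host Latency" would be 2413.051 seconds. In other words, host checks are running about 40 minutes late on average.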
>
>
> *Nagios.cfg*
>
> # EXTERNAL COMMAND CHECK INTERVAL
> # This is the interval at which Nagios should check for external commands.
> # This value works off the interval_length you specify later.  If you leave
> # that at its default value of 60 (seconds), a value of 1 here will cause
> # Nagios to check for external commands every minute.  If you specify a
> # number followed by an "s" (e.g. 15s), this will be interpreted to mean
> # actual seconds rather than a multiple of the interval_length variable.
> # Note: In addition to reading the external command file at regularly
> # scheduled intervals, Nagios will also check for external commands after
> # event handlers are executed.
> # NOTE: Setting this value to -1 causes Nagios to check the external
> # command file as often as possible.
>
> #command_check_interval=15s
> command_check_interval=-1
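The rules in the comment above for interpreting command_check_interval fit in a small Python function. This is an illustrative sketch, not part of Nagios. The default interval_length of 60 seconds comes from the comment itself.

```python
def command_check_interval_seconds(value: str, interval_length: int = 60) -> float:
    """Translate a command_check_interval setting into seconds.

    "-1"   -> -1.0 (check the external command file as often as possible)
    "15s"  -> 15.0 (a trailing "s" means literal seconds)
    "1"    -> 60.0 (a bare number is a multiple of interval_length)
    """
    value = value.strip()
    if value == "-1":
        return -1.0
    if value.endswith("s"):
        return float(value[:-1])
    return float(value) * interval_length
```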
>
> # SERVICE INTER-CHECK DELAY METHOD
> # This is the method that Nagios should use when initially
> # "spreading out" service checks when it starts monitoring.  The
> # default is to use smart delay calculation, which will try to
> # space all service checks out evenly to minimize CPU load.
> # Using the dumb setting will cause all checks to be scheduled
> # at the same time (with no delay between them)!  This is not a
> # good thing for production, but is useful when testing the
> # parallelization functionality.
> #       n       = None - don't use any delay between checks
> #       d       = Use a "dumb" delay of 1 second between checks
> #       s       = Use "smart" inter-check delay calculation
> #       x.xx    = Use an inter-check delay of x.xx seconds
>
> service_inter_check_delay_method=s
>
> # MAXIMUM SERVICE CHECK SPREAD
> # This variable determines the timeframe (in minutes) from the
> # program start time that an initial check of all services should
> # be completed.  Default is 30 minutes.
>
> max_service_check_spread=30
>
> # SERVICE CHECK INTERLEAVE FACTOR
> # This variable determines how service checks are interleaved.
> # Interleaving the service checks allows for a more even
> # distribution of service checks and reduced load on remote
> # hosts.  Setting this value to 1 is equivalent to how versions
> # of Nagios previous to 0.0.5 did service checks.  Set this
> # value to s (smart) for automatic calculation of the interleave
> # factor unless you have a specific reason to change it.
> #       s       = Use "smart" interleave factor calculation
> #       x       = Use an interleave factor of x, where x is a
> #                 number greater than or equal to 1.
>
> service_interleave_factor=s
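This note assumes service_interleave_factor=s. As we understand the Nagios 3 check-scheduling documentation, the factor is then computed as ceil(total services / total hosts). Services are then scheduled in strides of that factor. The sketch below models that in Python. It is a simplification for illustration, not Nagios' own code.

```python
import math


def smart_interleave_factor(total_services: int, total_hosts: int) -> int:
    """Interleave factor = ceil(total services / total hosts)."""
    return math.ceil(total_services / total_hosts)


def interleaved_order(services: list, factor: int) -> list:
    """Visit positions 0, f, 2f, ... then 1, 1+f, ... so that checks for the
    same host (adjacent in a host-sorted list) are spread apart."""
    return [
        services[i]
        for start in range(factor)
        for i in range(start, len(services), factor)
    ]


# Using the totals from the nagiostats output above: ceil(7745 / 566) = 14.
factor = smart_interleave_factor(7745, 566)
```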
>
> # HOST INTER-CHECK DELAY METHOD
> # This is the method that Nagios should use when initially
> # "spreading out" host checks when it starts monitoring.  The
> # default is to use smart delay calculation, which will try to
> # space all host checks out evenly to minimize CPU load.
> # Using the dumb setting will cause all checks to be scheduled
> # at the same time (with no delay between them)!
> #       n       = None - don't use any delay between checks
> #       d       = Use a "dumb" delay of 1 second between checks
> #       s       = Use "smart" inter-check delay calculation
> #       x.xx    = Use an inter-check delay of x.xx seconds
>
> host_inter_check_delay_method=s
>
>
> # MAXIMUM HOST CHECK SPREAD
> # This variable determines the timeframe (in minutes) from the
> # program start time that an initial check of all hosts should
> # be completed.  Default is 30 minutes.
>
> max_host_check_spread=30
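This note covers host_inter_check_delay_method=s. As we understand the Nagios 3 docs, the smart delay is the average check interval divided by the number of hosts. It is then capped so that the initial checks fit inside max_host_check_spread. The cap is our reading of the behaviour and should be treated as an assumption. The sketch below plugs in this setup's figures: 566 hosts, a 5-minute interval, and a 30-minute spread.

```python
def smart_inter_check_delay(avg_interval_s: float, total_checks: int, max_spread_min: float) -> float:
    """Seconds between initial checks under the 'smart' method (sketch).

    Base delay: average check interval / number of checks.
    Assumed cap: no larger than the max spread divided by the number of checks.
    """
    base_delay = avg_interval_s / total_checks
    spread_cap = (max_spread_min * 60) / total_checks
    return min(base_delay, spread_cap)


host_delay = smart_inter_check_delay(avg_interval_s=300, total_checks=566, max_spread_min=30)
print(f"{host_delay:.3f} s between initial host checks")  # 0.530 s
```

At about 0.53 s apart, the 566 initial host checks span roughly 300 s, which is one check interval. That is well inside the 30-minute spread.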
>
>
> # MAXIMUM CONCURRENT SERVICE CHECKS
> # This option allows you to specify the maximum number of
> # service checks that can be run in parallel at any given time.
> # Specifying a value of 1 for this variable essentially prevents
> # any service checks from being parallelized.  A value of 0
> # will not restrict the number of concurrent checks that are
> # being executed.
>
> max_concurrent_checks=0
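max_concurrent_checks=0 removes the cap entirely. To judge whether a cap would even matter here, Little's law gives a rough estimate: average work in flight equals arrival rate times average duration. This is a general queueing identity, not a formula taken from Nagios. The 0.063 s average comes from the Active Host Execution Time line in the nagiostats output above. The rate assumes that 565 hosts are each checked every 300 s.

```python
def expected_concurrent_checks(checks_per_second: float, avg_execution_time_s: float) -> float:
    """Little's law: mean number of checks running = arrival rate * mean duration."""
    return checks_per_second * avg_execution_time_s


host_rate = 565 / 300  # host checks per second if every host runs once per 5 minutes
in_flight = expected_concurrent_checks(host_rate, 0.063)
print(f"about {in_flight:.2f} host checks running at once on average")
```

A figure this far below 1 suggests the concurrency limit is unlikely to be what holds host checks back. Short bursts can still exceed the average.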
>
>
> # HOST AND SERVICE CHECK REAPER FREQUENCY
> # This is the frequency (in seconds!) that Nagios will process
> # the results of host and service checks.
>
> check_result_reaper_frequency=10
>
> # MAX CHECK RESULT REAPER TIME
> # This is the max amount of time (in seconds) that a single
> # check result reaper event will be allowed to run before
> # returning control back to Nagios so it can perform other
> # duties.
>
> max_check_result_reaper_time=30
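The reply at the top of this message suggests lowering max_check_result_reaper_time. The trade-off is easier to see in a simplified model of one reaper pass. The pass drains queued results until the queue is empty or the time budget runs out, then hands control back. This is our illustrative Python model, not Nagios' C implementation. The injectable clock exists only so that the model can be tested.

```python
import time
from collections import deque
from typing import Callable


def reap_check_results(
    pending: deque,
    handle: Callable[[object], None],
    max_reaper_time_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Drain queued check results until the queue is empty or the time budget
    (max_check_result_reaper_time) is used up, then return control.

    Anything still queued waits for the next reaper event, which in this model
    runs every check_result_reaper_frequency seconds.
    """
    started = clock()
    handled = 0
    while pending and clock() - started < max_reaper_time_s:
        handle(pending.popleft())
        handled += 1
    return handled
```

In this model, a smaller budget returns control to the scheduler sooner, so scheduled host checks can be launched more promptly. The cost is that more results stay queued after each pass. A larger budget does the opposite. Which way helps depends on where the backlog actually is.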
>
>
> # CHECK RESULT PATH
> # This is the directory where Nagios stores the results of host and
> # service checks that have not yet been processed.
> #
> # Note: Make sure that only one instance of Nagios has access
> # to this directory!
>
> check_result_path=/var/log/nagios/spool/checkresults
>
>
> # MAX CHECK RESULT FILE AGE
> # This option determines the maximum age (in seconds) which check
> # result files are considered to be valid.  Files older than this
> # threshold will be mercilessly deleted without further processing.
>
> max_check_result_file_age=3600
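A quick way to see whether unprocessed results are piling up is to count the files under check_result_path. The Python sketch below also counts how many are older than max_check_result_file_age. It assumes Nagios 3's spool layout as we understand it: one file per result, each marked complete by a companion file with a .ok suffix, which the sketch skips. Treat that naming as an assumption and verify it against your own spool directory.

```python
from __future__ import annotations

import time
from pathlib import Path


def spool_backlog(check_result_path: str, max_age_s: int = 3600, now: float | None = None) -> tuple[int, int]:
    """Return (pending result files, files older than max_age_s).

    Assumption: '.ok' marker files accompany result files and are not counted.
    """
    now = time.time() if now is None else now
    pending = stale = 0
    for entry in Path(check_result_path).iterdir():
        if not entry.is_file() or entry.name.endswith(".ok"):
            continue
        pending += 1
        if now - entry.stat().st_mtime > max_age_s:
            stale += 1
    return pending, stale
```

For the config above, the call would be spool_backlog("/var/log/nagios/spool/checkresults"). A count that keeps growing between runs would point to the reaper falling behind.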
>
>
> # CACHED HOST CHECK HORIZON
> # This option determines the maximum amount of time (in seconds)
> # that the state of a previous host check is considered current.
> # Cached host states (from host checks that were performed more
> # recently than the timeframe specified by this value) can immensely
> # improve performance in regards to the host check logic.
> # Too high of a value for this option may result in inaccurate host
> # states being used by Nagios, while a lower value may result in a
> # performance hit for host checks.  Use a value of 0 to disable host
> # check caching.
>
> #cached_host_check_horizon=15
> cached_host_check_horizon=60
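The caching rule described above comes down to an age comparison. The sketch below is a simplified model, not Nagios' own code. Whether the boundary is inclusive is our assumption. With cached_host_check_horizon=60, an on-demand host check requested within 60 s of the host's last check can reuse that result. The "Cached" row under Active Host Checks in the nagiostats output counts these reuses.

```python
def use_cached_host_state(last_check_ts: float, now: float, horizon_s: float) -> bool:
    """True if the previous host result is recent enough to reuse.

    A horizon of 0 disables caching, as the config comment states.
    Treating an age exactly equal to the horizon as fresh is an assumption.
    """
    if horizon_s <= 0:
        return False
    return now - last_check_ts <= horizon_s
```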
>
> # CACHED SERVICE CHECK HORIZON
> # This option determines the maximum amount of time (in seconds)
> # that the state of a previous service check is considered current.
> # Cached service states (from service checks that were performed more
> # recently than the timeframe specified by this value) can immensely
> # improve performance in regards to predictive dependency checks.
> # Use a value of 0 to disable service check caching.
>
> cached_service_check_horizon=15
>
>
>
> # ENABLE PREDICTIVE HOST DEPENDENCY CHECKS
> # This option determines whether or not Nagios will attempt to execute
> # checks of hosts when it predicts that future dependency logic tests
> # may be needed.  These predictive checks can help ensure that your
> # host dependency logic works well.
> # Values:
> #  0 = Disable predictive checks
> #  1 = Enable predictive checks (default)
>
> enable_predictive_host_dependency_checks=1
>
>
>
> # ENABLE PREDICTIVE SERVICE DEPENDENCY CHECKS
> # This option determines whether or not Nagios will attempt to execute
> # checks of services when it predicts that future dependency logic tests
> # may be needed.  These predictive checks can help ensure that your
> # service dependency logic works well.
> # Values:
> #  0 = Disable predictive checks
> #  1 = Enable predictive checks (default)
>
> enable_predictive_service_dependency_checks=1
>
> # AUTO-RESCHEDULING OPTION
> # This option determines whether or not Nagios will attempt to
> # automatically reschedule active host and service checks to
> # "smooth" them out over time.  This can help balance the load on
> # the monitoring server.
> # WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE
> # PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY
>
> auto_reschedule_checks=0
>
>
>
> # AUTO-RESCHEDULING INTERVAL
> # This option determines how often (in seconds) Nagios will
> # attempt to automatically reschedule checks.  This option only
> # has an effect if the auto_reschedule_checks option is enabled.
> # Default is 30 seconds.
> # WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE
> # PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY
>
> auto_rescheduling_interval=30
>
>
>
> # AUTO-RESCHEDULING WINDOW
> # This option determines the "window" of time (in seconds) that
> # Nagios will look at when automatically rescheduling checks.
> # Only host and service checks that occur in the next X seconds
> # (determined by this variable) will be rescheduled. This option
> # only has an effect if the auto_reschedule_checks option is
> # enabled.  Default is 180 seconds (3 minutes).
> # WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE
> # PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY
>
> auto_rescheduling_window=180
>
>
>
> # SLEEP TIME
> # This is the number of seconds to sleep between checking for system
> # events and service checks that need to be run.
>
> sleep_time=0.25
>
> # TIMEOUT VALUES
> # These options control how much time Nagios will allow various
> # types of commands to execute before killing them off.  Options
> # are available for controlling maximum time allotted for
> # service checks, host checks, event handlers, notifications, the
> # ocsp command, and performance data commands.  All values are in
> # seconds.
>
> service_check_timeout=60
> host_check_timeout=30
> event_handler_timeout=30
> notification_timeout=30
> ocsp_timeout=5
> perfdata_timeout=5
>
> # AGGRESSIVE HOST CHECKING OPTION
> # If you don't want to turn on aggressive host checking features, set
> # this value to 0 (the default).  Otherwise set this value to 1 to
> # enable the aggressive check option.  Read the docs for more info
> # on what aggressive host check is or check out the source code in
> # base/checks.c
>
> use_aggressive_host_checking=0
>
>
>
> # SERVICE CHECK EXECUTION OPTION
> # This determines whether or not Nagios will actively execute
> # service checks when it initially starts.  If this option is
> # disabled, checks are not actively made, but Nagios can still
> # receive and process passive check results that come in.  Unless
> # you're implementing redundant hosts or have a special need for
> # disabling the execution of service checks, leave this enabled!
> # Values: 1 = enable checks, 0 = disable checks
>
> execute_service_checks=0
>
>
>
> # PASSIVE SERVICE CHECK ACCEPTANCE OPTION
> # This determines whether or not Nagios will accept passive
> # service check results when it initially (re)starts.
> # Values: 1 = accept passive checks, 0 = reject passive checks
>
> accept_passive_service_checks=1
>
>
>
> # HOST CHECK EXECUTION OPTION
> # This determines whether or not Nagios will actively execute
> # host checks when it initially starts.  If this option is
> # disabled, checks are not actively made, but Nagios can still
> # receive and process passive check results that come in.  Unless
> # you're implementing redundant hosts or have a special need for
> # disabling the execution of host checks, leave this enabled!
> # Values: 1 = enable checks, 0 = disable checks
>
> execute_host_checks=1
>
> # PASSIVE HOST CHECK ACCEPTANCE OPTION
> # This determines whether or not Nagios will accept passive
> # host check results when it initially (re)starts.
> # Values: 1 = accept passive checks, 0 = reject passive checks
>
> accept_passive_host_checks=0
>
> # OBSESS OVER SERVICE CHECKS OPTION
> # This determines whether or not Nagios will obsess over service
> # checks and run the ocsp_command defined below.  Unless you're
> # planning on implementing distributed monitoring, do not enable
> # this option.  Read the HTML docs for more information on
> # implementing distributed monitoring.
> # Values: 1 = obsess over services, 0 = do not obsess (default)
>
> obsess_over_services=0
>
>
>
> # OBSESSIVE COMPULSIVE SERVICE PROCESSOR COMMAND
> # This is the command that is run for every service check that is
> # processed by Nagios.  This command is executed only if the
> # obsess_over_services option (above) is set to 1.  The command
> # argument is the short name of a command definition that you
> # define in your host configuration file. Read the HTML docs for
> # more information on implementing distributed monitoring.
>
> #ocsp_command=somecommand
>
>
>
> # OBSESS OVER HOST CHECKS OPTION
> # This determines whether or not Nagios will obsess over host
> # checks and run the ochp_command defined below.  Unless you're
> # planning on implementing distributed monitoring, do not enable
> # this option.  Read the HTML docs for more information on
> # implementing distributed monitoring.
> # Values: 1 = obsess over hosts, 0 = do not obsess (default)
>
> obsess_over_hosts=0
>
>
>
> # OBSESSIVE COMPULSIVE HOST PROCESSOR COMMAND
> # This is the command that is run for every host check that is
> # processed by Nagios.  This command is executed only if the
> # obsess_over_hosts option (above) is set to 1.  The command
> # argument is the short name of a command definition that you
> # define in your host configuration file. Read the HTML docs for
> # more information on implementing distributed monitoring.
>
> #ochp_command=somecommand
>
> # SERVICE FRESHNESS CHECK OPTION
> # This option determines whether or not Nagios will periodically
> # check the "freshness" of service results.  Enabling this option
> # is useful for ensuring passive checks are received in a timely
> # manner.
> # Values: 1 = enable freshness checking, 0 = disable freshness checking
>
> check_service_freshness=1
>
>
>
> # SERVICE FRESHNESS CHECK INTERVAL
> # This setting determines how often (in seconds) Nagios will
> # check the "freshness" of service check results.  If you have
> # disabled service freshness checking, this option has no effect.
>
> #service_freshness_check_interval=60
> service_freshness_check_interval=420
>
>
>
> # HOST FRESHNESS CHECK OPTION
> # This option determines whether or not Nagios will periodically
> # check the "freshness" of host results.  Enabling this option
> # is useful for ensuring passive checks are received in a timely
> # manner.
> # Values: 1 = enable freshness checking, 0 = disable freshness checking
>
> check_host_freshness=0
> #check_host_freshness=1
>
>
>
> # HOST FRESHNESS CHECK INTERVAL
> # This setting determines how often (in seconds) Nagios will
> # check the "freshness" of host check results.  If you have
> # disabled host freshness checking, this option has no effect.
>
> #host_freshness_check_interval=60
> host_freshness_check_interval=420
>
> # ADDITIONAL FRESHNESS THRESHOLD LATENCY
> # This setting determines the number of seconds that Nagios
> # will add to any host and service freshness thresholds that
> # it calculates (those not explicitly specified by the user).
>
> #additional_freshness_latency=15
> additional_freshness_latency=180
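When a passive service has no explicit freshness_threshold, Nagios 3 calculates one. As we understand the source, for a service in an OK or hard state the threshold is roughly check_interval * interval_length + the service's current latency + additional_freshness_latency. In other states, retry_interval takes the place of check_interval. Treat that formula as an assumption to verify against your version. The sketch below uses a hypothetical 1-minute check_interval, the default interval_length of 60, the 76.739 s average passive latency reported above, and additional_freshness_latency=180.

```python
def auto_freshness_threshold(
    check_interval_min: float,
    interval_length_s: int,
    latency_s: float,
    additional_freshness_latency_s: int,
) -> float:
    """Seconds before a passive result is considered stale (assumed formula)."""
    return check_interval_min * interval_length_s + latency_s + additional_freshness_latency_s


threshold = auto_freshness_threshold(1, 60, 76.739, 180)
print(f"{threshold:.1f} s")  # about 316.7 s
```

Under this formula, latency feeds into the threshold. High processing latency on the master therefore also delays the point at which stale passive results are noticed.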
>
>
> # LARGE INSTALLATION TWEAKS OPTION
> # This option determines whether or not Nagios will take some shortcuts
> # which can save on memory and CPU usage in large Nagios installations.
> # Read the documentation for more information on the benefits/tradeoffs
> # of enabling this option.
> # Values: 1 - Enabled tweaks
> #         0 - Disable tweaks (default)
>
> use_large_installation_tweaks=1
>
>
> # CHILD PROCESS MEMORY OPTION
> # This option determines whether or not Nagios will free memory in
> # child processes (processes used to execute system commands and host/
> # service checks).  If you specify a value here, it will override
> # program defaults.
> # Value: 1 - Free memory in child processes
> #        0 - Do not free memory in child processes
>
> #free_child_process_memory=1
>
> # CHILD PROCESS FORKING BEHAVIOR
> # This option determines how Nagios will fork child processes
> # (used to execute system commands and host/service checks).  Normally
> # child processes are fork()ed twice, which provides a very high level
> # of isolation from problems.  Fork()ing once is probably enough and will
> # save a great deal on CPU usage (in large installs), so you might
> # want to consider using this.  If you specify a value here, it will override
> # program defaults.
> # Value: 1 - Child processes fork() twice
> #        0 - Child processes fork() just once
>
> #child_processes_fork_twice=1
> child_processes_fork_twice=0
>
>
> # DEBUG LEVEL
> # This option determines how much (if any) debugging information will
> # be written to the debug file.  OR values together to log multiple
> # types of information.
> # Values:
> #          -1 = Everything
> #          0 = Nothing
> #          1 = Functions
> #          2 = Configuration
> #          4 = Process information
> #          8 = Scheduled events
> #          16 = Host/service checks
> #          32 = Notifications
> #          64 = Event broker
> #          128 = External commands
> #          256 = Commands
> #          512 = Scheduled downtime
> #          1024 = Comments
> #          2048 = Macros
>
> debug_level=16
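Because debug_level is a bitmask, several categories can be combined by OR-ing their values. The helpers below decode and encode values using the table from the comment above. The function names are illustrative only. For example, adding External commands (128) to the current Host/service checks (16) gives 144. Verbose debug logging adds disk I/O, which matters on an already busy master.

```python
from __future__ import annotations

# Bit values taken from the DEBUG LEVEL comment in nagios.cfg above.
DEBUG_FLAGS = {
    1: "Functions",
    2: "Configuration",
    4: "Process information",
    8: "Scheduled events",
    16: "Host/service checks",
    32: "Notifications",
    64: "Event broker",
    128: "External commands",
    256: "Commands",
    512: "Scheduled downtime",
    1024: "Comments",
    2048: "Macros",
}


def decode_debug_level(level: int) -> list[str]:
    """List the categories enabled by a debug_level value (-1 means everything)."""
    if level == -1:
        return ["Everything"]
    return [name for bit, name in DEBUG_FLAGS.items() if level & bit]


def encode_debug_level(*names: str) -> int:
    """OR together the bits for the named categories."""
    lookup = {name: bit for bit, name in DEBUG_FLAGS.items()}
    level = 0
    for name in names:
        level |= lookup[name]
    return level
```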
>
>
> # DEBUG VERBOSITY
> # This option determines how verbose the debug log output will be.
> # Values: 0 = Brief output
> #         1 = More detailed
> #         2 = Very detailed
>
> debug_verbosity=1
>
> Thanks in advance for your help.
> Trisha
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>



-- 
Cordially,
Shadhin Rahman