Tweaking Nagios Performance (Checks/Notifications)

Mirza Dedic mirde at oppy.com
Tue Oct 6 23:57:09 CEST 2009


I recently finished moving Nagios from a virtual machine to bare-metal hardware, a retired PowerEdge machine (dual-core, 4 GB RAM, RAID-5 on 10k RPM disks). My goal is a window of at most 1 minute between a host or service going down and my receiving a notification that it is down.
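
To make the 1-minute goal concrete, here is a rough sketch of the worst-case time before a failure reaches a HARD state and can trigger a notification. It is a simplified model, not a description of Nagios internals. It assumes the failure happens just after a successful check, ignores check execution time and latency, and takes interval_length=60 and check_result_reaper_frequency=10 from the nagios.cfg in this message. The check_interval, retry_interval and max_check_attempts values are hypothetical, since the service definitions are not shown.

```python
def worst_case_detection_seconds(check_interval, retry_interval, max_check_attempts,
                                 interval_length=60, reaper_frequency=10):
    """Rough upper bound (seconds) between a failure and its HARD state.

    Simplified model: the failure occurs right after a passing check, every
    retry runs on schedule, and each result waits one full reaper cycle.
    """
    # Wait until the next regularly scheduled check notices the problem.
    first_check = check_interval * interval_length
    # Remaining attempts run at the retry interval before the state is HARD.
    retries = (max_check_attempts - 1) * retry_interval * interval_length
    return first_check + retries + reaper_frequency


# Hypothetical service settings: (check_interval, retry_interval, max_check_attempts)
for settings in [(5, 1, 3), (1, 1, 3), (1, 1, 1)]:
    seconds = worst_case_detection_seconds(*settings)
    print(f"check_interval={settings[0]} retry_interval={settings[1]} "
          f"max_check_attempts={settings[2]} -> about {seconds} s")
```

Under this model, the per-service intervals and the number of attempts dominate the result, so they are worth reviewing alongside the global options in nagios.cfg.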

We are monitoring a total of 347 services and 82 hosts, mainly using the plug-ins below:

- check_by_ssh
- check_nt (NSClient++ for Win32)
- check_http
- check_ping
- check_esx3
- check_mysql

Below is the "Performance Info" output for the current setup:

Services Actively Checked

Time Frame            Services Checked
<= 1 minute           65 (18.7%)
<= 5 minutes          300 (86.5%)
<= 15 minutes         347 (100.0%)
<= 1 hour             347 (100.0%)
Since program start   347 (100.0%)

Metric                 Min.        Max.        Average
Check Execution Time   0.01 sec    21.91 sec   1.603 sec
Check Latency          0.00 sec    0.00 sec    0.164 sec
Percent State Change   0.00%       0.00%       0.00%

Services Passively Checked

Time Frame            Services Checked
<= 1 minute           0 (0.0%)
<= 5 minutes          0 (0.0%)
<= 15 minutes         0 (0.0%)
<= 1 hour             0 (0.0%)
Since program start   0 (0.0%)

Metric                 Min.        Max.        Average
Percent State Change   0.00%       0.00%       0.00%

Hosts Actively Checked

Time Frame            Hosts Checked
<= 1 minute           0 (0.0%)
<= 5 minutes          78 (95.1%)
<= 15 minutes         82 (100.0%)
<= 1 hour             82 (100.0%)
Since program start   82 (100.0%)

Metric                 Min.        Max.        Average
Check Execution Time   0.29 sec    4.03 sec    2.483 sec
Check Latency          0.15 sec    0.78 sec    0.565 sec
Percent State Change   0.00%       0.00%       0.00%



Hosts Passively Checked

Time Frame            Hosts Checked
<= 1 minute           0 (0.0%)
<= 5 minutes          0 (0.0%)
<= 15 minutes         0 (0.0%)
<= 1 hour             0 (0.0%)
Since program start   0 (0.0%)

Metric                 Min.        Max.        Average
Percent State Change   0.00%       0.00%       0.00%



When I restart Nagios and monitor the box, total CPU consumption does not spike past 10%, so I would like to schedule the checks more tightly and use the spare capacity.
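
As a back-of-the-envelope check on that headroom, the sketch below estimates how many checks would have to be in flight, on average, to run every active check once per minute. It uses only the counts and average execution times from the performance tables above. The 60-second window is the stated goal, and the model is an assumption: it treats every check as taking the average time and ignores scheduling overhead and the 21.91-second outlier.

```python
import math


def avg_checks_in_flight(num_checks, avg_exec_seconds, window_seconds):
    """Average number of simultaneously running checks needed to finish
    num_checks checks of avg_exec_seconds each within window_seconds."""
    return num_checks * avg_exec_seconds / window_seconds


WINDOW = 60  # seconds; the 1-minute goal stated in this message

services = avg_checks_in_flight(347, 1.603, WINDOW)  # 347 services, 1.603 s average
hosts = avg_checks_in_flight(82, 2.483, WINDOW)      # 82 hosts, 2.483 s average
total = math.ceil(services + hosts)

print(f"services: ~{services:.1f} checks in flight on average")
print(f"hosts:    ~{hosts:.1f} checks in flight on average")
print(f"total:    at least {total} concurrent checks")
```

An average concurrency this low suggests the hardware is unlikely to be the limiting factor, which is consistent with the CPU staying below 10%.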

Below is my nagios.cfg for the current setup. Can anyone suggest changes that would help me achieve this?

# MERLIN BROKER MODULE
broker_module=/usr/local/nagios/addons/merlin/merlin.so /usr/local/nagios/addons/merlin/merlin.conf
log_file=/usr/local/nagios/var/nagios.log

# localhost
cfg_file=/usr/local/nagios/etc/localhost.cfg

# Locations
cfg_dir=/usr/local/nagios/etc/locations

# Devices
cfg_dir=/usr/local/nagios/etc/devices

# Objects
cfg_dir=/usr/local/nagios/etc/objects

# OBJECT CACHE FILE

object_cache_file=/usr/local/nagios/var/objects.cache

# PRE-CACHED OBJECT FILE

precached_object_file=/usr/local/nagios/var/objects.precache

# RESOURCE FILE

resource_file=/usr/local/nagios/etc/resource.cfg

# STATUS FILE

status_file=/usr/local/nagios/var/status.dat

# STATUS FILE UPDATE INTERVAL

status_update_interval=5

# NAGIOS USER

nagios_user=nagios

# NAGIOS GROUP

nagios_group=nagios

# EXTERNAL COMMAND OPTION

check_external_commands=1

# EXTERNAL COMMAND CHECK INTERVAL

command_check_interval=-1

# EXTERNAL COMMAND FILE

command_file=/usr/local/nagios/var/rw/nagios.cmd

# EXTERNAL COMMAND BUFFER SLOTS

external_command_buffer_slots=4096

# LOCK FILE

lock_file=/usr/local/nagios/var/nagios.lock

# TEMP FILE

temp_file=/usr/local/nagios/var/nagios.tmp

# TEMP PATH

temp_path=/tmp

# EVENT BROKER OPTIONS

event_broker_options=-1

# LOG ROTATION METHOD

log_rotation_method=d

# LOG ARCHIVE PATH

log_archive_path=/usr/local/nagios/var/archives

# LOGGING OPTIONS

use_syslog=1

# NOTIFICATION LOGGING OPTION

log_notifications=1

# SERVICE RETRY LOGGING OPTION

log_service_retries=1

# HOST RETRY LOGGING OPTION

log_host_retries=1

# EVENT HANDLER LOGGING OPTION

log_event_handlers=1

# INITIAL STATES LOGGING OPTION

log_initial_states=1

# EXTERNAL COMMANDS LOGGING OPTION

log_external_commands=1

# PASSIVE CHECKS LOGGING OPTION

log_passive_checks=1

# SERVICE INTER-CHECK DELAY METHOD

service_inter_check_delay_method=s

# MAXIMUM SERVICE CHECK SPREAD

max_service_check_spread=5

# SERVICE CHECK INTERLEAVE FACTOR

service_interleave_factor=s

# HOST INTER-CHECK DELAY METHOD

host_inter_check_delay_method=s

# MAXIMUM HOST CHECK SPREAD

max_host_check_spread=3

# MAXIMUM CONCURRENT SERVICE CHECKS

max_concurrent_checks=0

# HOST AND SERVICE CHECK REAPER FREQUENCY

check_result_reaper_frequency=10

# MAX CHECK RESULT REAPER TIME

max_check_result_reaper_time=30

# CHECK RESULT PATH

check_result_path=/usr/local/nagios/var/spool/checkresults

# MAX CHECK RESULT FILE AGE

max_check_result_file_age=3600

# CACHED HOST CHECK HORIZON

cached_host_check_horizon=10

# CACHED SERVICE CHECK HORIZON

cached_service_check_horizon=10

# ENABLE PREDICTIVE HOST DEPENDENCY CHECKS

enable_predictive_host_dependency_checks=1

# ENABLE PREDICTIVE SERVICE DEPENDENCY CHECKS

enable_predictive_service_dependency_checks=1

# SOFT STATE DEPENDENCIES

soft_state_dependencies=0

#time_change_threshold=900

# AUTO-RESCHEDULING OPTION

auto_reschedule_checks=0

# AUTO-RESCHEDULING INTERVAL

auto_rescheduling_interval=30

# AUTO-RESCHEDULING WINDOW

auto_rescheduling_window=180

# SLEEP TIME

sleep_time=0.25

# TIMEOUT VALUES

service_check_timeout=30
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5

# RETAIN STATE INFORMATION

retain_state_information=0

# STATE RETENTION FILE

state_retention_file=/usr/local/nagios/var/retention.dat

# RETENTION DATA UPDATE INTERVAL

retention_update_interval=5

# USE RETAINED PROGRAM STATE

use_retained_program_state=0

# USE RETAINED SCHEDULING INFO

use_retained_scheduling_info=0

# This mask determines what host attributes are not retained
retained_host_attribute_mask=0

# This mask determines what service attributes are not retained
retained_service_attribute_mask=0

# These two masks determine what process attributes are not retained.
# There are two masks, because some process attributes have host and service
# options.  For example, you can disable active host checks, but leave active
# service checks enabled.
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0

# These two masks determine what contact attributes are not retained.
# There are two masks, because some contact attributes have host and
# service options.  For example, you can disable host notifications for
# a contact, but leave service notifications enabled for them.
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0

# INTERVAL LENGTH

interval_length=60

# CHECK FOR UPDATES

check_for_updates=0

# BARE UPDATE CHECK

bare_update_check=0

# AGGRESSIVE HOST CHECKING OPTION

use_aggressive_host_checking=0

# SERVICE CHECK EXECUTION OPTION

execute_service_checks=1

# PASSIVE SERVICE CHECK ACCEPTANCE OPTION

accept_passive_service_checks=1

# HOST CHECK EXECUTION OPTION

execute_host_checks=1

# PASSIVE HOST CHECK ACCEPTANCE OPTION

accept_passive_host_checks=1

# NOTIFICATIONS OPTION

enable_notifications=1

# EVENT HANDLER USE OPTION

enable_event_handlers=1

# PROCESS PERFORMANCE DATA OPTION

process_performance_data=1

# HOST AND SERVICE PERFORMANCE DATA PROCESSING COMMANDS

host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata

# HOST AND SERVICE PERFORMANCE DATA FILES

host_perfdata_file=/tmp/host-perfdata
service_perfdata_file=/tmp/service-perfdata

# HOST AND SERVICE PERFORMANCE DATA FILE TEMPLATES

host_perfdata_file_template=[HOSTPERFDATA]\t$TIMET$\t$HOSTNAME$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$
service_perfdata_file_template=[SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$

# HOST AND SERVICE PERFORMANCE DATA FILE MODES

host_perfdata_file_mode=a
service_perfdata_file_mode=a

# HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING INTERVAL

host_perfdata_file_processing_interval=0
service_perfdata_file_processing_interval=0

# HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING COMMANDS

host_perfdata_file_processing_command=process-host-perfdata-file
service_perfdata_file_processing_command=process-service-perfdata-file

# OBSESS OVER SERVICE CHECKS OPTION

obsess_over_services=0

# OBSESSIVE COMPULSIVE SERVICE PROCESSOR COMMAND

#ocsp_command=somecommand

# OBSESS OVER HOST CHECKS OPTION

obsess_over_hosts=0

# OBSESSIVE COMPULSIVE HOST PROCESSOR COMMAND

#ochp_command=somecommand

# TRANSLATE PASSIVE HOST CHECKS OPTION

translate_passive_host_checks=0

# PASSIVE HOST CHECKS ARE SOFT OPTION

passive_host_checks_are_soft=0

# ORPHANED HOST/SERVICE CHECK OPTIONS

check_for_orphaned_services=1
check_for_orphaned_hosts=1

# SERVICE FRESHNESS CHECK OPTION

check_service_freshness=1

# SERVICE FRESHNESS CHECK INTERVAL

service_freshness_check_interval=60

# HOST FRESHNESS CHECK OPTION

check_host_freshness=1

# HOST FRESHNESS CHECK INTERVAL

host_freshness_check_interval=60

# ADDITIONAL FRESHNESS THRESHOLD LATENCY

additional_freshness_latency=15

# FLAP DETECTION OPTION

enable_flap_detection=1

# FLAP DETECTION THRESHOLDS FOR HOSTS AND SERVICES

low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0

# DATE FORMAT OPTION

date_format=us

# TIMEZONE OFFSET

#use_timezone=US/Mountain
#use_timezone=Australia/Brisbane

# P1.PL FILE LOCATION

p1_file=/usr/local/nagios/bin/p1.pl

# EMBEDDED PERL INTERPRETER OPTION

enable_embedded_perl=0

# EMBEDDED PERL USAGE OPTION

use_embedded_perl_implicitly=0

# ILLEGAL OBJECT NAME CHARACTERS

illegal_object_name_chars='

# ILLEGAL MACRO OUTPUT CHARACTERS

illegal_macro_output_chars='

# REGULAR EXPRESSION MATCHING

use_regexp_matching=0

# "TRUE" REGULAR EXPRESSION MATCHING

use_true_regexp_matching=0

# ADMINISTRATOR EMAIL/PAGER ADDRESSES

admin_email=mirde at oppy.com
admin_pager=mirde at oppy.com

# DAEMON CORE DUMP OPTION

daemon_dumps_core=1

# LARGE INSTALLATION TWEAKS OPTION

use_large_installation_tweaks=0

# ENABLE ENVIRONMENT MACROS

enable_environment_macros=1

# CHILD PROCESS MEMORY OPTION

#free_child_process_memory=1

# CHILD PROCESS FORKING BEHAVIOR

#child_processes_fork_twice=1

# DEBUG LEVEL

debug_level=-1

# DEBUG VERBOSITY

debug_verbosity=1

# DEBUG FILE

debug_file=/usr/local/nagios/var/nagios.debug

# MAX DEBUG FILE SIZE

max_debug_file_size=1000000
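
Because the configuration above is long, a small helper can pull out just the directives that relate to check scheduling and result processing so they can be reviewed together. The sketch below is illustrative only. The choice of directives in SCHEDULING_KEYS is an assumption about which options are worth looking at first, the function name is hypothetical, and SAMPLE is a short excerpt copied from the file above rather than the whole thing.

```python
# Directives from the nagios.cfg above that relate to how checks are spread
# out and how soon their results are processed (selection is an assumption).
SCHEDULING_KEYS = (
    "interval_length",
    "max_service_check_spread",
    "max_host_check_spread",
    "max_concurrent_checks",
    "check_result_reaper_frequency",
    "service_inter_check_delay_method",
    "host_inter_check_delay_method",
    "use_large_installation_tweaks",
    "sleep_time",
)


def scheduling_settings(cfg_text):
    """Return {directive: value} for the scheduling-related directives found."""
    found = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() in SCHEDULING_KEYS:
            found[key.strip()] = value.strip()
    return found


# Short excerpt of the configuration above, used here as sample input.
SAMPLE = """\
# MAXIMUM SERVICE CHECK SPREAD
max_service_check_spread=5
max_host_check_spread=3
max_concurrent_checks=0
check_result_reaper_frequency=10
interval_length=60
log_file=/usr/local/nagios/var/nagios.log
"""

settings = scheduling_settings(SAMPLE)
for name, value in settings.items():
    print(f"{name} = {value}")
```

To run it against the real file, read the file into a string and pass that to scheduling_settings instead of SAMPLE.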