Nagios Performance Issues

Babak Pasdar bpasdar at pasdar.com
Wed Feb 18 00:35:08 CET 2004


Dear Nagios Community Members,

We recently deployed Nagios 1.2 and are monitoring a total of 620 services on 200
hosts.  The services include network port checks, SNMP checks via snmpget, and
server monitoring via NRPE.

Our problem is that Nagios is extremely slow at running scheduled checks.  For
example, at 11:00pm the Nagios scheduling queue is still waiting to process service
checks that were scheduled for 7:00pm.  It also takes a very long time for services
or hosts that are up to be recognized as up, probably as a by-product of the backlog.

The system, a dual 2.4 GHz P4 Xeon with hyperthreading and 1 GB of RAM, always
seems to have plenty of idle CPU.  However, when we set the checks to be more
aggressive, the load level sometimes goes as high as 1300, with extremely high
process counts (7000 or more).  What I don't understand is that at a load level of
1300 the system still shows 50% or more idle CPU.
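
As background for the load-versus-idle question: assuming the host runs Linux (the
post does not say), the load average counts tasks that are runnable or in
uninterruptible sleep, so it can be very high while the CPUs are partly idle.  A
minimal sketch for reading the load figures and task counts next to each other is
shown below.  The SAMPLE line is made up for illustration and is not a measurement
from this system.

```python
# Parse the Linux /proc/loadavg format:
#   "<1m> <5m> <15m> <runnable>/<total> <last_pid>"
# SAMPLE is a hypothetical line used only when /proc/loadavg is unavailable.
SAMPLE = "1300.00 950.00 400.00 2/7012 31337"


def parse_loadavg(text):
    """Split one /proc/loadavg line into named fields."""
    one, five, fifteen, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return {
        "load_1m": float(one),
        "load_5m": float(five),
        "load_15m": float(fifteen),
        "runnable": int(runnable),   # scheduling entities runnable right now
        "total": int(total),         # scheduling entities that exist
        "last_pid": int(last_pid),
    }


def read_loadavg(path="/proc/loadavg"):
    """Return the live line on Linux, or the hypothetical SAMPLE elsewhere."""
    try:
        with open(path) as fh:
            return fh.read()
    except OSError:
        return SAMPLE


if __name__ == "__main__":
    info = parse_loadavg(read_loadavg())
    print(f"load (1m): {info['load_1m']}, "
          f"runnable now: {info['runnable']}, total tasks: {info['total']}")
```

If the load figure is far larger than the runnable count, most of the counted tasks
are probably waiting rather than using CPU.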

How can we get more aggressive checking and eliminate the backlog of checks without
driving the system load and process count so high?  In particular, which settings
should we tune for more consistent performance?

My guess is that we have to play with the following:

inter_check_delay_method=s
service_interleave_factor=s
max_concurrent_checks=0
service_reaper_frequency=30
sleep_time=1

Am I right?  If so, what formula does Nagios use to do its smart checking?
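
For reference, the Nagios documentation on service check scheduling describes the
"smart" (s) values roughly as: inter-check delay = average check interval divided by
the total number of services, and interleave factor = ceiling of services divided by
hosts.  The sketch below applies that reading to the figures in this post.  The
5-minute average check interval is an assumption, since the post does not state one,
and the function names are hypothetical.

```python
import math


def smart_inter_check_delay(avg_check_interval_s, total_services):
    """Delay between consecutive initial checks, in seconds."""
    return avg_check_interval_s / total_services


def smart_interleave_factor(total_services, total_hosts):
    """Interleave factor: services per host, rounded up."""
    return math.ceil(total_services / total_hosts)


def initial_spread_seconds(inter_check_delay_s, total_services):
    """Time needed to schedule every service once at a given delay."""
    return inter_check_delay_s * total_services


services, hosts = 620, 200          # figures from the post
assumed_avg_interval_s = 300.0      # ASSUMPTION: 5-minute average check interval

delay = smart_inter_check_delay(assumed_avg_interval_s, services)
factor = smart_interleave_factor(services, hosts)
# A fixed delay of 4 seconds, as in inter_check_delay_method=4:
spread_fixed = initial_spread_seconds(4.0, services)

print(f"smart inter-check delay: {delay:.3f} s")
print(f"smart interleave factor: {factor}")
print(f"initial spread at a fixed 4 s delay: {spread_fixed / 60:.1f} min")
```

Under these assumptions the smart delay is a little under half a second, while a
fixed 4-second delay spreads the first pass over 620 services across about 41
minutes.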

Thank you for your help,

babak


nagios.cfg settings with which checks are very slow to update:

log_file=/usr/local/nagios/var/nagios.log
cfg_file=/usr/local/nagios/etc/checkcommands.cfg
cfg_file=/usr/local/nagios/etc/misccommands.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/dependencies.cfg
cfg_file=/usr/local/nagios/etc/escalations.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
resource_file=/usr/local/nagios/etc/resource.cfg
status_file=/usr/local/nagios/var/status.log
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=2
command_file=/usr/local/nagios/var/rw/nagios.cmd
comment_file=/usr/local/nagios/var/comment.log
downtime_file=/usr/local/nagios/var/downtime.log
lock_file=/usr/local/nagios/var/nagios.lock
temp_file=/usr/local/nagios/var/nagios.tmp
log_rotation_method=d
log_archive_path=/usr/local/nagios/var/archives
use_syslog=1
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_service_checks=1
inter_check_delay_method=s
service_interleave_factor=s
max_concurrent_checks=0
service_reaper_frequency=30
sleep_time=1
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=60
notification_timeout=60
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=/usr/local/nagios/var/status.sav
retention_update_interval=60
use_retained_program_state=1
interval_length=30
use_agressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=0
enable_notifications=1
enable_event_handlers=0
process_performance_data=0
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
obsess_over_services=0
check_for_orphaned_services=0
check_service_freshness=0
freshness_check_interval=60
aggregate_status_updates=1
status_update_interval=30
enable_flap_detection=1
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=us
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
illegal_macro_output_chars=`~$&|'"<>
admin_email=vigilant
admin_pager=vigilant
# EOF (End of file)


nagios.cfg settings that generate extremely high load levels and a high number of
processes:

log_file=/usr/local/nagios/var/nagios.log
cfg_file=/usr/local/nagios/etc/checkcommands.cfg
cfg_file=/usr/local/nagios/etc/misccommands.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/dependencies.cfg
cfg_file=/usr/local/nagios/etc/escalations.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
resource_file=/usr/local/nagios/etc/resource.cfg
status_file=/usr/local/nagios/var/status.log
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=15s
command_file=/usr/local/nagios/var/rw/nagios.cmd
comment_file=/usr/local/nagios/var/comment.log
downtime_file=/usr/local/nagios/var/downtime.log
lock_file=/usr/local/nagios/var/nagios.lock
temp_file=/usr/local/nagios/var/nagios.tmp
log_rotation_method=d
log_archive_path=/usr/local/nagios/var/archives
use_syslog=1
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=1
log_external_commands=1
log_passive_service_checks=1
inter_check_delay_method=4
service_interleave_factor=2
max_concurrent_checks=0
service_reaper_frequency=30
sleep_time=1
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=60
notification_timeout=60
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=/usr/local/nagios/var/status.sav
retention_update_interval=60
use_retained_program_state=1
interval_length=60
use_agressive_host_checking=1
execute_service_checks=1
accept_passive_service_checks=0
enable_notifications=1
enable_event_handlers=0
process_performance_data=0
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
obsess_over_services=0
check_for_orphaned_services=1
check_service_freshness=0
freshness_check_interval=60
aggregate_status_updates=1
status_update_interval=30
enable_flap_detection=1
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=us
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
illegal_macro_output_chars=`~$&|'"<>
admin_email=vigilant
admin_pager=vigilant
# EOF (End of file)
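
The two listings above are long and mostly identical, so a small comparison helps
isolate what actually changed.  The sketch below parses key=value lines and reports
directives whose values differ.  The SLOW and LOADED strings are short excerpts
copied from the two listings (not the full files), and the helper names are
hypothetical.

```python
from collections import defaultdict


def parse_cfg(text):
    """Map each directive to the list of its values (cfg_file repeats)."""
    settings = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only; values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()].append(value.strip())
    return dict(settings)


def diff_cfg(a, b):
    """Return {directive: (values_in_a, values_in_b)} where they differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Excerpt of the first ("slow to update") listing.
SLOW = """
command_check_interval=2
log_initial_states=0
inter_check_delay_method=s
service_interleave_factor=s
max_concurrent_checks=0
interval_length=30
use_agressive_host_checking=0
check_for_orphaned_services=0
"""

# Excerpt of the second ("high load") listing.
LOADED = """
command_check_interval=15s
log_initial_states=1
inter_check_delay_method=4
service_interleave_factor=2
max_concurrent_checks=0
interval_length=60
use_agressive_host_checking=1
check_for_orphaned_services=1
"""

if __name__ == "__main__":
    for key, (old, new) in diff_cfg(parse_cfg(SLOW), parse_cfg(LOADED)).items():
        print(f"{key}: {old} -> {new}")
```

To compare the complete files, read each one with open(...).read() and pass the
text to parse_cfg in place of the excerpts.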

-- 





_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




