alternative scheduler

Fredrik Thulin ft at it.su.se
Sat Nov 20 14:22:22 CET 2010


Hi

I've had problems with my distributed Nagios installation. The slave
server just wouldn't start enough checks per time unit, and I think I
spent two full days searching for answers and testing every
configuration change others in the same situation had reported success
with (the configuration I ended up with is at the bottom of this mail).

I set up MRTG monitoring, and saw something incredibly strange - Nagios
would start out running quite a few service checks per minute, but then
over time level out at something like 50 checks per minute (!), until it
was restarted again.

http://people.su.se/~ft/test/mrtg_nagios-dev-srv1_2010-11-19/nagios-f.html

Had I been a C programming wizard, I surely would have tried to find the
problem in Nagios and fix it, but I'm not. However, I DO know how to
program massively concurrent things in Erlang, so I created a
proof-of-concept Nagios scheduler in about 350 lines of Erlang code (not
kidding).

Thanks to the excellent modular design in Nagios, this was very simple.
My application just starts another service check every 50 ms or so, and
sends the result to the Nagios master using a classic send_result
shell script (NSCA).
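
For readers who don't speak Erlang, the idea can be sketched in a few
lines of Python. This is not the code from the repository, only an
illustration under assumptions: the function names, the (host, service,
argv) tuple layout and the submit callback are all made up for the
example. The one fixed part is the result line, which uses the
tab-separated format that send_nsca reads on stdin. The 60 second
timeout mirrors service_check_timeout in the configuration at the bottom
of this mail.

```python
import subprocess
import threading
import time

INTERVAL = 0.05  # seconds between check starts ("every 50 ms or so")


def format_result(host, service, code, output):
    """Build one passive-result line: host, service, return code, output."""
    stripped = output.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return f"{host}\t{service}\t{code}\t{first_line}\n"


def run_check(host, service, argv, submit, timeout=60):
    """Run one plugin and pass the formatted result to submit()."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
        code, output = proc.returncode, proc.stdout
    except (subprocess.TimeoutExpired, OSError) as exc:
        code, output = 3, f"check failed: {exc}"  # 3 = UNKNOWN
    submit(format_result(host, service, code, output))


def schedule(checks, submit, interval=INTERVAL):
    """Start one check per interval without waiting for earlier ones."""
    threads = []
    for host, service, argv in checks:
        t = threading.Thread(target=run_check,
                             args=(host, service, argv, submit))
        t.start()
        threads.append(t)
        time.sleep(interval)  # pace the starts, not the completions
    for t in threads:
        t.join()
```

The point of the design is that a slow check never delays the start of
the next one; only the start times are paced. In a real setup submit
would pipe each line to send_nsca (or a wrapper like send_result)
instead of collecting it in memory.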

Look at these new graphs:

  http://people.su.se/~ft/test/mrtg_nagios-dev-srv1/nagios-f.html

Quite an improvement: around 5800 service checks executed per five
minutes, on what I'd call a very low-end server.
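
To put those numbers side by side, here is the back-of-the-envelope
arithmetic as a small Python snippet. The inputs are the approximate
figures quoted in this mail (about 50 checks per minute before, about
5800 per five minutes after, one check started every 50 ms or so); the
variable names are only for illustration, and the ceiling assumes the
interval is exactly 50 ms.

```python
BASELINE_PER_MIN = 50    # stock scheduler after it levelled out
NEW_PER_5_MIN = 5800     # Erlang scheduler, per five minutes
INTERVAL_MS = 50         # one check started every 50 ms

new_per_min = NEW_PER_5_MIN / 5            # checks per minute now
new_per_sec = NEW_PER_5_MIN / 300          # checks per second now
ceiling_per_5_min = 300_000 / INTERVAL_MS  # most starts possible in 5 min
speedup = new_per_min / BASELINE_PER_MIN   # improvement over the baseline

print(f"{new_per_min:.0f} checks/min, {new_per_sec:.1f} checks/s")
print(f"ceiling at {INTERVAL_MS} ms: {ceiling_per_5_min:.0f} per 5 min")
print(f"speedup: {speedup:.1f}x")
```

So the observed rate sits just under what a 50 ms start interval allows,
which suggests the interval itself, not the server, is what limits the
throughput now.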

To be fair, at around 11:00 I shut down the virtual machine and gave it
another CPU (now 2) and another 256 MB of RAM (total 512).

Anyone interested in further development of this? I've released it under
a BSD license, and you can find the code at 

  https://github.com/fredrikt/nagios-pers

/Fredrik

PS. Main configuration on my distributed servers:

log_file=/local/nagios/local/nagios.log
cfg_file=/etc/nagios3/commands.cfg
cfg_dir=/etc/nagios-plugins/config
cfg_dir=/etc/nagios3/conf.d
cfg_file=/local/nagios/approved/cfg/checkcommands.cfg
cfg_file=/local/nagios/approved/cfg/misccommands.cfg
cfg_file=/local/nagios/approved/cfg/distributed_servers/su-templates.cfg
cfg_file=/local/nagios/approved/local/su-hosts.cfg
object_cache_file=/local/nagios/var/objects.cache
precached_object_file=/local/nagios/var/objects.precache
resource_file=/etc/nagios3/resource.cfg
status_file=/local/nagios/var/status.dat
status_update_interval=10
nagios_user=nagios
nagios_group=nagios
check_external_commands=0
command_check_interval=-1
command_file=/local/nagios/var/rw/nagios.cmd
external_command_buffer_slots=4096
lock_file=/var/run/nagios3/nagios3.pid
temp_file=/local/nagios/var/nagios.tmp
temp_path=/tmp
event_broker_options=-1
log_rotation_method=d
log_archive_path=/local/nagios/local/archives
use_syslog=1
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=1
service_inter_check_delay_method=s
max_service_check_spread=15
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=15
max_concurrent_checks=500
check_result_reaper_frequency=10
max_check_result_reaper_time=30
check_result_path=/local/nagios/var/checkresults
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=0
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
sleep_time=0.25
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=/local/nagios/var/retention.dat
retention_update_interval=60
use_retained_program_state=0
use_retained_scheduling_info=0
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
interval_length=60
check_for_updates=0
bare_update_check=0
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=0
accept_passive_host_checks=1
enable_notifications=0
enable_event_handlers=0
process_performance_data=0
obsess_over_services=1
ocsp_command=send_result
obsess_over_hosts=0
translate_passive_host_checks=0
passive_host_checks_are_soft=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=0
service_freshness_check_interval=60
check_host_freshness=0
host_freshness_check_interval=60
additional_freshness_latency=15
enable_flap_detection=0
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=iso8601
p1_file=/usr/lib/nagios3/p1.pl
enable_embedded_perl=1
use_embedded_perl_implicitly=1
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
illegal_macro_output_chars=`~$&|'"<>
use_regexp_matching=0
use_true_regexp_matching=0
admin_email=root@localhost
admin_pager=pageroot@localhost
daemon_dumps_core=0
use_large_installation_tweaks=1
enable_environment_macros=0
free_child_process_memory=0
debug_level=0
debug_verbosity=1
debug_file=/local/nagios/local/nagios.debug
max_debug_file_size=1000000


