Multiple orphaned nagios processes

Brian Murphy brian.murphy at gmx.net
Fri Aug 12 03:57:06 CEST 2005


Hi There

We are running Nagios in a distributed setup: two systems carry out active
checks and send the results to a central display node using NSCA.

What we are seeing is that we end up with hundreds of nagios processes on
the central node, enough to grind it to a halt. Over 2000 checks are being
carried out on the checking nodes.
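
One way to put a number on the pile-up is to group the nagios processes by parent PID. The following is a minimal Python sketch, not part of Nagios; the function name and the sample listing are made up for illustration, and the input is assumed to be in the layout printed by `ps -eo pid,ppid,comm`.

```python
from collections import Counter


def count_by_parent(ps_output, name="nagios"):
    """Count processes called `name`, grouped by parent PID."""
    counts = Counter()
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        _pid, ppid, comm = fields
        if comm.strip() == name:
            counts[int(ppid)] += 1
    return counts


# Made-up sample in the layout of `ps -eo pid,ppid,comm` (not real output).
SAMPLE = """\
  PID  PPID COMMAND
 1200     1 nagios
 1301  1200 nagios
 1302     1 nagios
 1303     1 nagios
 2000     1 sshd
"""

if __name__ == "__main__":
    for ppid, total in sorted(count_by_parent(SAMPLE).items()):
        print(f"parent {ppid}: {total} nagios process(es)")
```

A daemonized main process normally has parent PID 1 as well, so a single entry under parent 1 is expected; a large count there would be consistent with children that have been reparented to init.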

We set service_reaper_frequency to 3 on both the central and the
checking systems and still have the problem.

We had this problem on 2.0b3 and still have it on 2.0b4.

I suspect that the processes are spawned to handle the passive checks, but
collide when writing into the pipe back to nagios (or the pipe is full and
the write returns EAGAIN), and eventually end up orphaned.
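
The full-pipe part of this theory can be reproduced in isolation. The sketch below is plain Python, not Nagios code, and the function name is made up for illustration: it writes to a non-blocking pipe that nobody reads until the kernel refuses the write, which is the EAGAIN condition described above. It says nothing about how Nagios itself reacts to that error.

```python
import errno
import os


def fill_pipe_nonblocking(chunk_size=4096):
    """Write to an unread pipe until the kernel refuses.

    Returns (bytes_written, errno_of_the_refused_write).
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # writes fail instead of waiting
    written = 0
    try:
        while True:
            written += os.write(write_fd, b"x" * chunk_size)
    except BlockingIOError as exc:
        # Pipe buffer is full and no reader is draining it.
        return written, exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    written, err = fill_pipe_nonblocking()
    print(f"pipe accepted {written} bytes, then failed with {errno.errorcode[err]}")
```

A writer that treats this error as fatal loses the result; one that retries forever while the reader is stalled will simply sit there, which would look like a stuck child from the outside.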

How many checks per second should nagios be able to process? We seem to be
writing to the logfile at 100/sec sometimes.
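
The logfile rate can be measured rather than estimated. This is a rough Python sketch that assumes each log entry starts with an epoch timestamp in square brackets; the function names and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed entry layout: "[<epoch seconds>] <message>"
LOG_LINE = re.compile(r"^\[(\d+)\]\s")


def entries_per_second(lines):
    """Return a Counter mapping each epoch second to its number of entries."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def peak_rate(lines):
    """Return the highest number of entries seen in any single second."""
    counts = entries_per_second(lines)
    return max(counts.values()) if counts else 0


# Made-up sample entries (hosts, services and timestamps are hypothetical).
SAMPLE = [
    "[1000000000] PASSIVE SERVICE CHECK: hostA;PING;0;OK",
    "[1000000000] PASSIVE SERVICE CHECK: hostB;PING;0;OK",
    "[1000000001] PASSIVE SERVICE CHECK: hostC;PING;0;OK",
    "a line without a timestamp is ignored",
]

if __name__ == "__main__":
    print("peak entries in one second:", peak_rate(SAMPLE))
```

To run it against a real file, pass the open file object instead of `SAMPLE`, for example `peak_rate(open(path))` with `path` pointing at the log.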

Do I just need to slow things down somehow, for example with longer poll
cycles for the checks?
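
For a feel of what longer poll cycles would buy, here is a small Python calculation of the average result rate arriving at the central node. Only the figure of 2000 checks comes from this setup; the intervals are hypothetical values picked for comparison, and the function name is made up.

```python
def average_check_rate(num_checks, check_interval_seconds):
    """Average results per second if every check runs once per interval."""
    return num_checks / check_interval_seconds


if __name__ == "__main__":
    # 2000 checks is from the setup above; the intervals are assumptions.
    for interval in (60, 300, 600):
        rate = average_check_rate(2000, interval)
        print(f"{interval:>4} s interval: {rate:.1f} results/sec on average")
```

This is only an average: results that arrive in bursts can push the momentary rate well above it.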

We are running a perfdata command and an ocsp_command on the central host,
but the processes forked by these do not seem to be the problem.


The relevant parts of the config file are below:

status_file=/usr/local/nagios/var/status.dat
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd
comment_file=/usr/local/nagios/var/comments.dat
downtime_file=/usr/local/nagios/var/downtime.dat
lock_file=/usr/local/nagios/var/nagios.lock
temp_file=/usr/local/nagios/var/nagios.tmp
event_broker_options=-1
log_rotation_method=d
log_archive_path=/usr/local/nagios/var/archives
use_syslog=0
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=1
service_inter_check_delay_method=s
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0
service_reaper_frequency=3
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
sleep_time=0.25 
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=/usr/local/nagios/var/retention.dat
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=0
interval_length=60
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=1
process_performance_data=1
service_perfdata_command=process-service-perfdata
obsess_over_services=1
ocsp_command=nagios-data-logger
check_for_orphaned_services=0
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=0
host_freshness_check_interval=60
aggregate_status_updates=1
status_update_interval=15
enable_flap_detection=0
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=us
p1_file=/usr/local/nagios/bin/p1.pl
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
illegal_macro_output_chars=`~$&|'"<>
use_regexp_matching=0
use_true_regexp_matching=0
admin_email=nagios
admin_pager=pagenagios
daemon_dumps_core=1
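
To compare the settings above between the central and the checking nodes, the key=value lines can be loaded into a dictionary. This is a minimal Python sketch with a made-up function name; the sample holds a few lines copied from the listing above, and the choice of which keys to print is just an example.

```python
def parse_main_config(text):
    """Parse key=value lines into a dict, skipping blanks and # comments.

    A directive that appears more than once keeps only its last value.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# A few lines copied from the listing above.
SAMPLE = """\
command_check_interval=-1
max_concurrent_checks=0
service_reaper_frequency=3
sleep_time=0.25
check_for_orphaned_services=0
"""

if __name__ == "__main__":
    settings = parse_main_config(SAMPLE)
    for key in ("service_reaper_frequency", "max_concurrent_checks", "sleep_time"):
        print(f"{key} = {settings[key]}")
```

Running it on the config from each node and diffing the resulting dictionaries shows at a glance where the nodes differ.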

Any suggestions appreciated

Thanks

Brian

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users