[Nagios-users] 2.0b5 initial host/service checks delayed after start (not present in 2.0b3)

Eli Stair estair at ilm.com
Fri Dec 9 02:00:45 CET 2005


Re-posting this here from 'nagios-users'.  This thread is about Nagios 
taking several minutes after the daemon starts before it begins 
polling any services/hosts, during which time there is no CPU load from 
the process.

FYI, while waiting for nagios to start polling anything after starting
it up, I decided to look at what it's doing...

The output below would explain why it starts up and sits around, not
consuming any cycles but not polling either.  Sleep left in the code?
The entries in the log each come after a few minutes (119 and 175
seconds apart).

This is running on 2.0b6, x86_64 arch, compiled from source with perlcache.

/eli

###FILE: nagios.log:
[1134076786] Finished daemonizing... (New PID=11914)
[1134076905] service_result_worker_thread(): poll(): EINTR (impossible)
[1134077080] service_result_worker_thread(): poll(): EINTR (impossible)
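
The gaps quoted above can be checked directly from the log. A minimal Python sketch, assuming the bracketed prefix on each line is a Unix timestamp in seconds (the helper name log_gaps is made up for this example, not part of Nagios):

```python
import re

LOG = """\
[1134076786] Finished daemonizing... (New PID=11914)
[1134076905] service_result_worker_thread(): poll(): EINTR (impossible)
[1134077080] service_result_worker_thread(): poll(): EINTR (impossible)
"""

def log_gaps(text):
    """Return the seconds elapsed between consecutive log entries."""
    stamps = [int(m.group(1))
              for m in re.finditer(r"^\[(\d+)\]", text, re.MULTILINE)]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]

print(log_gaps(LOG))  # [119, 175]
```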


### GDB info:
Attaching to program: /usr/local/nagios/bin/nagios, process 11914
Reading symbols from
/usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/CORE/libperl.so...(no
debugging symbols found)...done.
Loaded symbols for
/usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/CORE/libperl.so
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/tls/libm.so.6...(no debugging symbols
found)...done.
Loaded symbols for /lib64/tls/libm.so.6
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libutil.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/tls/libpthread.so.0...
(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 182894164416 (LWP 11914)]
[New Thread 1094719840 (LWP 11917)]
[New Thread 1084229984 (LWP 11915)]
Loaded symbols for /lib64/tls/libpthread.so.0
Reading symbols from /lib64/tls/libc.so.6...(no debugging symbols
found)...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /usr/lib64/libltdl.so.3...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libltdl.so.3
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x000000364700b9c5 in __nanosleep_nocancel ()
    from /lib64/tls/libpthread.so.0

(gdb) where
#0  0x000000364700b9c5 in __nanosleep_nocancel () from
/lib64/tls/libpthread.so.0
#1  0x00000000004209aa in event_execution_loop ()
#2  0x000000000040efa0 in main ()
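
A backtrace that ends in nanosleep under event_execution_loop() is what an idle scheduler looks like: when no queued event is due yet, the loop sleeps for a short interval and looks again. Below is a rough Python sketch of that pattern, not the actual Nagios code. The function run_loop, the simulated clock, and the event tuples are all hypothetical, and the 119-second schedule time is only an illustration borrowed from the log gap.

```python
import heapq

def run_loop(events, sleep_time, now=0.0, max_iterations=1000):
    """Run (run_time, name) events in time order, idling in between.

    Returns the names in execution order and the number of idle sleeps.
    A simulated clock stands in for time()/nanosleep() so the result
    is deterministic.
    """
    queue = list(events)
    heapq.heapify(queue)
    executed, sleeps = [], 0
    for _ in range(max_iterations):
        if not queue:
            break
        run_time, name = queue[0]
        if run_time <= now:
            heapq.heappop(queue)
            executed.append(name)
        else:
            # Nothing is due yet: use no CPU, just wait one tick.
            now += sleep_time
            sleeps += 1
    return executed, sleeps

# If the first check were scheduled 119 s after startup, the loop
# would do nothing but sleep until then.
done, idle = run_loop([(119.0, "first_check")], sleep_time=0.25)
print(done, idle)  # ['first_check'] 476
```

If that reading is right, the process is not hung; it is waiting on an event whose scheduled time is minutes away, which is consistent with zero CPU load.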

(gdb) info registers
rax            0xfffffffffffffdfc  -516
rbx            0x861bb0            8788912
rcx            0xffffffffffffffff  -1
rdx            0x2                 2
rsi            0x0                 0
rdi            0x7fbffff450        548682069072
rbp            0x0                 0x0
rsp            0x7fbffff410        0x7fbffff410
r8             0x0                 0
r9             0x2e8a              11914
r10            0x7fbffff301        548682068737
r11            0x202               514
r12            0x7fbffff450        548682069072
r13            0xffffffff          4294967295
r14            0xffffffff          4294967295
r15            0x7fbffffa08        548682070536
rip            0x364700b9c5        0x364700b9c5 <__nanosleep_nocancel+60>
eflags         0x202               514
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
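
The rax value is easier to read as a signed 64-bit integer. The sketch below just redoes gdb's conversion by hand; to_signed64 is a made-up helper name. Reading -516 as the kernel-internal ERESTART_RESTARTBLOCK code (a sleep interrupted when the debugger attached) is my assumption based on the Linux kernel's errno numbering, not something the output above states.

```python
def to_signed64(value):
    """Interpret an unsigned 64-bit register value as two's complement."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value & (1 << 63) else value

rax = 0xfffffffffffffdfc
print(to_signed64(rax))  # -516
```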


Fred wrote:
> I do the same thing with check_icmp except that I use sudo and create
> a simple sudo entry like this (see the CHECKICMP alias):
> 
> Cmnd_Alias CHECKALLSSHKEYS = /opt/hptc/nagios/libexec/check_keys # HP-HPTC-KeySync
> Cmnd_Alias CHECKSYSLOGALERTS = /opt/hptc/nagios/libexec/check_syslogalerts # HP-HPTC-SysLog
> Cmnd_Alias CHECKSFS = /opt/hptc/nagios/libexec/check_sfs # HP-HPTC-SysLog
> Cmnd_Alias CHECKLSF = /opt/hptc/nagios/libexec/check_lsf # HP-HPTC-CheckLSF
> Cmnd_Alias CHECKICMP = /opt/hptc/nagios/libexec/check_icmp # HP-HPTC-CheckICMP
> nagios ALL = NOPASSWD: CHECKALLSSHKEYS,CHECKSYSLOGALERTS,CHECKSFS,CHECKLSF,CHECKICMP # HP-HPTC-Nagios
> 
> I just built the 2.0b5 and hope to give it a try in the next few days on a
> 700+ node system ... I am hoping that this *solves* the delay problem
> that existed in the previous releases.
> 
> -FredC
> 
> 
> */Eli Stair <estair at ilm.com>/* wrote:
> 
> 
>     I'm running a fresh build of 2.0b5 on x86_64. After an initial start of
>     nagios, it can take up to 10 minutes for the first host or service
>     checks to begin. There is no CPU load by the nagios process during this
>     time. I have over 1000 hosts to check, and have reduced the max
>     host/service check spread in order to ensure that it is not "evening"
>     out the time.
> 
>     This problem is NOT occurring on a 2.0b3 build with the exact same
>     configuration.
> 
>     After the checks DO start, it can take hours to finish. I've changed
>     the user to root so that I can have the host check be
>     check_icmp -t 1 -p 1.
> 
>     Unfortunately, even with this situation, having anywhere between 4 and
>     64 hosts go down can make the "monitoring" aspect effectively useless.
> 
>     Any suggestions on the problem of startup lag?
>     Any ways to further speed up the host check runs, aside from using
>     check_icmp?
> 
>     Thanks,
> 
>     /eli
> 
>     ### inline nagios.cfg:
> 
> 
>     [root at monitor02 etc]# cat nagios.cfg | egrep -v "^#|^$"
>     log_file=/var/log/nagios/nagios.log
>     cfg_file=/usr/local/nagios/etc/checkcommands.cfg
>     cfg_file=/usr/local/nagios/etc/misccommands.cfg
>     cfg_dir=/usr/local/nagios/etc/config
>     cfg_file=/usr/local/nagios/etc/timeperiods.cfg
>     cfg_file=/usr/local/nagios/etc/contacts.cfg
>     cfg_file=/usr/local/nagios/etc/contactgroups.cfg
>     cfg_file=/usr/local/nagios/etc/hosts.cfg
>     cfg_file=/usr/local/nagios/etc/hostgroups.cfg
>     cfg_file=/usr/local/nagios/etc/customcommands.cfg
>     cfg_file=/usr/local/nagios/etc/services.cfg
>     object_cache_file=/usr/local/nagios/var/objects.cache
>     resource_file=/usr/local/nagios/etc/resource.cfg
>     status_file=/usr/local/nagios/var/status.dat
>     nagios_user=root
>     nagios_group=root
>     check_external_commands=1
>     command_check_interval=-1
>     command_file=/usr/local/nagios/var/rw/nagios.cmd
>     comment_file=/usr/local/nagios/var/comments.dat
>     downtime_file=/usr/local/nagios/var/downtime.dat
>     lock_file=/usr/local/nagios/var/nagios.lock
>     temp_file=/usr/local/nagios/var/nagios.tmp
>     event_broker_options=-1
>     log_rotation_method=d
>     log_archive_path=/var/log/nagios/archives
>     use_syslog=1
>     log_notifications=1
>     log_service_retries=1
>     log_host_retries=1
>     log_event_handlers=1
>     log_initial_states=0
>     log_external_commands=1
>     log_passive_checks=1
>     service_inter_check_delay_method=s
>     max_service_check_spread=15
>     service_interleave_factor=s
>     host_inter_check_delay_method=s
>     max_host_check_spread=10
>     max_concurrent_checks=0
>     service_reaper_frequency=15
>     auto_reschedule_checks=0
>     auto_rescheduling_interval=30
>     auto_rescheduling_window=180
>     sleep_time=0.25
>     service_check_timeout=60
>     host_check_timeout=30
>     event_handler_timeout=30
>     notification_timeout=30
>     ocsp_timeout=5
>     perfdata_timeout=5
>     retain_state_information=1
>     state_retention_file=/usr/local/nagios/var/retention.dat
>     retention_update_interval=0
>     use_retained_program_state=1
>     use_retained_scheduling_info=0
>     interval_length=60
>     use_aggressive_host_checking=0
>     execute_service_checks=1
>     accept_passive_service_checks=0
>     execute_host_checks=1
>     accept_passive_host_checks=1
>     enable_notifications=1
>     enable_event_handlers=1
>     process_performance_data=0
>     obsess_over_services=0
>     check_for_orphaned_services=0
>     check_service_freshness=1
>     service_freshness_check_interval=60
>     check_host_freshness=1
>     host_freshness_check_interval=60
>     aggregate_status_updates=1
>     status_update_interval=15
>     enable_flap_detection=0
>     low_service_flap_threshold=5.0
>     high_service_flap_threshold=20.0
>     low_host_flap_threshold=5.0
>     high_host_flap_threshold=20.0
>     date_format=iso8601
>     illegal_object_name_chars=`~!$%^&*|'"<>?,()=
>     illegal_macro_output_chars=`~$&|'"<>
>     use_regexp_matching=0
>     use_true_regexp_matching=0
>     admin_email=nagios
>     admin_pager=pagenagios
>     daemon_dumps_core=0
> 
> 
> 
> 
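
For reference, a back-of-the-envelope check on the spread settings in the quoted config. This assumes the max_*_check_spread options cap how many minutes after startup all initial checks must have been scheduled (my reading of the option descriptions) and takes 1000 hosts as a round figure for "over 1000". The function name is hypothetical.

```python
def max_inter_check_delay(spread_minutes, total_checks):
    """Upper bound, in seconds, on the gap between consecutive initial checks."""
    return spread_minutes * 60 / total_checks

# max_host_check_spread=10 with about 1000 hosts
print(max_inter_check_delay(10, 1000))  # 0.6
```

Under that assumption the initial host checks should be well under a second apart, so the spread setting alone would not account for minutes of idle time before the first check.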



