nagios core dump and restart when check_nrpe

Kelvin Xu kyoxu at hotmail.com
Wed Jan 9 10:54:12 CET 2008


Hi all,

I have just installed Nagios 3.0rc1 on a Solaris 10 machine. Everything works fine except when I run check_nrpe against a remote host or localhost. I checked /var/adm/messages; below is a section of the output:

Jan  4 10:16:39 pnsgsit1gw1 nagios[263]: [ID 702911 user.info] Caught SIGTERM, shutting down...
Jan  4 10:16:39 pnsgsit1gw1 nagios[263]: [ID 702911 user.info] Successfully shutdown... (PID=263)
Jan  4 10:16:39 pnsgsit1gw1 nagios[290]: [ID 702911 user.info] Nagios 3.0rc1 starting... (PID=290)
Jan  4 10:16:39 pnsgsit1gw1 nagios[290]: [ID 702911 user.info] Local time is Fri Jan 04 10:16:39 SGT 2008
Jan  4 10:16:39 pnsgsit1gw1 nagios[290]: [ID 702911 user.info] LOG VERSION: 2.0
Jan  4 10:16:39 pnsgsit1gw1 nagios[291]: [ID 702911 user.info] Finished daemonizing... (New PID=291)
Jan  4 10:17:53 pnsgsit1gw1 genunix: [ID 603404 kern.notice] NOTICE: core_log: nagios[302] setid process, core not dumped: /var/core/core.nagios.302.pnsgsit1gw1.210033.65541.1199413073
Jan  4 10:17:53 pnsgsit1gw1 nagios[291]: [ID 702911 user.info] Caught SIGTERM, shutting down...
Jan  4 10:17:53 pnsgsit1gw1 nagios[291]: [ID 702911 user.info] Successfully shutdown... (PID=291)
Jan  4 10:17:53 pnsgsit1gw1 nagios[305]: [ID 702911 user.info] Nagios 3.0rc1 starting... (PID=305)
Jan  4 10:17:53 pnsgsit1gw1 nagios[305]: [ID 702911 user.info] Local time is Fri Jan 04 10:17:53 SGT 2008
Jan  4 10:17:53 pnsgsit1gw1 nagios[305]: [ID 702911 user.info] LOG VERSION: 2.0
Jan  4 10:17:53 pnsgsit1gw1 nagios[306]: [ID 702911 user.info] Finished daemonizing... (New PID=306)
 
This repeats every few minutes, and it does not occur when I remove the NRPE service monitoring from the configuration. I tried running the check by hand:

/usr/local/nagios/libexec/check_nrpe -H pnsgsit1gw2 -c check_load

The output seems fine, except that some additional characters are appended to the end:

OK - load average: 0.00, 0.00, 0.00|load1=0.000;15.000;30.000;0; load5=0.000;10.000;25.000;0; load15=0.000;5.000;20.000;0;ÿ¿àpÿ:

Below is the debug log that I extracted. It seems that Nagios core dumps when a check_nrpe request is sent out, and a new process is then created:
 
[1199869965.255643] [064.1] [pid=720] Making callbacks (type 13)...
[1199869965.255659] [016.0] [pid=720] Checking service 'NRPE' on host 'pnsgsit1web2a'...
[1199869965.255752] [001.0] [pid=720] get_raw_command_line()
[1199869965.255774] [2320.2] [pid=720] Raw Command Input: $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
[1199869965.255792] [001.0] [pid=720] process_macros()
[1199869965.255808] [2048.1] [pid=720] **** BEGIN MACRO PROCESSING ***********
[1199869965.255822] [2048.1] [pid=720] Processing: 'check_load'
[1199869965.255836] [2048.2] [pid=720] Processing part: 'check_load'
[1199869965.255851] [2048.2] [pid=720] Not currently in macro. Running output (10): 'check_load'
[1199869965.255866] [2048.1] [pid=720] Done. Final output: 'check_load'
[1199869965.255879] [2048.1] [pid=720] **** END MACRO PROCESSING *************
[1199869965.255892] [2320.2] [pid=720] Expanded Command Output: $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
[1199869965.255905] [001.0] [pid=720] process_macros()
[1199869965.255919] [2048.1] [pid=720] **** BEGIN MACRO PROCESSING ***********
[1199869965.255931] [2048.1] [pid=720] Processing: '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$'
[1199869965.255945] [2048.2] [pid=720] Processing part: ''
[1199869965.255958] [2048.2] [pid=720] Not currently in macro. Running output (0): ''
[1199869965.255971] [2048.2] [pid=720] Processing part: 'USER1'
[1199869965.256010] [2048.2] [pid=720] Uncleaned macro. Running output (25): '/usr/local/nagios/libexec'
[1199869965.256025] [2048.2] [pid=720] Just finished macro. Running output (25): '/usr/local/nagios/libexec'
[1199869965.256039] [2048.2] [pid=720] Processing part: '/check_nrpe -H '
[1199869965.256054] [2048.2] [pid=720] Not currently in macro. Running output (40): '/usr/local/nagios/libexec/check_nrpe -H '
[1199869965.256068] [2048.2] [pid=720] Processing part: 'HOSTADDRESS'
[1199869965.256088] [2048.2] [pid=720] Uncleaned macro. Running output (52): '/usr/local/nagios/libexec/check_nrpe -H 10.106.65.18'
[1199869965.256103] [2048.2] [pid=720] Just finished macro. Running output (52): '/usr/local/nagios/libexec/check_nrpe -H 10.106.65.18'
[1199869965.256118] [2048.2] [pid=720] Processing part: ' -c '
[1199869965.256132] [2048.2] [pid=720] Not currently in macro. Running output (56): '/usr/local/nagios/libexec/check_nrpe -H 10.106.65.18 -c '
[1199869965.256218] [2048.2] [pid=720] Processing part: 'ARG1'
[1199869965.256245] [2048.2] [pid=720] Uncleaned macro. Running output (66): '/usr/local/nagios/libexec/check_nrpe -H 10.106.65.18 -c check_load'
[1199869965.256260] [2048.2] [pid=720] Just finished macro. Running output (66): '/usr/local/nagios/libexec/check_nrpe -H 10.106.65.18 -c check_load'
[1199869965.256274] [2048.2] [pid=720] Processing part: ''
[1199869965.256288] [2048.2] [pid=720] Not currently in macro. Running output (66): '/usr/local/nagios/libexec/check_nrpe -H 10.106.65.18 -c check_load'
[1199869965.256302] [2048.1] [pid=720] Done. Final output: '/usr/local/nagios/libexec/check_nrpe -H 10.106.65.18 -c check_load'
[1199869965.256316] [2048.1] [pid=720] **** END MACRO PROCESSING *************
[1199869965.256595] [016.1] [pid=720] Check result output will be written to '/usr/local/nagios/var/spool/checkresults/checkCmaaAb' (fd=9)
[1199869965.256737] [064.1] [pid=720] Making callbacks (type 13)...
[1199869965.257854] [016.2] [pid=720] Service check is executing in child process (pid=758)
[1199869965.260733] [001.0] [pid=758] process_macros()
[1199869965.260821] [001.0] [pid=758] process_macros()
[1199869965.260852] [001.0] [pid=758] process_macros()
[1199869965.260879] [001.0] [pid=758] process_macros()
[1199869965.260907] [001.0] [pid=758] process_macros()
[1199869965.260934] [001.0] [pid=758] process_macros()
[1199869965.267584] [001.0] [pid=720] handle_timed_event() end
[1199869965.267647] [008.1] [pid=720] ** Event Check Loop
[1199869965.267718] [008.1] [pid=720] Next High Priority Event Time: Wed Jan 9 17:12:52 2008
[1199869965.267742] [008.1] [pid=720] Next Low Priority Event Time: Wed Jan 9 17:14:32 2008
[1199869965.267758] [008.1] [pid=720] Current/Max Service Checks: 1/0
[1199869965.267773] [008.2] [pid=720] No events to execute at the moment. Idling for a bit...
[1199869965.267788] [001.0] [pid=720] check_for_external_commands()
[1199869965.267806] [064.1] [pid=720] Making callbacks (type 8)...
[1199869965.302735] [001.0] [pid=720] event_execution_loop() end
[1199869965.303213] [064.1] [pid=720] Making callbacks (type 9)...
[1199869965.303244] [064.1] [pid=720] Making callbacks (type 7)...
[1199869965.303260] [064.1] [pid=720] Making callbacks (type 7)...
[1199869965.303276] [064.1] [pid=720] Making callbacks (type 26)...
[1199869965.303291] [001.0] [pid=720] xrddefault_save_state_information()
[1199869965.303480] [004.2] [pid=720] Writing retention data to temp file '/usr/local/nagios/var/nagios.tmpDmaaAb'
[1199869965.325858] [064.1] [pid=720] Making callbacks (type 26)...
[1199869965.350393] [064.1] [pid=720] Making callbacks (type 9)...
[1199869965.404567] [001.0] [pid=762] drop_privileges() start
[1199869965.404797] [004.0] [pid=762] Original UID/GID: 0/0
[1199869965.453908] [004.0] [pid=762] New UID/GID: 210033/65541
[1199869965.454562] [064.1] [pid=762] Making callbacks (type 9)...
[1199869965.454874] [064.1] [pid=762] Making callbacks (type 9)...
[1199869965.455046] [064.1] [pid=762] Making callbacks (type 9)...
[1199869965.455064] [064.1] [pid=762] Making callbacks (type 7)...
[1199869965.462889] [064.1] [pid=762] Making callbacks (type 7)...
[1199869965.465180] [064.1] [pid=763] Making callbacks (type 7)...
[1199869965.465827] [064.1] [pid=763] Making callbacks (type 9)...
[1199869965.482936] [064.1] [pid=763] Making callbacks (type 26)...
[1199869965.482993] [001.0] [pid=763] xrddefault_read_state_information() start
[1199869965.483484] [064.1] [pid=763] Making callbacks (type 19)...
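
To pin down the extra characters that follow the check_nrpe output shown earlier, the raw bytes could be split from the readable text. The following is only a minimal sketch in Python: the helper names `split_plugin_output` and `run_check` are hypothetical, and the byte values in the usage note assume the stray characters in this post are Latin-1 renderings of the original bytes. It shows where the clean output ends; it does not explain why the bytes are there.

```python
import subprocess

# Bytes outside printable ASCII are treated as stray, except common whitespace.
ALLOWED_WHITESPACE = (0x09, 0x0A, 0x0D)  # tab, newline, carriage return


def split_plugin_output(raw: bytes):
    """Return (readable_text, trailing_bytes) for raw plugin output.

    The split happens at the first byte that is neither printable ASCII
    nor allowed whitespace; everything from that byte on is returned as-is.
    """
    for i, b in enumerate(raw):
        if b in ALLOWED_WHITESPACE:
            continue
        if b < 0x20 or b > 0x7E:
            return raw[:i].decode("ascii"), raw[i:]
    return raw.decode("ascii"), b""


def run_check(argv):
    """Run a plugin command line (a list of strings) and split its stdout."""
    result = subprocess.run(argv, capture_output=True, timeout=30)
    return split_plugin_output(result.stdout)
```

For example, `run_check(["/usr/local/nagios/libexec/check_nrpe", "-H", "pnsgsit1gw2", "-c", "check_load"])` would return the readable status line and any stray bytes separately. If the Latin-1 assumption holds, the trailing part for the output above would begin with `b"\xff\xbf\xe0p\xff"`.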
 
Does anyone have any idea what the problem could be? Has anyone succeeded in running Nagios 3.0rc1 on Solaris 10?

Thanks.

Regards,
Kelvin Xu