SIGSEGV on reload - incorrect freeing of string?

Ton Voon ton.voon at altinity.com
Tue Feb 20 22:24:35 CET 2007


Hi!

I wanted to check what people thought about this, because my  
knowledge of string handling in C is poor and this problem is hard to  
recreate.

We had a situation where Nagios segfaulted. Here are the
pertinent entries from nagios.log:

[1171976003] Caught SIGHUP, restarting...
[1171976010] HOST ALERT: hostB;DOWN;SOFT;1;CRITICAL - Plugin timed out after 10 seconds
[1171976010] SERVICE ALERT: hostB;TCP 2226;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
[1171976010] SERVICE ALERT: hostA;TCP 2222;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
[1171976011] Caught SIGSEGV, shutting down...
[1171976073] Nagios 2.5 starting... (PID=19542)

Note that there is a 7-second delay between the signal being caught
and the host alerts. That is consistent with check_ping timing out
after 10 seconds (there were genuine network/host problems): the
check was presumably launched about 3 seconds before the SIGHUP
arrived. So it is fair to assume that Nagios was in the host
reachability logic when the signal was caught.

In 2.5, Ethan changed Nagios so that it bails out of the host check
logic early once a restart or shutdown signal has been received
(because these checks were slowing down restarts). The change
includes the following code in checks.c:

-----

*** 2086,2089 ****
--- 2094,2107 ----
   		for(hst->current_attempt=1;hst->current_attempt<=max_check_attempts;hst->current_attempt++){
   			
+ 			/* ADDED 06/20/2006 EG */
+ 			/* bail out if signal encountered - use old state */
+ 			if(sigrestart==TRUE || sigshutdown==TRUE){
+ 				hst->current_attempt=1;
+ 				hst->current_state=old_state;
+ 				free(hst->plugin_output);
+ 				hst->plugin_output=(char *)old_plugin_output;
+ 				return hst->current_state;
+ 				}
+
   			/* check the host */
   			result=run_host_check(hst,check_options);
***************
*** 2172,2175 ****
--- 2190,2203 ----
   		for(hst->current_attempt=1;hst->current_attempt<=hst->max_attempts;hst->current_attempt++){

+ 			/* ADDED 06/20/2006 EG */
+ 			/* bail out if signal encountered - use old state */
+ 			if(sigrestart==TRUE || sigshutdown==TRUE){
+ 				hst->current_attempt=1;
+ 				hst->current_state=old_state;
+ 				free(hst->plugin_output);
+ 				hst->plugin_output=(char *)old_plugin_output;
+ 				return hst->current_state;
+ 				}
+
   			/* run the host check */
   			result=run_host_check(hst,check_options);


-----
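To make the bail-out mechanism in the diff concrete, here is a minimal, self-contained sketch of the same pattern: a signal handler that only sets a flag, and a retry loop that polls the flag and returns the previous state. All names (`host_stub`, `fake_run_host_check()`, `check_host_with_bailout()`, `HOST_UP`/`HOST_DOWN`) are hypothetical stand-ins, not the real Nagios 2.5 code. The sketch assumes the snapshot of the old output is a heap copy, so handing it back to the struct transfers ownership rather than leaving a dangling pointer.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() and SIGHUP */
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the Nagios globals and host struct. */
static volatile sig_atomic_t sigrestart = 0;
static volatile sig_atomic_t sigshutdown = 0;

enum { HOST_UP = 0, HOST_DOWN = 1 };

struct host_stub {
    int current_attempt;
    int current_state;
    char *plugin_output;   /* heap-allocated and owned by the struct */
};

/* The handler does as little as possible: it only sets a flag.
   Real code would install it with signal() or sigaction(). */
static void handle_sighup(int sig) {
    (void)sig;
    sigrestart = 1;
}

/* Hypothetical stand-in for run_host_check(): the plugin "times out",
   and the handler is called directly to simulate a SIGHUP mid-check. */
static int fake_run_host_check(struct host_stub *hst) {
    char *msg = strdup("CRITICAL - Plugin timed out after 10 seconds");
    if (msg != NULL) {
        free(hst->plugin_output);
        hst->plugin_output = msg;
    }
    hst->current_state = HOST_DOWN;
    handle_sighup(SIGHUP);
    return HOST_DOWN;
}

/* Retry loop that polls the flags before each attempt and bails out
   with the old state and an owned copy of the old output. */
static int check_host_with_bailout(struct host_stub *hst, int max_attempts) {
    assert(hst != NULL);
    int old_state = hst->current_state;
    char *old_output = strdup(hst->plugin_output != NULL ? hst->plugin_output : "");
    if (old_output == NULL)
        return -1;  /* out of memory */

    int result = old_state;
    for (hst->current_attempt = 1; hst->current_attempt <= max_attempts;
         hst->current_attempt++) {
        if (sigrestart || sigshutdown) {
            hst->current_attempt = 1;
            hst->current_state = old_state;
            free(hst->plugin_output);
            hst->plugin_output = old_output;  /* ownership moves to the struct */
            return hst->current_state;
        }
        result = fake_run_host_check(hst);
        if (result == HOST_UP)
            break;
    }
    free(old_output);  /* no bail-out: the snapshot is no longer needed */
    return result;
}
```

Because the saved output was duplicated on the heap at the start, the bail-out path can free whatever the check left behind and store the snapshot without the struct ever pointing at memory it does not own.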

I'm wondering if the "free(hst->plugin_output)" is the problem.
Shouldn't this be:

strcpy(hst->plugin_output, old_plugin_output);

instead?
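Whether free() followed by the pointer assignment is actually a bug depends on how old_plugin_output is declared, which the excerpt does not show. If it were a buffer local to the function (the (char *) cast hints it may not be a plain char *, but that is an assumption), then after the return hst->plugin_output would point into a dead stack frame: reading it could print garbage, and a later free() on it is undefined behaviour. The strcpy() alternative avoids that, but only if the existing hst->plugin_output is non-NULL and at least strlen(old_plugin_output)+1 bytes long. A minimal sketch of a third option, using a hypothetical helper restore_plugin_output(), frees the current output and stores a fresh heap copy instead:

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: replace *dest (a heap string owned by the host
   struct) with a fresh heap copy of saved.
   Unlike strcpy(*dest, saved), this works even if *dest is NULL or its
   buffer is shorter than saved; unlike "*dest = saved", the struct never
   ends up pointing at a buffer it does not own.
   Returns 0 on success, -1 on allocation failure (*dest is then unchanged). */
static int restore_plugin_output(char **dest, const char *saved) {
    assert(dest != NULL);
    char *copy = strdup(saved != NULL ? saved : "");
    if (copy == NULL)
        return -1;
    free(*dest);   /* free(NULL) is a no-op, so a NULL *dest is fine */
    *dest = copy;
    return 0;
}
```

The design choice is that the struct always owns a heap string of its own, so whoever frees it next does not need to know where the text came from.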

When trying to recreate this, I created a host with an IP address
that is not pingable, using check_ping as the host check command. I
then submitted a passive OK result for this host and scheduled an
immediate check of all services. After a few seconds (to allow Nagios
to get into the host check logic), I sent a HUP signal to Nagios.
The logs show a "Caught SIGHUP" entry and a delay before the HOST ALERT.

I can't get Nagios to segfault, but the plugin output ends up set to
garbage characters, which suggests that plugin_output is not being
set correctly in those routines. The garbage characters do not appear
if I change to a strcpy() call. So it is possible that the segfault
is happening somewhere else.

Any thoughts?

Ton

http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon



More information about the Developers mailing list