SIGSEGV on reload - incorrect freeing of string?
Ethan Galstad
nagios at nagios.org
Thu Feb 22 19:46:43 CET 2007
Ton Voon wrote:
> Hi!
>
> I wanted to check what people thought about this, because my
> knowledge of string handling in C is poor and this problem is hard to
> recreate.
>
> We had a situation where nagios had a segfault. Here are the
> pertinent entries in nagios.log:
>
> [1171976003] Caught SIGHUP, restarting...
> [1171976010] HOST ALERT: hostB;DOWN;SOFT;1;CRITICAL - Plugin timed
> out after 10 seconds
> [1171976010] SERVICE ALERT: hostB;TCP 2226;CRITICAL;SOFT;1;CRITICAL -
> Socket timeout after 10 seconds
> [1171976010] SERVICE ALERT: hostA;TCP 2222;CRITICAL;SOFT;1;CRITICAL -
> Socket timeout after 10 seconds
> [1171976011] Caught SIGSEGV, shutting down...
> [1171976073] Nagios 2.5 starting... (PID=19542)
>
> Note that there is a 7 second delay between the signal being caught
> and the host alerts, which is reasonable given the check_ping plugin
> was timing out after 10 seconds (there were actual network/host
> problems). So it is fair to assume that Nagios was in the host
> reachability logic.
>
> Ethan made a change to Nagios in 2.5 where nagios exited out of the
> host check logic earlier (as this was slowing the restart). This
> includes the following code in checks.c:
>
> -----
>
> *** 2086,2089 ****
> --- 2094,2107 ----
> for(hst->current_attempt=1;hst-
> >current_attempt<=max_check_attempts;hst->current_attempt++){
>
> + /* ADDED 06/20/2006 EG */
> + /* bail out if signal encountered - use old state */
> + if(sigrestart==TRUE || sigshutdown==TRUE){
> + hst->current_attempt=1;
> + hst->current_state=old_state;
> + free(hst->plugin_output);
> + hst->plugin_output=(char *)old_plugin_output;
> + return hst->current_state;
> + }
> +
> /* check the host */
> result=run_host_check(hst,check_options);
> ***************
> *** 2172,2175 ****
> --- 2190,2203 ----
> for(hst->current_attempt=1;hst->current_attempt<=hst-
> >max_attempts;hst->current_attempt++){
>
> + /* ADDED 06/20/2006 EG */
> + /* bail out if signal encountered - use old state */
> + if(sigrestart==TRUE || sigshutdown==TRUE){
> + hst->current_attempt=1;
> + hst->current_state=old_state;
> + free(hst->plugin_output);
> + hst->plugin_output=(char *)old_plugin_output;
> + return hst->current_state;
> + }
> +
> /* run the host check */
> result=run_host_check(hst,check_options);
>
>
> -----
>
> I'm wondering if the "free(hst->plugin_output)" is the problem.
> Shouldn't this be a:
>
> strcpy(hst->plugin_output, old_plugin_output);
>
> instead?
>
> When trying to recreate this, I created a host with an IP address
> that is not pingable, using check_ping as the host check command. I
> then submit a passive OK to this host and then schedule a check for
> all services immediately. After a few seconds (to allow nagios to get
> into the host check logic portion), I send a HUP signal to nagios.
> The logs show a Caught SIGHUP and a delay before the HOST ALERT.
>
> I can't get nagios to segfault, but I get the plugin output set to
> funny characters, which suggests that plugin_output is not being
> correctly set in those routines. These corrupt characters do not
> appear if I change to a strcpy call. So it is possible that the
> segfault is happening somewhere else.
>
> Any thoughts?
>
> Ton
>
Good catch on this - I think the free() might be the problem. I just
added a NULL check before calling free() to fix that potential problem.
Also, I should have been using strdup() instead of referencing a
pointer to the old_plugin_output[] buffer. Patch is now in CVS.
Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
More information about the Developers
mailing list