bug: unlocking an invalid mutex

Ethan Galstad nagios at nagios.org
Thu Oct 18 19:14:10 CEST 2007


Geert Hendrickx wrote:
> Hi,
> 
> I tried to upgrade a Nagios 2.5 system running on NetBSD to Nagios 2.9.
> But it seems like a mutex bug has been introduced in Nagios 2.7 (I can
> reproduce it with Nagios 2.7 but not with 2.5 and 2.6).
> 
> Unlike Linux, NetBSD's pthread implementation is quite unforgiving for
> mutex errors, and aborts a running program e.g. when it tries to unlock
> an invalid mutex.  This is what is happening with Nagios:
> 
>> Nagios 2.9 starting... (PID=17620)
>> nagios: Error detected by libpthread: Invalid mutex.
>> Detected by file "/cvs/src/3/lib/libpthread/pthread_mutex.c", line 334, function "pthread_mutex_unlock".
>> See pthread(3) for information.
>>
>> Program received signal SIGABRT, Aborted.
>> [Switching to LWP 1]
>> 0xbd9e921f in kill () from /usr/lib/libc.so.12
>> (gdb) bt
>> #0  0xbd9e921f in kill () from /usr/lib/libc.so.12
>> #1  0xbdaa6fb6 in pthread__errorfunc () from /usr/lib/libpthread.so.0
>> #2  0xbdaa3d4b in pthread_mutex_unlock () from /usr/lib/libpthread.so.0
>> #3  0x080a1651 in xsddefault_save_status_data () at ../xdata/xsddefault.c:338
>> #4  0x080a10bd in update_all_status_data () at ../common/statusdata.c:93
>> #5  0x080544dc in main (argc=2, argv=0xbfbfe8b8, env=0xbfbfe8c4) at nagios.c:665
>> #6  0x0805377d in ___start ()
>> (gdb)
> 
> The problem is probably in this change between Nagios 2.6 and 2.7:
> 
> --- xdata/xsddefault.c	2006-05-20 21:39:34.000000000 +0200
> +++ xdata/xsddefault.c	2007-01-03 03:50:43.000000000 +0100
> @@ -322,6 +331,18 @@
>  		return ERROR;
>  	        }
>  
> +	/* get number of items in the check result buffer */
> +	pthread_mutex_lock(&service_result_buffer.buffer_lock);
> +	used_check_result_buffer_slots=service_result_buffer.items;
> +	high_check_result_buffer_slots=service_result_buffer.high;
> +	pthread_mutex_unlock(&service_result_buffer.buffer_lock);
> +
> +	/* get number of items in the command buffer */
> +	pthread_mutex_lock(&external_command_buffer.buffer_lock);
> +	used_external_command_buffer_slots=external_command_buffer.items;
> +	high_external_command_buffer_slots=external_command_buffer.high;
> +	pthread_mutex_unlock(&external_command_buffer.buffer_lock);
> +
>  	/* write version info to status file */
>  	fprintf(fp,"########################################\n");
>  	fprintf(fp,"#          NAGIOS STATUS FILE\n");
> 
> 
> Can this please be looked into?  Do I need to provide more information?
> 
> Thanks,
> 
> 	Geert
> 
> 
> PS: please keep me Cc'd.
> 

Did you by chance have external commands disabled when you got the SIGABRT?

I believe the problem was due to the external_command_buffer.buffer_lock 
mutex being accessed even if external commands were disabled (in which 
case the mutex is never initialized).
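
To illustrate the kind of guard that avoids this, here is a minimal 
sketch. It is not the committed patch. The circular_buffer struct is a 
hypothetical stand-in that only borrows the field names from the diff 
above, the check_external_commands flag is assumed to reflect the 
config setting, and the two helper functions are made up for the 
example.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Hypothetical stand-in for the buffer type; field names follow the diff. */
typedef struct {
	int items;
	int high;
	pthread_mutex_t buffer_lock;
} circular_buffer;

circular_buffer external_command_buffer;

/* Assumed flag: nonzero only when external commands are enabled. */
int check_external_commands = 0;

/* Hypothetical helper: set up the buffer and its mutex only when enabled. */
int init_command_buffer(int enabled)
{
	check_external_commands = enabled;
	if (!enabled)
		return 0;	/* mutex is deliberately left uninitialized */

	external_command_buffer.items = 0;
	external_command_buffer.high = 0;
	return pthread_mutex_init(&external_command_buffer.buffer_lock, NULL);
}

/* Hypothetical helper: read the counters, touching the mutex only if it exists. */
void get_command_buffer_stats(int *used, int *high)
{
	int rc;

	*used = 0;
	*high = 0;

	/* external commands disabled: report zeros and skip the lock entirely */
	if (!check_external_commands)
		return;

	rc = pthread_mutex_lock(&external_command_buffer.buffer_lock);
	assert(rc == 0);
	*used = external_command_buffer.items;
	*high = external_command_buffer.high;
	pthread_mutex_unlock(&external_command_buffer.buffer_lock);
}
```

With the flag off, get_command_buffer_stats() never calls 
pthread_mutex_lock() or pthread_mutex_unlock(), so a strict pthread 
implementation like NetBSD's has nothing to complain about.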

A fix has been committed to CVS (both the 2.x and HEAD branches).  When 
you get a chance, test the new 2.x CVS code and see if it solves the 
problem.


Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org




