Event broker, dlopen(), and segfaults

Ethan Galstad nagios at nagios.org
Fri Oct 19 17:19:05 CEST 2007


Roy and Andreas -

Thanks for your insight.  I found this article about HP-UX libraries and 
it seems to indicate that deleting the original file and replacing it 
with a new one will prevent a segfault.  Simply overwriting the file 
will cause a segfault, as the inode doesn't change:

http://www.sap-basis-abap.com/unix/replacing-libraries-on-hp-ux.htm

Hardly ideal.  The only real workaround would be to stat() the file to 
check for mtime changes before each and every call to a function within 
the module.   However, the overhead of doing so is too great to make it 
a feasible option...

I'll make a note in the docs about this.

Marantz, Roy wrote:
> This is usually caused by updating the contents of the file instead of
> replacing it, i.e. getting a new inode might make this safe.
> You could try writing to FILE.new and then running mv FILE.new FILE to
> force the new file to get a new inode.  This might vary by OS or even
> OS version.
> Roy
> 
> -----Original Message-----
> From: nagios-devel-bounces at lists.sourceforge.net
> [mailto:nagios-devel-bounces at lists.sourceforge.net] On Behalf Of Andreas
> Ericsson
> Sent: Friday, October 19, 2007 3:26 AM
> To: nagios at nagios.org; Nagios Developers List
> Subject: Re: [Nagios-devel] Event broker, dlopen(), and segfaults
> 
> Ethan Galstad wrote:
>> While doing some debugging of NDOUtils, I've noticed something bad.
>> Event broker modules like ndomod.o will cause Nagios to segfault if
>> they are overwritten on the filesystem while they are in use.
>>
>> I assume this is due to the way dlopen() deals with object files.  I
>> was under the assumption that a complete copy of the module was kept
>> in memory once it was loaded, but perhaps it's mmap()'d.
>>
>> The segfault is easily reproducible every time I overwrite ndomod.o
>> while in use.  Even if the "new" version of the file doesn't differ
>> from the old.
>>
>> Anyone know more details of how this works, or better yet, how to 
>> avoid/deal with it?
>>
> 
> When a program still has a descriptor to the file, the kernel retains
> the diskblocks pointed to until that descriptor is made invalid (ie,
> close()'d).
> 
> I just tested this with modules though, and it doesn't work.
> 
> Tested locking the file too, and that didn't work either.
> 
> Hmm... The only way out I see is to copy the file to a different
> directory and load it from there, but I'm not sure it's worth it.
> What should we do when we fail to copy it, for example? Load from the
> original location? Not load the module at all? Either way out is
> wrong, for a certain value of right.
> 
> For reference, the only bug I found in glibc/BUGS with any connection to
> dlfcn is this one:
> 
> Severity: [  *] to [***]
> 
> [ **]  Closing shared objects in statically linked binaries most of the
>        times leads to crashes during the dlopen().  Hard to fix.
> 
> Since nagios isn't compiled statically, this doesn't apply, and it
> doesn't crash in dlopen(), but rather when running functions in the
> file.
> 
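
For reference, a rough sketch of Andreas's copy-and-load idea above.  
The function name and the /tmp/nebmod-XXXXXX template are hypothetical, 
and it assumes the temp directory is not mounted noexec.  It simply 
returns NULL when the copy fails, which leaves open the question of 
what the caller should do in that case:

```c
#include <dlfcn.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Copy the module at `path` to a private temp file and dlopen() the
 * copy, so later changes to the original cannot touch the mapped file.
 * Returns the dlopen() handle, or NULL if the copy or the load fails. */
void *load_private_copy(const char *path)
{
    char tmp[] = "/tmp/nebmod-XXXXXX";  /* hypothetical location */
    char buf[8192];
    ssize_t n = 0;
    void *handle = NULL;
    int in, out;

    if ((in = open(path, O_RDONLY)) < 0)
        return NULL;
    if ((out = mkstemp(tmp)) < 0) {
        close(in);
        return NULL;
    }

    /* n ends at 0 on a clean end-of-file, -1 on any read/write error. */
    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, (size_t)n) != n) {
            n = -1;
            break;
        }
    }
    close(in);
    if (close(out) != 0)
        n = -1;

    if (n == 0)
        handle = dlopen(tmp, RTLD_NOW | RTLD_LOCAL);

    /* Remove the name either way; if dlopen() succeeded, the mapping
     * keeps the copy's data alive until dlclose(). */
    unlink(tmp);
    return handle;
}
```

On older glibc versions this needs to be linked with -ldl.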



Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org
