Event broker, dlopen(), and segfaults

Marantz, Roy Roy.Marantz at deshaw.com
Fri Oct 19 14:08:27 CEST 2007


This is usually caused by updating the contents of the file in place
instead of replacing the file; giving the new version a new inode might
make this safe.  You could try writing to FILE.new and then running
mv FILE.new FILE, which forces the new file to get a new inode.  This
might vary by OS or even OS version.
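
A minimal sketch of the write-then-rename approach described above, in Python for brevity (the underlying call is rename(2), so the same idea applies from C or the shell). The helper name replace_atomically is hypothetical, and the temporary file is assumed to live in the same directory as the target so the rename stays on one filesystem:

```python
import os
import tempfile


def replace_atomically(path, data):
    """Write data to a new file next to path, then rename it over path.

    The rename gives path a new inode; a process that already has the
    old file open or mapped keeps referring to the old inode.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file with mode 0600; adjust permissions if needed.
    fd, tmp = tempfile.mkstemp(dir=directory,
                               prefix=os.path.basename(path) + ".new.")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX within one filesystem
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

Whether this is enough to keep a loaded module safe may still depend on the OS, as noted above.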
Roy

-----Original Message-----
From: nagios-devel-bounces at lists.sourceforge.net
[mailto:nagios-devel-bounces at lists.sourceforge.net] On Behalf Of Andreas Ericsson
Sent: Friday, October 19, 2007 3:26 AM
To: nagios at nagios.org; Nagios Developers List
Subject: Re: [Nagios-devel] Event broker, dlopen(), and segfaults

Ethan Galstad wrote:
> While doing some debugging of NDOUtils, I've noticed something bad.
> Event broker modules like ndomod.o will cause Nagios to segfault if they
> are overwritten on the filesystem while they are in use.
>
> I assume this is due to the way dlopen() deals with object files.  I was
> under the assumption that a complete copy of the module was kept in
> memory once it was loaded, but perhaps it's mmap()'d.
>
> The segfault is easily reproducible every time I overwrite ndomod.o
> while in use, even if the "new" version of the file doesn't differ from
> the old.
>
> Anyone know more details of how this works, or better yet, how to
> avoid/deal with it?
> 
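
To illustrate the mmap() suspicion above, here is a small sketch in Python that maps a scratch file, updates it either in place or by rename, and returns what the existing mapping shows afterwards. The function name and the file name module.bin are hypothetical. It uses a shared read-only mapping, whereas a dynamic loader typically uses private mappings, so treat it as an analogy for how a mapping tracks the underlying inode rather than as a reproduction of the crash:

```python
import mmap
import os
import tempfile


def mapped_bytes_after_update(in_place):
    """Map a file, update it, and return the first bytes the old mapping sees."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "module.bin")
        with open(path, "wb") as f:
            f.write(b"A" * mmap.PAGESIZE)
        with open(path, "rb") as f:
            # The mapping keeps its own reference to the file's inode.
            m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            if in_place:
                # Same inode: rewrite the bytes of the file that is mapped.
                with open(path, "r+b") as f:
                    f.write(b"B" * mmap.PAGESIZE)
            else:
                # New inode: write a sibling file and rename it over the old one.
                tmp = path + ".new"
                with open(tmp, "wb") as f:
                    f.write(b"B" * mmap.PAGESIZE)
                os.replace(tmp, path)
            return m[:4]
        finally:
            m.close()
```

On Linux, the in-place case should show the new bytes through the old mapping, while the rename case should still show the original bytes.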

When a program still has a descriptor to the file, the kernel retains the
disk blocks pointed to until that descriptor is made invalid (i.e.,
close()'d).

I just tested this with modules though, and it doesn't work.

Tested locking the file too, and that didn't work either.
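
One possible reason locking does not help, assuming an advisory lock such as flock(2) was used (the message does not say which kind): advisory locks only affect processes that also ask for the lock. A minimal Python sketch, with a hypothetical function name:

```python
import fcntl
import os
import tempfile


def overwrite_despite_lock():
    """Hold an exclusive flock() on a file, then overwrite it anyway."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"original")
        fcntl.flock(fd, fcntl.LOCK_EX)  # advisory lock, held for the whole test
        # A writer that never asks for the lock is not blocked by it.
        with open(path, "wb") as writer:
            writer.write(b"clobbered")
        with open(path, "rb") as reader:
            return reader.read()
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
        os.unlink(path)
```

The overwrite goes through even though the lock is still held, so a tool like cp would not be stopped by it either.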

Hmm... The only way out I see is to copy the file to a different directory
and load it from there, but I'm not sure it's worth it. What should we
do when we fail to copy it, for example? Load from the original location?
Not load the module at all? Either way out is wrong, for a certain value of
right.
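
The copy-then-load idea and its two failure policies could be sketched roughly as follows. This is Python for brevity, not Nagios code; the function name, the on_failure parameter, and the directory prefix are all hypothetical, and the returned path is what the caller would hand to dlopen():

```python
import os
import shutil
import tempfile


def private_module_copy(module_path, on_failure="skip"):
    """Copy module_path into a fresh private directory; return the copy's path.

    on_failure is a hypothetical policy knob for a failed copy:
      "original" -> return module_path, i.e. load from the original location
      "skip"     -> return None, i.e. do not load the module at all
    """
    private_dir = None
    try:
        private_dir = tempfile.mkdtemp(prefix="broker-modules-")
        copy_path = os.path.join(private_dir, os.path.basename(module_path))
        shutil.copy2(module_path, copy_path)
        return copy_path
    except OSError:
        # Clean up the empty private directory before applying the policy.
        if private_dir is not None:
            shutil.rmtree(private_dir, ignore_errors=True)
        return module_path if on_failure == "original" else None
```

Neither policy settles the question raised above: falling back to the original keeps the module running but leaves it exposed to overwrites, while skipping it stays safe but loses the module.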

For reference, the only bug I found in glibc/BUGS with any connection to
dlfcn is this one:

Severity: [  *] to [***]

[ **]  Closing shared objects in statically linked binaries most of the
       times leads to crashes during the dlopen().  Hard to fix.

Since Nagios isn't compiled statically, this doesn't apply, and it doesn't
crash in dlopen(), but rather when running functions in the file.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel
