[PATCH] nebmods: Stop with the move-nebmod nonsense

Andreas Ericsson ae at op5.se
Wed Jun 12 16:45:06 CEST 2013


Applied, with a cautionary note to module maintainers on how they can
change their makefiles to make test-from-sources work without pains.

Thanks. I've hated that code (or rather; It's effects on gdb) a long
time now but I was far too lazy to do anything about it.

On 2013-06-05 13:28, Robin Sonefors wrote:
> The explanation for why we did this was thus:
>
>> The problem with loaded modules is that if you overwrite the original
>> file (e.g. using 'mv'), you do not alter the inode of the original
>> file.
>
> Correct.
>
>> Since the original file/module is memory-mapped in some fashion,
>> Nagios will segfault the next time an event broker call is directed to
>> one of the module's callback functions.
>
> Of course it won't. Why not? The first quoted sentence explained it
> perfectly - the same inode still points to the same fil, so putting
> another inode in the original path won't make one bit of difference.
>
> What would cause a segfault, though, is truncating the module file and
> filling it with new content so the addresses of symbols change while the
> inode stays the same. You shouldn't do that.
>
>> This is extremely problematic when it comes to upgrading NEB modules
>> while Nagios is running.
>
> To sum up the above: no, it isn't.
>
> In addition to having a bogus reason for existing in the first place,
> the code itself is problematic: it tends to cause an accumulation of
> weird and nonsensical nebmod files in your var directory over time,
> while also breaking any sort of clever debuginfo scheme employed by most
> linux distros today, because the debuginfo can't be tracked down when we
> invent new, faked names literally only in order to confuse linkers.
>
> Signed-off-by: Robin Sonefors <robin.sonefors at op5.com>
> ---
>   base/nebmods.c | 58 ++--------------------------------------------------------
>   1 file changed, 2 insertions(+), 56 deletions(-)
>
> diff --git a/base/nebmods.c b/base/nebmods.c
> index 3d84675..f6489f8 100644
> --- a/base/nebmods.c
> +++ b/base/nebmods.c
> @@ -163,14 +163,10 @@ int neb_load_all_modules(void) {
>   	}
>
>
> -#ifndef PATH_MAX
> -# define PATH_MAX 4096
> -#endif
>   /* load a particular module */
>   int neb_load_module(nebmodule *mod) {
>   	int (*initfunc)(int, char *, void *);
>   	int *module_version_ptr = NULL;
> -	char output_file[PATH_MAX];
>   	int dest_fd, result = OK;
>
>   	if(mod == NULL)
> @@ -184,36 +180,8 @@ int neb_load_module(nebmodule *mod) {
>   	if(mod->should_be_loaded == FALSE || mod->filename == NULL)
>   		return ERROR;
>
> -	/**********
> -	   Using dlopen() is great, but a real danger as-is.  The problem with loaded modules is that if you overwrite the original file (e.g. using 'mv'),
> -	   you do not alter the inode of the original file.  Since the original file/module is memory-mapped in some fashion, Nagios will segfault the next
> -	   time an event broker call is directed to one of the module's callback functions.  This is extremely problematic when it comes to upgrading NEB
> -	   modules while Nagios is running.  A workaround is to (1) 'mv' the original/loaded module file to another name (on the same filesystem)
> -	   and (2) copy the new module file to the location of the original one (using the original filename).  In this scenario, dlopen() will keep referencing
> -	   the original file/inode for callbacks.  This is not an ideal solution.   A better one is to delete the module file once it is loaded by dlopen().
> -	   This prevents other processed from unintentially overwriting the original file, which would cause Nagios to crash.  However, if we delete the file
> -	   before anyone else can muck with it, things should be good.  'lsof' shows that a deleted file is still referenced by the kernel and callback
> -	   functions continue to work once the module has been loaded.  Long story, but this took quite a while to figure out, as there isn't much
> -	   of anything I could find on the subject other than some sketchy info on similar problems on HP-UX.  Hopefully this will save future coders some time.
> -	   So... the trick is to (1) copy the module to a temp file, (2) dlopen() the temp file, and (3) immediately delete the temp file.
> -	************/
> -
> -	/*
> -	 * open a temp file for copying the module. We use my_fdcopy() so
> -	 * we re-use the destination file descriptor returned by mkstemp(3),
> -	 * which we have to close ourselves.
> -	 */
> -	snprintf(output_file, sizeof(output_file) - 1, "%s/nebmodXXXXXX", temp_path);
> -	dest_fd = mkstemp(output_file);
> -	result = my_fdcopy(mod->filename, output_file, dest_fd);
> -	close(dest_fd);
> -	if(result == ERROR) {
> -		logit(NSLOG_RUNTIME_ERROR, TRUE, "Error: Failed to safely copy module '%s'. The module will not be loaded\n", mod->filename);
> -		return ERROR;
> -		}
> -
> -	/* load the module (use the temp copy we just made) */
> -	mod->module_handle = dlopen(output_file, RTLD_NOW | RTLD_GLOBAL);
> +	/* load the module */
> +	mod->module_handle = dlopen(mod->filename, RTLD_NOW | RTLD_GLOBAL);
>   	if(mod->module_handle == NULL) {
>   		logit(NSLOG_RUNTIME_ERROR, TRUE, "Error: Could not load module '%s' -> %s\n", mod->filename, dlerror());
>
> @@ -223,28 +191,6 @@ int neb_load_module(nebmodule *mod) {
>   	/* mark the module as being loaded */
>   	mod->is_currently_loaded = TRUE;
>
> -	/* delete the temp copy of the module we just created and loaded */
> -	/* this will prevent other processes from overwriting the file (using the same inode), which would cause Nagios to crash */
> -	/* the kernel will keep the deleted file in memory until we unload it */
> -	/* NOTE: This *should* be portable to most Unices, but I've only tested it on Linux */
> -	if(unlink(output_file) == -1) {
> -		logit(NSLOG_RUNTIME_ERROR, TRUE, "Error: Could not delete temporary file '%s' used for module '%s'.  The module will be unloaded: %s\n", output_file, mod->filename, strerror(errno));
> -		neb_unload_module(mod, NEBMODULE_FORCE_UNLOAD, NEBMODULE_ERROR_API_VERSION);
> -
> -		return ERROR;
> -		}
> -
> -	/*
> -	 * now that it's loaded and removed, we create a new file in
> -	 * its place so debuggers can find the correct symbols properly,
> -	 * but only if we're supposed to dump core
> -	 */
> -	if(daemon_dumps_core == TRUE) {
> -		dest_fd = open(output_file, O_CREAT | O_WRONLY, S_IRWXU | S_IRGRP | S_IROTH);
> -		result = my_fdcopy(mod->filename, output_file, dest_fd);
> -		mod->dl_file = strdup(output_file);
> -		}
> -
>   	/* find module API version */
>   	module_version_ptr = (int *)dlsym(mod->module_handle, "__neb_api_version");
>
>


-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev




More information about the Developers mailing list