FreeBSD thread issues

Andreas Ericsson ae at op5.se
Tue Aug 23 11:04:48 CEST 2005


Christophe Yayon wrote:
> Hi again,
> 
> After some discussions on the freebsd-hackers mailing list, here is a summary:
> 
> 1. There is a recommendation (or a suggestion) for what to do after a fork():
> http://www.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html
> In other words: "It is suggested that programs that use fork() call an
> exec function very soon afterwards in the child process, thus resetting
> all states. In the meantime, only a short list of async-signal-safe
> library routines are promised to be available."
> Note *suggested*. This is a recommendation to protect against a shoddy
> pthread-implementation. The thread specifications rule that only the
> thread calling fork() is duplicated, which initially leads to the
> recommendation (other threads holding locks aren't around to release them
> in the new execution context).
> 
> 
> 2. It appears that Nagios does the following after a fork:
> in base/util.c:
>         (1) Become the process group leader by calling setpgid(0, 0);
>         (2) something called set_all_macro_environment_vars(TRUE).
>             This calls snprintf a bunch, as well as set variables
>             by saving them to malloced memory.  This save is done
>             with strcpy and strcat.  setenv is then called to try to
>             export them.  memory is then freed with free(3).
>         (3) All signal handlers are reset
>         (4) The right part of the pipe is closed
>         (5) sigalarm handler is created and an alarm set.
>         (6) Checks to see if it is executing an embedded perl script,
>             then tries to execute it if so.  This has the feel of
>             being too much after the fork.
>         (7) Calls popen on the command if not.
>         (8) Reads the output of the command using fgets.
>         (9) closes the other end of the pipe
>         (10) unsets all env vars.
>         (11) Calls _exit()
> 
> in base/checks.c
>         (1) set_all_macro_environment_vars(TRUE)
>         (2) forks again
>         (3) grandchild:
>                 resets handler, setpgid, etc.
>                 if perl script, do embedded perl, otherwise popen.
>                 lots of read/write to pipe.
> 
> Likewise, in base/commands.c fork is also called for similar things.
> There are other places that also call popen...
> 
> 
> 3. You can only execute async-signal-safe functions after a fork()
> from a threaded application.  free(), malloc(), popen(), fgets(),
> are not async-signal-safe.

In a proper implementation they are. Read malloc/malloc.c from 
glibc-2.3.5 and you'll see. The first line of it reads

"/* Malloc implementation for multiple threads without lock contention"

fgets() must also be async-safe, since it's passed its storage buffer 
by the calling function. It can contain races if several threads (or 
programs, for that matter) try to read FIFOs at the same time or 
try to store things to the same piece of memory, but that's neither 
new, strange, nor in any way non-obvious. Obviously, fgets() relies on 
lower-level IO code which must be thread-safe (read() in this case) on 
account of it being a syscall inside a multitasking kernel.

popen() forks and calls execve immediately. If this isn't thread-safe 
then there's no way of executing external programs in multithreaded 
applications short of implementing popen() directly (which isn't exactly 
difficult, but still).
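
To illustrate the "implementing popen() directly" remark, here is a 
minimal sketch, not Nagios code. The helper name run_and_capture() and 
its interface are hypothetical. It assumes a POSIX system with /bin/sh, 
builds argv before fork(), and has the child call only functions from 
the async-signal-safe list (close, dup2, execve, _exit) before the exec.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper: run `cmd` through /bin/sh and capture up to
 * bufsize-1 bytes of its stdout in `buf` (always NUL-terminated).
 * Returns the child's exit status, or -1 on error or if the child
 * did not exit normally.
 */
int run_and_capture(const char *cmd, char *buf, size_t bufsize)
{
    int fds[2];
    int status;
    pid_t pid;
    size_t used = 0;
    ssize_t n;
    /* argv is built before fork() so the child allocates nothing. */
    char *const argv[] = { "sh", "-c", (char *)cmd, NULL };

    if (bufsize == 0 || pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: only async-signal-safe calls from here to execve(). */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) == -1)
                _exit(127);
            close(fds[1]);
        }
        execve("/bin/sh", argv, environ);
        _exit(127);             /* only reached if execve() failed */
    }

    /* Parent: read the child's output until EOF or the buffer is full. */
    close(fds[1]);
    while (used < bufsize - 1) {
        n = read(fds[0], buf + used, bufsize - 1 - used);
        if (n > 0)
            used += (size_t)n;
        else if (n == 0 || errno != EINTR)
            break;
    }
    buf[used] = '\0';
    close(fds[0]);

    /* Reap the child, retrying if a signal interrupts the wait. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would use it as run_and_capture("echo hello", out, sizeof out). 
If the buffer fills before the command finishes, the read end is closed 
early and the child may be killed by SIGPIPE, in which case this sketch 
returns -1.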

>  The list of async-signal-safe functions
> are here: http://www.opengroup.org/onlinepubs/009695399/nframe.html
> The restriction on fork() is here (20th bullet down):
> http://www.opengroup.org/onlinepubs/009695399/nframe.html
> 

Both of those links point to the same document, which is just the 
frameset for the navigation-frames.

For async-safe functions, this is the proper URL:
http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html#tag_02_09_01

For the fork() specification, the doc is here:
http://www.opengroup.org/onlinepubs/009695399/functions/fork.html

The 20th bullet is this:
-----------
"A process shall be created with a single thread. If a multi-threaded 
process calls fork(), the new process shall contain a replica of the 
calling thread and its entire address space, possibly including the 
states of mutexes and other resources. Consequently, to avoid errors, 
the child process may only execute async-signal-safe operations until 
such time as one of the exec functions is called. [THR] [Option Start] 
Fork handlers may be established by means of the pthread_atfork() 
function in order to maintain application invariants across fork() 
calls. [Option End]

When the application calls fork() from a signal handler and any of the 
fork handlers registered by pthread_atfork() calls a function that is 
not asynch-signal-safe, the behavior is undefined."
-----------

Also note that "From the application's perspective, a fork() call should 
appear atomic.", which implicitly treats fork() itself as an async-safe 
function even though the execution that follows it may not be. It also 
warns that improper implementations make it less so.
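
For reference, the fork handlers mentioned in the quoted text could be 
used roughly as in the sketch below. The lock state_lock and both 
function names are hypothetical, not taken from Nagios. Note that the 
pthread_mutex_* calls are not on the async-signal-safe list, so this is 
the commonly used idiom rather than something the standard promises 
will work in the child.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical application lock protecting some shared state. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the forking thread before fork(): take the lock so that no
 * other thread can be holding it at the moment of the fork. */
static void before_fork(void)  { pthread_mutex_lock(&state_lock); }

/* Run after fork() in the parent and in the child: release the lock. */
static void parent_after(void) { pthread_mutex_unlock(&state_lock); }
static void child_after(void)  { pthread_mutex_unlock(&state_lock); }

/* Register the handlers; call this exactly once, early at start-up.
 * Returns 0 on success, an error number otherwise. */
int install_fork_handlers(void)
{
    return pthread_atfork(before_fork, parent_after, child_after);
}

/*
 * Fork, try to take the lock in the child, and return the child's
 * exit status: 0 means the child found the lock free and acquired it.
 */
int fork_and_check_lock(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        int rc = pthread_mutex_trylock(&state_lock);
        _exit(rc == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This only covers locks the application knows about. Locks internal to 
libc or the thread library (malloc's, stdio's) are exactly the part the 
implementation has to take care of itself.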


> 
> 4. Some FreeBSD developers think that fork() is handled differently in
> libpthread (and probably libthr) than it was in libc_r.  We thought it
> better not to try to reinitialize libpthread (and to some extent libc),
> because it is messy and in order to expose non-portable applications.
> 

This is funny, because Nagios apparently runs properly on Linux, HPUX, 
Solaris, Irix, AIX and Tru64. To me that seems to indicate that Nagios 
is very portable indeed and that the BSD fellows somehow botched it. I 
might be wrong, but...

> 
> 
> Possible solutions:
> 
> a. (the best, I think) Trying to modify Nagios code to respect the
> recommendation (1.). We are talking about portability and not
> performance...
> 

This would involve a fairly large change in the way things are done. I 
for one am all for implementing a different parallelisation mechanism 
but I'm fairly certain Ethan won't be too thrilled if I rewrite 40% of 
the code that's currently the Nagios core.

> b. A possible workaround for Nagios on FreeBSD (and, I think, other Unix
> systems except Linux) is to use another threads library. On FreeBSD,
> using GNU/pth (which is in the ports) seems to completely
> resolve the problem (but I think it's ugly to have to use another,
> non-native threads lib...).
> 
> 
> 
> What do you think about this ?


In summary: some thread libraries work while others don't (the native 
*BSD one being the only one that doesn't), so I'd say it's time to fix 
that thread library, although I favor the rewrite-nagios approach as an 
exercise in intellectual masturbation and would be quite willing to do 
the actual work of it, provided I can be somewhat sure it isn't wasted.

> Sorry for my English (I am French...)
> 

Your English is far better than that of most native English speakers 
I've come across.

> 
> PS: thanks to all the freebsd-hackers posters who helped summarize the
> problem (Warner Losh, Daniel Eischen, Alexey Vesnin).
> 
> 

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer

