FreeBSD thread issues

Andreas Ericsson ae at op5.se
Fri Aug 19 11:41:54 CEST 2005


Charles Sprickman wrote:
> On Fri, 19 Aug 2005, Christophe Yayon wrote:
> 
>> First,
>>
>> Thanks Andreas for your answer, but when we read the initial post from 
>> Charles Sprickman, it appears that the problem is in Nagios and not in 
>> the FreeBSD threads implementation... However, on the FreeBSD-threads 
>> mailing list, I've read similar issue reports from other applications 
>> (Java, for example), and I know who's right...
> 
> 
> Oddly enough, the problem also seems to exist in 5.4, which features a 
> much improved thread library.  There were also a few posts that 
> basically said that Nagios was not following spec.  This one states it 
> most simply:
> 

From http://www.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html:

"It is suggested that programs that use fork() call an exec function 
very soon afterwards in the child process, thus resetting all states. In 
the meantime, only a short list of async-signal-safe library routines 
are promised to be available."
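For reference, here is a minimal C sketch of the pattern the quoted text recommends, assuming a POSIX system: between fork() and the exec the child calls nothing except execv() and, if that fails, _exit(), both of which are on the async-signal-safe list. `spawn_and_wait` is a hypothetical helper name, not Nagios code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (a full path) in a child process and return its exit
 * status, or -1 on error. The child only calls async-signal-safe
 * functions before its process image is replaced. */
static int spawn_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                /* fork() failed */
    if (pid == 0) {
        /* Child: only the forking thread exists here, so exec at once. */
        execv(argv[0], argv);
        _exit(127);               /* exec failed; skip atexit handlers */
    }
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)       /* retry only if interrupted by a signal */
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called as `spawn_and_wait((char *[]){"/bin/sh", "-c", "exit 3", NULL})` it returns 3; a path that cannot be executed yields 127, the conventional "command not found" status.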

Note *suggested*. This is a recommendation to protect against shoddy 
pthread implementations. The thread specifications rule that only the 
thread calling fork() is duplicated in the child, which is what leads to 
the recommendation in the first place: other threads that held locks at 
the time of the fork don't exist in the new process, so nothing will 
ever release those locks there.
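pthread_atfork(), the function documented at the URL above, is the standard way to patch that hole: a prepare handler takes the lock before fork(), and the parent and child handlers release it afterwards, so the child never inherits a mutex held by a thread that no longer exists. A minimal sketch, assuming a single hypothetical mutex `queue_lock`; `child_sees_unlocked_mutex` is a demonstration helper only.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock guarding some shared state, e.g. a result queue. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Parent, just before fork(): wait until no other thread holds the lock,
 * then hold it ourselves so it is in a known state across the fork. */
static void atfork_prepare(void) { pthread_mutex_lock(&queue_lock); }

/* Parent, after fork(): release the lock again. */
static void atfork_parent(void) { pthread_mutex_unlock(&queue_lock); }

/* Child, after fork(): the forking thread is the only thread here and
 * it took the lock in atfork_prepare(), so it can release it. */
static void atfork_child(void) { pthread_mutex_unlock(&queue_lock); }

/* Register the handlers once, ideally at startup. Returns 0 on success. */
static int install_fork_handlers(void)
{
    return pthread_atfork(atfork_prepare, atfork_parent, atfork_child);
}

/* Demonstration only: fork and report whether the child found the lock
 * free (1), held (0), or whether something failed (-1). A real child
 * should exec() promptly instead of touching pthread objects. */
static int child_sees_unlocked_mutex(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int rc = pthread_mutex_trylock(&queue_lock);
        _exit(rc == 0 ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

Every lock the child might touch needs handlers like these, which is one reason exec()ing right away is the simpler rule to follow.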

That said, Nagios would most likely benefit greatly from a different 
way of running checks than fork()'ing twice and passing the results 
through several tiers of FIFOs. Several different methods have already 
been benchmarked. For server machines (or at least machines with a lot 
of memory and, quite often, multiple CPUs), the best way seems to be to 
create a new thread for each check to run. popen() causes a fork() and 
an execve(), so that should be safe enough.
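A minimal sketch of the thread-per-check approach, assuming each check is a shell command line and that its first line of output plus its exit code are all we need. `struct check`, `run_check` and `run_checks` are hypothetical names, not Nagios APIs, and error handling is deliberately thin.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical per-check record; field names are illustrative only. */
struct check {
    const char *command;  /* shell command line for the plugin */
    char output[256];     /* first line of plugin output */
    int exit_code;        /* plugin exit status, or -1 on error */
};

/* Thread body. popen() starts the child and has it exec /bin/sh -c,
 * so none of our own code runs in the child after the fork. */
static void *run_check(void *arg)
{
    struct check *c = arg;
    c->output[0] = '\0';
    c->exit_code = -1;
    FILE *fp = popen(c->command, "r");
    if (fp == NULL)
        return NULL;
    if (fgets(c->output, sizeof c->output, fp) != NULL)
        c->output[strcspn(c->output, "\n")] = '\0';  /* strip newline */
    /* Drain the rest so the plugin never blocks on a full pipe. */
    char discard[256];
    while (fgets(discard, sizeof discard, fp) != NULL)
        ;
    int status = pclose(fp);  /* waits for the child, returns its status */
    if (status != -1 && WIFEXITED(status))
        c->exit_code = WEXITSTATUS(status);
    return NULL;
}

/* Start one thread per check and wait for all of them.
 * Returns 0 if every thread was started, -1 otherwise. */
static int run_checks(struct check *checks, size_t n)
{
    pthread_t *tids = malloc(n * sizeof *tids);
    if (tids == NULL && n > 0)
        return -1;
    size_t started = 0;
    for (; started < n; started++)
        if (pthread_create(&tids[started], NULL, run_check,
                           &checks[started]) != 0)
            break;
    for (size_t i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    return started == n ? 0 : -1;
}
```

Each thread blocks only in fgets() and pclose(), so a slow plugin delays its own thread and nothing else, and results come back through ordinary memory instead of FIFOs.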

What limits this imposes I don't know, but the NPTL library in use on 
most modern Linux systems today handles 10,000 threads without barfing, 
so the limit would probably be the per-process open file limit, i.e. 
sysconf(_SC_OPEN_MAX) or ulimit -n. POSIX only requires that to be at 
least 20 (_POSIX_OPEN_MAX), but 256 or more is a common default. Note 
that half this value (give or take 5 or so for stdin and such) is the 
number of checks that can run simultaneously at any given time. When 
one of them completes, another can kick in.
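The arithmetic above as a hedged C sketch: it reads the limit with sysconf(_SC_OPEN_MAX), which on common systems reflects the same soft limit as `ulimit -n`, and applies the rule of thumb from the text of roughly two descriptors per running check plus a few reserved ones. The function names and the `reserved` parameter are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Rule of thumb from the text: about two descriptors per running check,
 * after setting aside `reserved` ones (stdin/stdout/stderr, logs, ...).
 * Returns 0 when the limit is too small to run anything. */
static long concurrent_check_budget(long open_max, long reserved)
{
    if (open_max <= reserved)
        return 0;
    return (open_max - reserved) / 2;
}

/* Same budget, using this process's actual open file limit. */
static long current_check_budget(long reserved)
{
    long open_max = sysconf(_SC_OPEN_MAX);  /* -1 if indeterminate */
    if (open_max < 0)
        return 0;
    return concurrent_check_budget(open_max, reserved);
}
```

With a limit of 256 and 5 reserved descriptors this gives (256 - 5) / 2 = 125 checks running at once.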

> http://marc.theaimsgroup.com/?l=freebsd-hackers&m=112125883804481&w=2
> 
> "I don't know what Nagios does just after fork(2); it would be worth
> checking.  It appears that fork(2)ing without exec(2)ing or _exit(2)ing
> in a pthreaded program is not "valid" behaviour with regard to
> SUSv3 [1].  I don't want to avoid admitting there is a problem in the
> FreeBSD threading library, and I don't know how other OSes handle this,
> but Nagios folks should really avoid doing what is explicitly
> dissuaded in SUSv3."
> 
>> Perhaps Ethan could help me. What does he think about this?
> 
> 
> I did contact him directly, and then posted a summary of the responses 
> from the -hackers list here, but have not heard anything back.  I do 
> know that there are some very friendly and helpful people on that list 
> willing to help, and probably a few Nagios users.  The Nagios port 
> maintainer for FreeBSD (blaz at si.FreeBSD.org) would also be a good contact.
> 
>> Switching to Linux is a difficult psychological step for me ;-)
> 
> 
> That was just boastful baiting, best left on Slashdot with the 
> "BSD is dying, Netcraft confirms" childishness; it's safe to ignore 
> that sort of "advocacy". :)
> 
> Charles
> 
>> Thanks.
>>
>>
>>
>> -------------------------------------------------------
>> SF.Net email is Sponsored by the Better Software Conference & EXPO
>> September 19-22, 2005 * San Francisco, CA * Development Lifecycle 
>> Practices
>> Agile & Plan-Driven Development * Managing Projects & Teams * Testing 
>> & QA
>> Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
>> _______________________________________________
>> Nagios-devel mailing list
>> Nagios-devel at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-devel
>>

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer

