nagios 3.2.3 localtime deadlock

Andreas Ericsson ae at op5.se
Fri Oct 8 12:38:00 CEST 2010


On 10/08/2010 07:44 AM, Thomas Guyot-Sionnest wrote:
> On 10-10-07 02:43 PM, Matthew Kent wrote:
>> Hello all,
>>
>> Setting up a new nagios 3.2.3 install and occasionally (once in 24
>> hours) I'm seeing a child deadlock when calling localtime() like so:
>>
>> (gdb) bt
>> #0  0x00000033d5edfade in __lll_lock_wait_private () from /lib64/libc.so.6
>> #1  0x00000033d5e8d1cd in _L_lock_1685 () from /lib64/libc.so.6
>> #2  0x00000033d5e8cf17 in __tz_convert () from /lib64/libc.so.6
>> #3  0x000000000043e23e in get_datetime_string (raw_time=<value
>> optimized out>, buffer=0x2aaab014feb0<incomplete sequence \350>,
>> buffer_length=48, type=0) at utils.c:1696
>> #4  0x0000000000430990 in grab_datetime_macro (macro_type=7, arg1=0x0,
>> arg2=0x0, output=0x6998f8) at ../common/macros.c:1533
>> #5  0x0000000000432cbf in grab_macrox_value (macro_type=-4, arg1=0x0,
>> arg2=0x0, output=0x6998f8, free_macro=0x2) at ../common/macros.c:1089
>> #6  0x0000000000433586 in set_macrox_environment_vars (set=1) at
>> ../common/macros.c:3166
>> #7  0x00000000004335bb in set_all_macro_environment_vars (set=1) at
>> ../common/macros.c:3134
>> #8  0x000000000041b4c3 in run_async_service_check (svc=0x8d62560,
>> check_options=<value optimized out>, latency=<value optimized out>,
>> scheduled_check=1, reschedule_check=1,
>>      time_is_valid=<value optimized out>, preferred_time=<value
>> optimized out>) at checks.c:658
>> #9  0x000000000041d56d in run_scheduled_service_check (svc=0x8d62560,
>> check_options=0, latency=0.68999999999999995) at checks.c:260
>> #10 0x000000000042a45a in handle_timed_event (event=0x2aaab011af30) at
>> events.c:1257
>> #11 0x000000000042abe6 in event_execution_loop () at events.c:1143
>> #12 0x0000000000413055 in main (argc=<value optimized out>,
>> argv=<value optimized out>, env=0x7fffa0670758) at nagios.c:850
>>
>> This leads to Nagios being completely frozen until I manually kill the child.
>>
>> Some light Googling tells me this can happen with localtime in certain
>> cases, but I see no indication of other people with this issue in
>> Nagios.
>>
>> It's a pretty standard Nagios install on CentOS 5.5 - except for the
>> fact I'm using the mk-livestatus event broker. We have a couple
>> thousand checks configured on a pretty aggressive interval.
>>
>> Anyone seen this before?
> 
> I'm far from being an expert in threading and locking, but AFAIK
> localtime(), called at utils.c:1696, like other similar time functions,
> is not thread-safe. I'm wondering if using the _r versions would help...
> 

It's not threadsafe because two concurrent calls will modify the same
struct, so if two threads attempt to get a time representation à la
struct tm, they may overwrite each other's data and get the time fscked
up. In Nagios, that's not a huge issue and really shouldn't matter for
scheduling, since we use timestamps for that, and getting the current
timestamp is an atomic (and thus implicitly threadsafe) operation.
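A small C sketch of the overwrite described above (not Nagios code; the helper names localtime_is_clobbered() and localtime_r_year_gap() are hypothetical). POSIX allows localtime() to return a pointer to a single static struct tm that later calls overwrite, and glibc does so, so even a second call in the same thread replaces the first result. localtime_r() writes into storage the caller owns instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Returns 1 if a second localtime() call overwrote the result of the first.
 * POSIX allows localtime() to hand back one statically allocated struct tm,
 * and glibc does, so both calls return the same pointer. */
static int localtime_is_clobbered(void)
{
    time_t a = 0;           /* Jan 1970 (Dec 1969 west of UTC) */
    time_t b = 1000000000;  /* Sep 2001 in every time zone */

    struct tm *first = localtime(&a);
    if (first == NULL)
        return -1;
    int first_year = first->tm_year;   /* copy the field before the next call */

    struct tm *second = localtime(&b);
    if (second == NULL)
        return -1;

    /* Same buffer, and the "first" result now holds the 2001 date. */
    return first == second && first->tm_year != first_year;
}

/* localtime_r() fills caller-owned storage, so both results survive.
 * Returns the difference in tm_year between the two conversions. */
static int localtime_r_year_gap(void)
{
    time_t a = 0, b = 1000000000;
    struct tm tm_a, tm_b;

    if (localtime_r(&a, &tm_a) == NULL || localtime_r(&b, &tm_b) == NULL)
        return -1;
    return tm_b.tm_year - tm_a.tm_year;   /* 31, or 32 west of UTC */
}
```

On glibc, localtime_is_clobbered() returns 1 even in a single thread; with two threads the same overwrite can land between one thread's call and its use of the result.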

> At first glance it seems we might have quite some code to change in
> order to be 100% thread-safe:
> 

We only need to change the places where we convert something other
than "now" into a struct tm. Where every caller asks for "now", it
doesn't matter much if they start overwriting each other.
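Where a call site does need its own struct tm, Thomas's suggestion of the _r variants could look roughly like the hypothetical helper format_local_time() below. It is not Nagios's get_datetime_string(), and the ISO-style format string is an assumption. One caveat: in glibc, localtime_r() also goes through __tz_convert() (frame #2 in the backtrace) and takes the same internal time-zone lock, so switching to it removes the shared-buffer race but is not, by itself, certain to prevent the hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format a timestamp into a caller-owned buffer using
 * localtime_r(), so concurrent callers never share the result struct.
 * Returns the length written, or 0 if the conversion failed or the
 * buffer was too small. */
static size_t format_local_time(time_t t, char *buf, size_t buflen)
{
    struct tm tm;   /* per-call storage instead of localtime()'s static one */

    if (buf == NULL || buflen == 0 || localtime_r(&t, &tm) == NULL)
        return 0;
    /* strftime() returns 0 when the result plus its NUL does not fit. */
    return strftime(buf, buflen, "%Y-%m-%d %H:%M:%S", &tm);
}
```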

> $ grep -RE '(asctime|ctime|gmtime|localtime)[[:space:]]*\(' base/|wc -l
> 77
> 
> Not all invocations are necessarily in threaded code, though. Could
> someone more experienced confirm whether this is the actual issue?
> 

It would be far better to remove threading from Nagios altogether and
use worker daemons to perform the actual checks, but that's a much
larger change.
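The backtrace shows a forked child blocked on a lock inside __tz_convert(). One plausible mechanism, an assumption not confirmed in this thread, is that another thread in the main Nagios process held glibc's time-zone lock at the moment of fork(); the child inherits the locked state but not the thread that would release it. POSIX only guarantees async-signal-safe functions in such a child, and localtime() is not one of them. Short of the worker-daemon redesign, a narrower mitigation is to do all time formatting in the parent before fork() and have the child only write the preformatted string. The function run_child_with_preformatted_time() is a hypothetical sketch of that pattern, not Nagios code.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: format `t` in the parent, then fork a child that only
 * calls async-signal-safe functions (close, write, _exit) to send the
 * preformatted string back over a pipe. Returns the number of bytes copied
 * into `out` (always NUL-terminated), or -1 on any failure. */
static ssize_t run_child_with_preformatted_time(time_t t, char *out, size_t outlen)
{
    char stamp[64];
    struct tm tm;
    int fds[2];

    if (out == NULL || outlen == 0)
        return -1;

    /* All time-zone work (and its internal locking) happens before fork(). */
    if (localtime_r(&t, &tm) == NULL ||
        strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", &tm) == 0)
        return -1;
    size_t len = strlen(stamp);

    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: no localtime(), no malloc(), no stdio. */
        close(fds[0]);
        ssize_t w = write(fds[1], stamp, len);
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    /* Parent: read the child's output, then reap it. */
    close(fds[1]);
    size_t total = 0;
    ssize_t n;
    while (total < outlen - 1 &&
           (n = read(fds[0], out + total, outlen - 1 - total)) > 0)
        total += (size_t)n;
    close(fds[0]);
    out[total] = '\0';

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (ssize_t)total;
}
```

Keeping the child free of time-zone calls sidesteps the inherited lock entirely, whichever thread happened to hold it at fork() time.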

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
