nagios 3.2.3 localtime deadlock

Thomas Guyot-Sionnest dermoth at aei.ca
Fri Oct 8 07:44:23 CEST 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10-10-07 02:43 PM, Matthew Kent wrote:
> Hello all,
> 
> Setting up a new nagios 3.2.3 install and occasionally (once in 24
> hours) I'm seeing a child deadlock when calling localtime() like so:
> 
> (gdb) bt
> #0  0x00000033d5edfade in __lll_lock_wait_private () from /lib64/libc.so.6
> #1  0x00000033d5e8d1cd in _L_lock_1685 () from /lib64/libc.so.6
> #2  0x00000033d5e8cf17 in __tz_convert () from /lib64/libc.so.6
> #3  0x000000000043e23e in get_datetime_string (raw_time=<value optimized out>,
>     buffer=0x2aaab014feb0  <incomplete sequence \350>, buffer_length=48,
>     type=0) at utils.c:1696
> #4  0x0000000000430990 in grab_datetime_macro (macro_type=7, arg1=0x0,
>     arg2=0x0, output=0x6998f8) at ../common/macros.c:1533
> #5  0x0000000000432cbf in grab_macrox_value (macro_type=-4, arg1=0x0,
>     arg2=0x0, output=0x6998f8, free_macro=0x2) at ../common/macros.c:1089
> #6  0x0000000000433586 in set_macrox_environment_vars (set=1)
>     at ../common/macros.c:3166
> #7  0x00000000004335bb in set_all_macro_environment_vars (set=1)
>     at ../common/macros.c:3134
> #8  0x000000000041b4c3 in run_async_service_check (svc=0x8d62560,
>     check_options=<value optimized out>, latency=<value optimized out>,
>     scheduled_check=1, reschedule_check=1,
>     time_is_valid=<value optimized out>, preferred_time=<value optimized out>)
>     at checks.c:658
> #9  0x000000000041d56d in run_scheduled_service_check (svc=0x8d62560,
>     check_options=0, latency=0.68999999999999995) at checks.c:260
> #10 0x000000000042a45a in handle_timed_event (event=0x2aaab011af30)
>     at events.c:1257
> #11 0x000000000042abe6 in event_execution_loop () at events.c:1143
> #12 0x0000000000413055 in main (argc=<value optimized out>,
>     argv=<value optimized out>, env=0x7fffa0670758) at nagios.c:850
> 
> This leads to Nagios being completely frozen until I manually kill the child.
> 
> Some light Googling tells me this can happen with localtime in certain
> cases, but I see no indication of other people with this issue in
> Nagios.
> 
> It's a pretty standard Nagios install on CentOS 5.5 - except for the
> fact I'm using the mk-livestatus event broker. We have a couple
> thousand checks configured on a pretty aggressive interval.
> 
> Anyone seen this before?

I'm far from being an expert in threading and locking, but AFAIK
localtime(), called at utils.c:1696, is, like other similar time
functions, not thread-safe. I'm wondering if using the _r versions
would help...
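For illustration, here is roughly what a reentrant call site could look like. This is a minimal sketch, not the actual Nagios code: format_local_time() is a hypothetical helper loosely modelled on the job get_datetime_string() does, and the "%Y-%m-%d %H:%M:%S" format is an assumption. The point is that localtime_r() fills a caller-supplied struct tm instead of returning a pointer to libc's shared static one.

```c
#define _POSIX_C_SOURCE 200809L /* expose localtime_r() under strict -std= modes */
#include <assert.h>
#include <time.h>

/* Hypothetical helper, loosely modelled on get_datetime_string() in utils.c;
 * NOT the real Nagios function. Formats raw_time as local time into buffer
 * using localtime_r(), which writes into per-call storage rather than the
 * static struct tm that plain localtime() returns.
 * Returns 0 on success; on failure returns -1 and leaves buffer empty. */
static int format_local_time(time_t raw_time, char *buffer, size_t buffer_length)
{
    struct tm tm_buf; /* lives on this call's stack, not shared between threads */

    assert(buffer != NULL && buffer_length > 0);

    if (localtime_r(&raw_time, &tm_buf) == NULL) {
        buffer[0] = '\0';
        return -1;
    }

    /* strftime() returns 0 when the result plus its NUL does not fit. */
    if (strftime(buffer, buffer_length, "%Y-%m-%d %H:%M:%S", &tm_buf) == 0) {
        buffer[0] = '\0';
        return -1;
    }
    return 0;
}
```

A caller would pass a buffer such as char buf[48] (the buffer_length seen in frame #3). One caveat: in glibc, localtime_r() goes through the same internal __tz_convert() routine shown in frame #2, which takes the same internal lock, so switching to the _r variants removes the shared static buffer but may not by itself prevent this particular hang.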

At first glance, it seems we might have quite a lot of code to change
in order to be 100% thread-safe:

$ grep -RE '(asctime|ctime|gmtime|localtime)[[:space:]]*\(' base/|wc -l
77

Not all of these invocations are necessarily in threaded code, though.
Could someone more experienced confirm whether this is the actual issue?
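One more possibility, offered as an assumption rather than a diagnosis: the report says the hung process is the child, and the backtrace shows it building macro environment variables inside run_async_service_check(). If the parent has other threads (for example ones started by an event broker module such as mk-livestatus), one of them may have held libc's timezone lock at the moment of fork(). POSIX only allows the child of a multithreaded process to call async-signal-safe functions until it calls exec, and localtime() is not on that list. A common mitigation is to do all time formatting in the parent before forking. The sketch below shows that pattern in isolation; roundtrip_prefork_timestamp() is a hypothetical name, and the pipe exists only so the result can be checked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical sketch, NOT Nagios code. All libc time handling happens in
 * the parent before fork(); the child only calls close(), write() and
 * _exit(), which are async-signal-safe, so it never needs the timezone lock.
 * The child sends the pre-formatted string back over a pipe so the caller
 * can check it. Returns 0 on success and copies it (NUL-terminated) to out. */
static int roundtrip_prefork_timestamp(time_t raw_time, char *out, size_t out_len)
{
    char stamp[48];
    struct tm tm_buf;
    size_t len, got = 0;
    ssize_t n = 0;
    int fds[2], status;
    pid_t pid;

    assert(out != NULL && out_len > 0);

    /* Parent: format the timestamp while every parent thread is still running. */
    if (localtime_r(&raw_time, &tm_buf) == NULL)
        return -1;
    len = strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", &tm_buf);
    if (len == 0 || pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: no localtime()/strftime()/malloc() here, only safe calls. */
        close(fds[0]);
        _exit(write(fds[1], stamp, len) == (ssize_t)len ? 0 : 1);
    }

    /* Parent: read until EOF (the child's exit closes the write end), then reap. */
    close(fds[1]);
    while (got < out_len - 1) {
        n = read(fds[0], out + got, out_len - 1 - got);
        if (n <= 0)
            break;
        got += (size_t)n;
    }
    close(fds[0]);
    out[got] = '\0';

    if (waitpid(pid, &status, 0) != pid || n < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

In a real check the child would go on to exec the plugin rather than write to a pipe; the design choice being illustrated is simply that nothing between fork() and exec touches localtime() or any other lock-taking libc call.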


> ------------------------------------------------------------------------------
> Beautiful is writing same markup. Internet Explorer 9 supports [..]

Aieee! So much IE spam... can't we just blacklist these words? :p

- -- 
Thomas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkyur7cACgkQ6dZ+Kt5BchZZCQCg81gyt10qxUlU3t4l8RJb5lFk
WKUAmgKavQkc/0V7GmNZzKCP/Do3cFOQ
=lUlS
-----END PGP SIGNATURE-----





More information about the Developers mailing list