nagios 3.2.3 localtime deadlock

Andreas Ericsson ae at op5.se
Fri Oct 8 12:52:00 CEST 2010


On 10/07/2010 08:43 PM, Matthew Kent wrote:
> Hello all,
> 

Hey you. First of all, thanks for including a backtrace. That's really
neat.

> Setting up a new nagios 3.2.3 install and occasionally (once in 24
> hours) I'm seeing a child deadlock when calling localtime() like so:
> 
> (gdb) bt
> #0  0x00000033d5edfade in __lll_lock_wait_private () from /lib64/libc.so.6
> #1  0x00000033d5e8d1cd in _L_lock_1685 () from /lib64/libc.so.6
> #2  0x00000033d5e8cf17 in __tz_convert () from /lib64/libc.so.6
> #3  0x000000000043e23e in get_datetime_string (raw_time=<value
> optimized out>, buffer=0x2aaab014feb0<incomplete sequence \350>,
> buffer_length=48, type=0) at utils.c:1696
> #4  0x0000000000430990 in grab_datetime_macro (macro_type=7, arg1=0x0,
> arg2=0x0, output=0x6998f8) at ../common/macros.c:1533
> #5  0x0000000000432cbf in grab_macrox_value (macro_type=-4, arg1=0x0,
> arg2=0x0, output=0x6998f8, free_macro=0x2) at ../common/macros.c:1089
> #6  0x0000000000433586 in set_macrox_environment_vars (set=1) at
> ../common/macros.c:3166
> #7  0x00000000004335bb in set_all_macro_environment_vars (set=1) at
> ../common/macros.c:3134
> #8  0x000000000041b4c3 in run_async_service_check (svc=0x8d62560,
> check_options=<value optimized out>, latency=<value optimized out>,
> scheduled_check=1, reschedule_check=1,
>      time_is_valid=<value optimized out>, preferred_time=<value
> optimized out>) at checks.c:658
> #9  0x000000000041d56d in run_scheduled_service_check (svc=0x8d62560,
> check_options=0, latency=0.68999999999999995) at checks.c:260
> #10 0x000000000042a45a in handle_timed_event (event=0x2aaab011af30) at
> events.c:1257
> #11 0x000000000042abe6 in event_execution_loop () at events.c:1143
> #12 0x0000000000413055 in main (argc=<value optimized out>,
> argv=<value optimized out>, env=0x7fffa0670758) at nagios.c:850
> 
> this leads to Nagios being completely frozen until I manually kill the child.
> 


Looking at the glibc code, I see no possible way that a single thread
can hold on to the lock in __tz_convert() for any extended period of
time. What version of glibc are you using?
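
If asking the package manager is awkward, a few lines of C will report the
glibc version that is actually loaded at run time. This is only a sketch:
gnu_get_libc_version() is a glibc extension declared in
<gnu/libc-version.h>, and the two helper names below are made up for
illustration.

```c
#include <stdio.h>
#include <gnu/libc-version.h>

/* Hypothetical helper: returns the version string of the glibc that is
 * loaded at run time (a dotted version number). */
const char *glibc_runtime_version(void)
{
    return gnu_get_libc_version();
}

/* Hypothetical helper: prints the run-time glibc version to stdout. */
void print_glibc_version(void)
{
    printf("glibc %s\n", glibc_runtime_version());
}
```

Call print_glibc_version() from a small main() on the machine where Nagios
runs, so the answer reflects the library the daemon really uses.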

> Some light Googling tells me this can happen with localtime in certain
> cases, but I see no indication of other people with this issue in
> Nagios.
> 

Since this seems to happen in the codepath that exports macros as
environment variables, I'd like to know if it still happens when you
turn that stuff off. Unless you really, really need it, it's a good
idea to turn it off anyway, since computing a bazillion macros each
time Nagios runs a check is quite expensive. Set

  use_large_installation_tweaks=1
or
  enable_environment_macros=0

in your nagios.cfg file.
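
To double-check which value a config file actually contains after editing,
something like the following could be used. It is a rough sketch, not code
from Nagios: cfg_lookup() is a hypothetical helper, it only understands
plain key=value lines and '#' comments, and it reports the last matching
line on the assumption that a later line overrides an earlier one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: looks up `key` in a key=value config file and
 * copies the value from the last matching line into `out`.
 * Returns 1 if the key was found, 0 if not, -1 on error. */
int cfg_lookup(const char *path, const char *key, char *out, size_t outlen)
{
    char line[1024];
    size_t keylen = strlen(key);
    int found = 0;
    FILE *fp;

    if (outlen == 0)
        return -1;
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        char *p = line + strspn(line, " \t");  /* skip leading blanks */

        if (*p == '#' || strncmp(p, key, keylen) != 0)
            continue;                          /* comment or another key */
        p += keylen;
        p += strspn(p, " \t");
        if (*p != '=')
            continue;                          /* key was only a prefix */
        p++;
        p += strspn(p, " \t");
        p[strcspn(p, " \t\r\n")] = '\0';       /* cut value at first whitespace */
        strncpy(out, p, outlen - 1);
        out[outlen - 1] = '\0';
        found = 1;
    }
    fclose(fp);
    return found;
}
```

Pointing it at your nagios.cfg with the key "enable_environment_macros"
shows whether the setting made it into the file; it says nothing about
whether the running daemon has been restarted to pick it up.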

use_large_installation_tweaks=1 is a really good idea anyway unless
you're running Nagios on Windows 95, where a process's used memory
was never reclaimed by the system unless manually free()'d.

> It's a pretty standard Nagios install on CentOS 5.5 - except for the
> fact I'm using the mk-livestatus event broker. We have a couple
> thousand checks configured on a pretty aggressive interval.
> 

First try disabling environment macros. Then try without the
mk-livestatus module. Seeing it happen in a pristine Nagios would mean
we don't need to speculate about where the problem happens.

> Anyone seen this before?
> 

I haven't, but very few bugs happen to only a single user, so it's
more than just possible that others have run into it.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
