nagios 3.2.3 localtime deadlock

Matthew Kent real.mkent at gmail.com
Tue Oct 12 19:44:23 CEST 2010


On Fri, Oct 8, 2010 at 9:08 AM, Matthew Kent <real.mkent at gmail.com> wrote:
> On Fri, Oct 8, 2010 at 3:52 AM, Andreas Ericsson <ae at op5.se> wrote:
>> On 10/07/2010 08:43 PM, Matthew Kent wrote:
>>> Hello all,
>>>
>>
>> Hey you. First of all, thanks for including a backtrace. That's really
>> neat.
>>
>
> Thanks for looking :)
>
>>> Setting up a new nagios 3.2.3 install and occasionally (once in 24
>>> hours) I'm seeing a child deadlock when calling localtime() like so:
>>>
>>> (gdb) bt
>>> #0  0x00000033d5edfade in __lll_lock_wait_private () from /lib64/libc.so.6
>>> #1  0x00000033d5e8d1cd in _L_lock_1685 () from /lib64/libc.so.6
>>> #2  0x00000033d5e8cf17 in __tz_convert () from /lib64/libc.so.6
>>> #3  0x000000000043e23e in get_datetime_string (raw_time=<value
>>> optimized out>, buffer=0x2aaab014feb0<incomplete sequence \350>,
>>> buffer_length=48, type=0) at utils.c:1696
>>> #4  0x0000000000430990 in grab_datetime_macro (macro_type=7, arg1=0x0,
>>> arg2=0x0, output=0x6998f8) at ../common/macros.c:1533
>>> #5  0x0000000000432cbf in grab_macrox_value (macro_type=-4, arg1=0x0,
>>> arg2=0x0, output=0x6998f8, free_macro=0x2) at ../common/macros.c:1089
>>> #6  0x0000000000433586 in set_macrox_environment_vars (set=1) at
>>> ../common/macros.c:3166
>>> #7  0x00000000004335bb in set_all_macro_environment_vars (set=1) at
>>> ../common/macros.c:3134
>>> #8  0x000000000041b4c3 in run_async_service_check (svc=0x8d62560,
>>> check_options=<value optimized out>, latency=<value optimized out>,
>>> scheduled_check=1, reschedule_check=1,
>>>      time_is_valid=<value optimized out>, preferred_time=<value
>>> optimized out>) at checks.c:658
>>> #9  0x000000000041d56d in run_scheduled_service_check (svc=0x8d62560,
>>> check_options=0, latency=0.68999999999999995) at checks.c:260
>>> #10 0x000000000042a45a in handle_timed_event (event=0x2aaab011af30) at
>>> events.c:1257
>>> #11 0x000000000042abe6 in event_execution_loop () at events.c:1143
>>> #12 0x0000000000413055 in main (argc=<value optimized out>,
>>> argv=<value optimized out>, env=0x7fffa0670758) at nagios.c:850
>>>
>>> this leads to Nagios being completely frozen until I manually kill the child.
>>>
>>
>>
>> Looking at the glibc code, I see no possible way that a single thread
>> can hold on to the lock in __tz_convert() for any extended period of
>> time. What version of glibc are you using?
>>
>
> glibc-2.5-49.el5_5.4.x86_64
>
>>> Some light Googling tells me this can happen with localtime in certain
>>> cases, but I see no indication of other people with this issue in
>>> Nagios.
>>>
>>
>> Since this seems to happen in the codepath that exports macros as
>> environment variables, I'd like to know if it happens if you turn
>> that stuff off. Unless you really, really need it, it's a good idea
>> to do that anyways, since computing a bazillion macros each time
>> Nagios runs a check is quite expensive. Set
>>
>>  use_large_installation_tweaks=1
>> or
>>  enable_environment_macros=0
>>
>> in your nagios.cfg file.
>>
>> use_large_installation_tweaks=1 is a really good idea anyways unless
>> you're running Nagios on Windows 95, where a process' used memory
>> was never reclaimed by the system unless manually free()'d.
>>
>
> Yeah we don't even use the environment variables. Thanks for all the info.
>
>>> It's a pretty standard Nagios install on CentOS 5.5 - except for the
>>> fact I'm using the mk-livestatus event broker. We have a couple
>>> thousand checks configured on a pretty aggressive interval.
>>>
>>
>> First try disabling environment macros. Then try without the
>> mk-livestatus module. Seeing it happen in a pristine Nagios would mean
>> we don't need to speculate about where the problem happens.
>
> Good call, I'll disable the env macros and run it over the weekend,
> then re-enable them with livestatus off for good measure and report
> back here. We'll see what happens!
>

Oops, this was originally supposed to go to the list.

For the record, setting

enable_environment_macros=0

did indeed prevent the issue from recurring over a period of 72 hours.
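
The root cause was never pinned down in this thread. One mechanism that
would fit the backtrace (an assumption, not something confirmed here) is
that another thread in the Nagios process holds glibc's timezone lock at
the moment a check child is forked. The child gets a copy of the lock in
its locked state, but not the thread that would release it, so its first
localtime() call waits forever. The sketch below shows that general
fork-while-locked hazard with an ordinary pthread mutex. demo_lock,
lock_holder() and child_inherits_locked_mutex() are made-up names, and
demo_lock only stands in for the lock inside libc, which cannot be
reached from application code.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the private lock used inside libc. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int ready_pipe[2]; /* helper thread -> main: "lock is held" */
static int done_pipe[2];  /* main -> helper thread: "fork is done" */

/* Helper thread: take the lock and hold it across the fork. */
static void *lock_holder(void *unused)
{
    char c = 'x';

    (void)unused;
    pthread_mutex_lock(&demo_lock);
    if (write(ready_pipe[1], &c, 1) == 1 && read(done_pipe[0], &c, 1) == 1) {
        /* main has forked by now */
    }
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/*
 * Returns 1 if the forked child found the mutex already locked,
 * 0 if it did not, and -1 if the setup failed.
 */
int child_inherits_locked_mutex(void)
{
    pthread_t tid;
    pid_t pid;
    int status = 0;
    char c = 0;

    if (pipe(ready_pipe) != 0 || pipe(done_pipe) != 0)
        return -1;
    if (pthread_create(&tid, NULL, lock_holder, NULL) != 0)
        return -1;
    if (read(ready_pipe[0], &c, 1) != 1) /* wait until the lock is held */
        return -1;

    pid = fork();
    if (pid == 0) {
        /*
         * Only the forking thread exists in the child, so nobody here
         * can ever unlock demo_lock. A plain pthread_mutex_lock() would
         * block forever; trylock is used so the demo can report instead.
         */
        _exit(pthread_mutex_trylock(&demo_lock) == EBUSY ? 1 : 0);
    }

    /* Parent: let the helper release the lock, then collect the child. */
    if (write(done_pipe[1], &c, 1) != 1)
        return -1;
    pthread_join(tid, NULL);
    if (pid < 0 || waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Build with -pthread and call child_inherits_locked_mutex() from a small
main(). On Linux with glibc I would expect it to return 1. If this is
what happens in Nagios, turning off environment macros would help simply
because the child no longer calls localtime() on that code path.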

Let me know if I can be of further assistance with this issue.
