nagios 3.2.3 localtime deadlock

Andreas Ericsson ae at op5.se
Sat Oct 9 12:44:41 CEST 2010


On 10/09/2010 02:05 AM, Lars Michelsen wrote:
> Hello Andreas, Hello List,
> 
>> First try disabling environment macros. Then try without the
>> mk-livestatus module. Seeing it happen in a pristine Nagios would mean
>> we don't need to speculate about where the problem happens.
>>
>>> Anyone seen this before?
>>>
> 
> Yes, happened to several users. This already has been debugged by Mathias
> Kettner (Dev of Livestatus).
> 

He emailed me separately. I'll summarize the conclusions of our discussion
here.

> The problem from the start: When nagios is executing a check it performs a
> fork(). The forked process contains only the thread which performed the fork().
> The livestatus threads are not forked.

Unless Livestatus itself calls any of the [a-z]*time() functions, it's
orthogonal to this discussion.

> But if something is executing localtime() and holding its lock at the moment
> of the fork(), the lock is copied into the forked process in its locked
> state, and nothing there will ever release it.
> 
> The forked process runs into the lock while setting the environment macros and
> hangs forever waiting for the lock to be released.
> 

Nagios calls localtime() in more places than just where it sets the environment
macros, though. My guess so far is that if you use $DATETIME$-ish macros in
regular checks, you could also be in danger of hitting this bug. That's fairly
uncommon for checks, though, and notifications avoid the problem by blocking
all checks (and thus all fork()s), calculating the macros once, and only then
resuming fork()ing to send out the notifications.
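
For anyone who wants to see the inherited-lock effect in isolation, here is a
minimal sketch. It is not Nagios or Livestatus code: demo_lock is a stand-in
for whatever lock the C library takes inside localtime(), and every name in it
is made up for illustration. A helper thread holds the lock while the main
thread fork()s, and the child uses a trylock so the demo can report the state
instead of hanging. Strictly speaking POSIX only promises async-signal-safe
calls in the child of a threaded process, so treat the trylock as something
that works in practice on common systems rather than a guarantee.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the C library's internal time lock (hypothetical name). */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Handshake between the calling thread and the helper thread. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_cond = PTHREAD_COND_INITIALIZER;
static int state; /* 0: idle, 1: helper holds demo_lock, 2: helper may unlock */

static void set_state(int value)
{
    pthread_mutex_lock(&state_lock);
    state = value;
    pthread_cond_broadcast(&state_cond);
    pthread_mutex_unlock(&state_lock);
}

static void wait_for_state(int value)
{
    pthread_mutex_lock(&state_lock);
    while (state != value)
        pthread_cond_wait(&state_cond, &state_lock);
    pthread_mutex_unlock(&state_lock);
}

/* Plays the part of a thread that is in the middle of localtime(). */
static void *holder_thread(void *arg)
{
    int rc;

    (void)arg;
    rc = pthread_mutex_lock(&demo_lock);
    assert(rc == 0);
    (void)rc;
    set_state(1);      /* tell the caller the lock is now held */
    wait_for_state(2); /* keep holding it until the caller has forked */
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns 1 if the child saw the lock as held, 0 if free, -1 on error. */
int child_inherits_held_lock(void)
{
    pthread_t tid;
    pid_t pid;
    int status = 0;
    int result = -1;

    set_state(0);
    if (pthread_create(&tid, NULL, holder_thread, NULL) != 0)
        return -1;
    wait_for_state(1); /* the helper now holds demo_lock */

    pid = fork();
    if (pid == 0) {
        /* Only the forking thread exists here, so nobody will ever unlock
         * demo_lock. A blocking pthread_mutex_lock() would hang forever;
         * trylock lets the demo report the state and exit instead. */
        _exit(pthread_mutex_trylock(&demo_lock) == EBUSY ? 1 : 0);
    }

    set_state(2); /* let the helper unlock in the parent */
    pthread_join(tid, NULL);
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);
    return result;
}
```

Calling child_inherits_held_lock() from a small main() (built with -pthread
where the toolchain needs it) should report that the child found the lock
held, even though the parent's helper thread released its own copy right after
the fork().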

> Disabling the environment macros solves the problem.
> 
> Maybe localtime_r could solve the problem?
> 

It could.
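
As a rough sketch of what switching to localtime_r() looks like (the helper
name and the timestamp format are assumptions for illustration, not anything
taken from the Nagios source):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: formats `when` as local time into the caller's buffer.
 * Returns 0 on success, -1 if the conversion fails or the buffer is too
 * small. */
int format_timestamp(time_t when, char *buf, size_t buflen)
{
    struct tm tm_buf; /* caller-owned result, no shared static storage */

    assert(buf != NULL && buflen > 0);
    if (localtime_r(&when, &tm_buf) == NULL)
        return -1;
    if (strftime(buf, buflen, "%Y-%m-%d %H:%M:%S", &tm_buf) == 0)
        return -1;
    return 0;
}
```

Keep in mind that localtime_r() only gets rid of the shared static result.
Whether the C library still takes an internal lock for its timezone data is
implementation-specific, so this alone may not close the fork() window.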

> The problem is that livestatus is not possible without threads.

Livestatus can use threads even if the core does not. It just has to link
against whatever threading library is available itself, rather than relying on
the Nagios core to do so.

> It would be nice to have the problem solved since the number of Livestatus
> users is growing.
> 

Livestatus is still orthogonal to this issue, unless Livestatus is responsible
for the call that happens to block execution. Even if it is, Nagios makes
plenty of such calls on its own, so removing Livestatus is not guaranteed to
solve the problem, just as disabling environment macros is not guaranteed to
fix it either.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.





More information about the Developers mailing list