[naemon-dev] Naemon Livestatus update

Robin Sonefors ozamosi at flukkost.nu
Tue Apr 1 09:40:38 CEST 2014


And, to actually include the important part of what I tried to say 
yesterday: if you don't want to send private data to random yahoos on a 
public mailing list, we have a private mailing list that can be used for 
sensitive info: naemon-team at monitoring-lists.org

On 2014-03-31 21:26, Eron Nicholson wrote:
> Anton,
>    We have produced a core dump.  I can share it privately with you and
> other naemon devs.  Let me know how you would like me to send it to
> you.
>
> It is certainly possible to send passive check results in via
> livestatus.  You just have to use the nagios
> PROCESS_SERVICE_CHECK_RESULT command, like :
>
> COMMAND [1396293019] PROCESS_SERVICE_CHECK_RESULT;host-01;Service Name;0;OK:  0
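
A minimal Python sketch of the submission described above, for illustration only. The command format and the socket path /var/cache/naemon/live are taken from this thread; the function names build_passive_result and send_command are hypothetical, and sending the command as a single newline-terminated line is an assumption about the protocol. It does not escape semicolons or newlines inside the fields.

```python
import socket
import time


def build_passive_result(host, service, return_code, output, timestamp=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT command as one Livestatus line."""
    if timestamp is None:
        timestamp = int(time.time())
    # Fields are separated by semicolons, as in the example in this thread.
    return "COMMAND [%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


def send_command(request, socket_path="/var/cache/naemon/live"):
    """Write one request to the Livestatus UNIX socket (path assumed from the thread)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(request.encode("utf-8"))
```

Calling send_command(build_passive_result("host-01", "Service Name", 0, "OK:  0")) would submit one result, provided the socket exists at that path.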
>
> Thanks,
>
> Eron Nicholson
> Systems Administrator | Basecamp
>
> On Mon, Mar 31, 2014 at 1:14 PM, Anton Löfgren <alofgren at op5.com> wrote:
>> The easiest way to track down what causes the segfault would of course be a
>> core dump or a gdb back trace or similar. Is that something you would be
>> able to share?
>>
>> Aside from that, how exactly are you submitting passive check results via
>> livestatus? Is that even possible?
>>
>> On 31 Mar 2014 18:49, "Eron Nicholson" <eron at basecamp.com> wrote:
>>>
>>> Hey all,
>>>    Thanks for the responses and the info.  I appreciate that you guys
>>> are responsive to these issues.  I also posted this to the check_mk
>>> users list and haven't gotten any response yet (see
>>> http://lists.mathias-kettner.de/pipermail/checkmk-en/2014-March/011881.html).
>>>
>>> Since we are looking to use both Naemon and Check_mk in our new
>>> monitoring system, I would certainly prefer it if there was a single
>>> supported livestatus version shared between the two projects.  We do
>>> see some issues when trying to use the Check_MK UI with
>>> naemon-livestatus, as they have added new columns :
>>>
>>> Primary - Livestatus error
>>> Unhandled exception: 400: Table 'hosts' has no column
>>> 'host_comments_with_extra_info'
>>>
>>> We have built our own UI and Thruk is also perfectly fine, so this
>>> isn't really a big concern.  As long as the backends are compatible,
>>> we should be fine with either version.
>>>
>>> The major issue with the current version of naemon-livestatus is that
>>> it crashes after ~10 seconds in our environment.  As I mentioned
>>> earlier, we have tons of passive services being sent in via livestatus
>>> - both from the check_mk agent checks and our own custom checks.  If
>>> I disable our custom checks, naemon-livestatus does not crash, so it
>>> has something to do with the additional passive checks we are sending.
>>>   I have enabled livestatus logging and debugging via :
>>>
>>> broker_module=/usr/lib/naemon/livestatus.o /var/cache/naemon/live
>>> log_file=/var/log/naemon/livestatus.log debug=1
>>>
>>> And do not see any errors in the livestatus.log when the process dies.
>>>   I do sometimes see segfault errors in the naemon.log :
>>>
>>> [1396281951] Caught SIGSEGV, shutting down...
>>>
>>>
>>> We are very, very reliant on livestatus for both pushing in passive
>>> service checks and pulling data for our UI.  So our (new) monitoring
>>> system is basically unusable until we can get a livestatus that works
>>> with naemon and doesn't crash.  Fortunately, we still have our nagios3
>>> system up and working, so we have some time to try to figure out these
>>> kinds of issues.
>>>
>>> I would love to help out in troubleshooting this problem.  Let me know
>>> if there's a newer version of naemon-livestatus that I can try or if
>>> you would like me to gather some more data on the crashes.
>>>
>>> Thanks,
>>>
>>> Eron Nicholson
>>> Systems Administrator | Basecamp
>>>
>>>
>>> On Sat, Mar 29, 2014 at 7:51 AM, Anton Löfgren <alofgren at op5.com> wrote:
>>>> I don't want to derail this thread further than necessary, but I just
>>>> thought I should mention that there are also a number of fixes available
>>>> for the build system which I hope to get into naemon in the coming week,
>>>> apart from the unicode stuff Max mentions. The upstream build system (at
>>>> least what we have in the op5 fork) is a complete mess, which anyone who
>>>> has been down that rabbit hole should be able to attest to.
>>>>
>>>> I also added a couple of test cases for said unicode stuff, which should
>>>> make it easier to add new ones in the future.
>>>>
>>>> Anyway, is anyone talking to Kettner about this? Ideally, we'd be able
>>>> to work towards a common goal. Although from what I've heard (though
>>>> this may or may not be accurate), he's not particularly interested in
>>>> at least some of the changes we've made.
>>>>
>>>> If that's not possible for whatever reason, it might be best to do as
>>>> Max says, and cherry-pick whatever changes we want from upstream.
>>>>
>>>> To get back on thread, and reiterate: you're better off using the naemon
>>>> livestatus fork with naemon.
>>>>
>>>> al
>>>>
>>>> On 29 Mar 2014 11:16, "Max Sikström" <max.sikstrom at op5.com> wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> I've tried to keep up with the changes made to livestatus upstream,
>>>>> but they are quite hard to track, since livestatus is just a
>>>>> subdirectory in the check_mk repository.
>>>>>
>>>>> As far as I can see, there are just a few changes in upstream
>>>>> livestatus since the fork:
>>>>> - statehist table is added
>>>>> - bugfixes with the log table
>>>>> - fixes with livecheck, and later removal of the livecheck
>>>>>
>>>>> Log handling in livestatus is really nasty to use because memory usage
>>>>> only ever increases: livestatus never deallocates its growing buffer,
>>>>> so once 1GB of logs has been parsed, 1GB of memory stays allocated per
>>>>> thread, afaik. I've therefore assumed that check_mk was the only system
>>>>> that really used that part.
>>>>>
>>>>>
>>>>> I wouldn't describe naemon-livestatus as older, just a little bit
>>>>> different.
>>>>>
>>>>> The naemon fork of livestatus took a path through op5 before ending
>>>>> up as the naemon fork.  During that time, some things were added or
>>>>> fixed:
>>>>> - Sorting (and pagination) support, and some bugfixes too. (Sort:
>>>>> column_name asc/desc, Offset: 80, Limit: 20)
>>>>> - Regexp handles case sensitivity for unicode characters correctly
>>>>> (it's really new, so I'm not sure if it's in master yet. I just know
>>>>> that Anton Löfgren/catharsis has it in a branch right now)
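
As an illustration of the sorting and pagination headers listed above, here is a small Python sketch that builds such a query. The header spellings (Sort, Offset, Limit) follow the message; the function name build_paged_query, the table and the column names are assumptions chosen for the example, and the Sort and Offset headers are only expected to work against the naemon fork.

```python
def build_paged_query(table, columns, sort_column, descending=False,
                      offset=0, limit=20):
    """Build a Livestatus GET request that asks for one sorted page of rows."""
    lines = [
        "GET %s" % table,
        "Columns: %s" % " ".join(columns),
        # Sort and Offset are the headers described for the naemon fork.
        "Sort: %s %s" % (sort_column, "desc" if descending else "asc"),
        "Offset: %d" % offset,
        "Limit: %d" % limit,
    ]
    # A GET request ends with an empty line.
    return "\n".join(lines) + "\n\n"
```

With offset=80 and limit=20 this requests rows 81 through 100 of the sorted result, matching the numbers in the message.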
>>>>>
>>>>> In the naemon-fork, there are also a couple of bug fixes:
>>>>> - Possible segfault due to races between threads when submitting
>>>>> commands. (Command processing in upstream is done in a worker thread,
>>>>> but naemon/nagios isn't thread safe itself, since it doesn't use
>>>>> threads)
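
The message does not say how the fork avoids this race. One common pattern, sketched here in Python purely as an illustration and not as the fork's actual code, is to let worker threads only enqueue commands and have a single thread apply them. All names below are hypothetical.

```python
import queue
import threading


def submit_from_worker(pending, commands):
    """Runs in a worker thread: only enqueue, never touch core state."""
    for command in commands:
        pending.put(command)


def drain_on_main_thread(pending, handle):
    """Runs in one thread only: apply every queued command in turn."""
    handled = 0
    while True:
        try:
            command = pending.get_nowait()
        except queue.Empty:
            return handled
        handle(command)
        handled += 1


def run_demo(workers=4, per_worker=3):
    """Start several submitting threads, then apply their commands serially."""
    pending = queue.Queue()
    threads = [
        threading.Thread(
            target=submit_from_worker,
            args=(pending, ["cmd-%d-%d" % (w, i) for i in range(per_worker)]),
        )
        for w in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    applied = []
    drain_on_main_thread(pending, applied.append)
    return applied
```

Because only the draining thread calls the handler, code that is not thread safe never runs concurrently, whatever the number of submitting threads.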
>>>>>
>>>>> In short: naemon-livestatus and mk-livestatus have diverged, and will
>>>>> probably diverge further before it's practical to upstream changes.
>>>>>
>>>>>
>>>>> So are there any specific features you need, or bugs you need resolved,
>>>>> that are available in mk-livestatus but not in naemon-livestatus?
>>>>> If so, it's probably quite easy to just port those specific ones.
>>>>>
>>>>> Best regards,
>>>>> Max Sikström
>>>>>
>>>>> On 28 Mar 2014, at 19:58, Eron Nicholson <eron at basecamp.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>   I am attempting to use Naemon with Check_MK.  Check_MK released a
>>>>>> new version of livestatus today (1.2.5i1) which supports Nagios 4.
>>>>>> However, I am getting errors when trying to use it with Naemon :
>>>>>>
>>>>>> [1396026973] Error: Could not load module
>>>>>> '/usr/lib/check_mk/livestatus.o' -> /usr/lib/check_mk/livestatus.o:
>>>>>> undefined symbol: get_next_log_rotation_time
>>>>>> [1396026973] Error: Failed to load module
>>>>>> '/usr/lib/check_mk/livestatus.o'.
>>>>>> [1396026973] Error: Module loading failed. Aborting.
>>>>>>
>>>>>> We have been having issues with the forked naemon version of
>>>>>> livestatus crashing.  We push in a lot of passive services, and it
>>>>>> seems that this is causing livestatus to crash.  The forked version
>>>>>> is quite old.  I was wondering if there was a plan to update naemon's
>>>>>> livestatus to a more recent version or if there was a plan to allow
>>>>>> naemon to integrate with the latest version of livestatus.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Eron Nicholson
>>>>>> Systems Administrator | Basecamp
>>>>>
>>>>

