[naemon-dev] Naemon Livestatus update

Eron Nicholson eron at basecamp.com
Fri Apr 4 16:23:13 CEST 2014


Robin,
  I went ahead and built the latest git version of livestatus and have been
running it without crashes for a few days.  I was previously running
the version packaged with 0.8 from about a month ago.  Hopefully it
will prove reliable - it looks good so far.

Thanks for the help,

Eron Nicholson
Systems Administrator | Basecamp

On Tue, Apr 1, 2014 at 3:40 AM, Robin Sonefors <ozamosi at flukkost.nu> wrote:
> And, to actually include the important part of what I tried to say
> yesterday: if you don't want to send private data to random yahoos on a
> public mailing list, we have a private mailing list that can be used for
> sensitive info: naemon-team at monitoring-lists.org
>
>
> On 2014-03-31 21:26, Eron Nicholson wrote:
>>
>> Anton,
>>    We have produced a core dump.  I can share it privately with you and
>> other naemon devs.  Let me know how you would like me to send it to
>> you.
>>
>> It is certainly possible to send passive check results in via
>> livestatus.  You just have to use the nagios
>> PROCESS_SERVICE_CHECK_RESULT command, like :
>>
>> COMMAND [1396293019] PROCESS_SERVICE_CHECK_RESULT;host-01;Service Name;0;OK:  0
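A minimal Python sketch of the submission described above, under stated assumptions: the helper names `build_passive_result` and `send_command` are hypothetical, and the default socket path is taken from the broker_module line quoted further down this thread, so it may differ on other installs. It only formats the COMMAND line shown above and writes it to the livestatus unix socket.

```python
import socket
import time


def build_passive_result(host, service, return_code, output, timestamp=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT command as a livestatus COMMAND line.

    Hypothetical helper: the field order follows the example in this thread.
    """
    if timestamp is None:
        timestamp = int(time.time())
    return "COMMAND [%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


def send_command(line, socket_path="/var/cache/naemon/live"):
    """Write one command line to the livestatus unix socket (path is an assumption)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(line.encode("utf-8"))
        # Signal that no more data follows; commands produce no response.
        sock.shutdown(socket.SHUT_WR)


if __name__ == "__main__":
    print(build_passive_result("host-01", "Service Name", 0, "OK:  0", 1396293019), end="")
```

Calling `send_command(build_passive_result(...))` on the monitoring host would submit one result; the `__main__` block only prints the formatted line so it can be checked without a running daemon.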
>>
>> Thanks,
>>
>> Eron Nicholson
>> Systems Administrator | Basecamp
>>
>> On Mon, Mar 31, 2014 at 1:14 PM, Anton Löfgren <alofgren at op5.com> wrote:
>>>
>>> The easiest way to track down what causes the segfault would of course be a
>>> core dump or a gdb backtrace or similar. Is that something you would be
>>> able to share?
>>>
>>> Aside from that, how exactly are you submitting passive check results via
>>> livestatus? Is that even possible?
>>>
>>> On 31 Mar 2014 18:49, "Eron Nicholson" <eron at basecamp.com> wrote:
>>>>
>>>>
>>>> Hey all,
>>>>    Thanks for the responses and the info.  I appreciate that you guys
>>>> are responsive to these issues.  I also posted this to the check_mk
>>>> users list and haven't gotten any response yet (see
>>>> http://lists.mathias-kettner.de/pipermail/checkmk-en/2014-March/011881.html).
>>>>
>>>> Since we are looking to use both Naemon and Check_mk in our new
>>>> monitoring system, I would certainly prefer it if there was a single
>>>> supported livestatus version shared between the two projects.  We do
>>>> see some issues when trying to use the Check_MK UI with
>>>> naemon-livestatus, as they have added new columns :
>>>>
>>>> Primary - Livestatus error
>>>> Unhandled exception: 400: Table 'hosts' has no column
>>>> 'host_comments_with_extra_info'
>>>>
>>>> We have built our own UI and Thruk is also perfectly fine, so this
>>>> isn't really a big concern.  As long as the backends are compatible,
>>>> we should be fine with either version.
>>>>
>>>> The major issue with the current version of naemon-livestatus is that
>>>> it crashes after ~10 seconds in our environment.  As I mentioned
>>>> earlier, we have tons of passive services being sent in via livestatus
>>>> - both from the check_mk agent checks and our own custom checks.  If
>>>> I disable our custom checks, naemon-livestatus will not crash, so it
>>>> has something to do with the additional passive checks we are sending.
>>>>   I have enabled livestatus logging and debugging via :
>>>>
>>>> broker_module=/usr/lib/naemon/livestatus.o /var/cache/naemon/live log_file=/var/log/naemon/livestatus.log debug=1
>>>>
>>>> And do not see any errors in the livestatus.log when the process dies.
>>>>   I do sometimes see segfault errors in the naemon.log :
>>>>
>>>> [1396281951] Caught SIGSEGV, shutting down...
>>>>
>>>>
>>>> We are very, very reliant on livestatus for both pushing in passive
>>>> service checks and pulling data for our UI.  So our (new) monitoring
>>>> system is basically unusable until we can get a livestatus that works
>>>> with naemon and doesn't crash.  Fortunately, we still have our nagios3
>>>> system up and working, so we have some time to try to figure out these
>>>> kinds of issues.
>>>>
>>>> I would love to help out in troubleshooting this problem.  Let me know
>>>> if there's a newer version of naemon-livestatus that I can try or if
>>>> you would like me to gather some more data on the crashes.
>>>>
>>>> Thanks,
>>>>
>>>> Eron Nicholson
>>>> Systems Administrator | Basecamp
>>>>
>>>>
>>>> On Sat, Mar 29, 2014 at 7:51 AM, Anton Löfgren <alofgren at op5.com> wrote:
>>>>>
>>>>> I don't want to derail this thread further than necessary, but I just
>>>>> thought I should mention that there are also a number of fixes available
>>>>> for the build system which I hope to get into naemon in the coming week,
>>>>> apart from the unicode stuff Max mentions. The upstream build system (at
>>>>> least what we have in the op5 fork) is a complete mess, which anyone who
>>>>> has been down that rabbit hole should be able to attest to.
>>>>>
>>>>> I also added a couple of test cases for said unicode stuff, which should
>>>>> make it easier to add new ones in the future.
>>>>>
>>>>> Anyway, is anyone talking to Kettner about this? Ideally, we'd be able to
>>>>> work towards a common goal. Although from what I've heard (though this may
>>>>> or may not be accurate), he's not particularly interested in at least some
>>>>> of the changes we've made.
>>>>>
>>>>> If that's not possible for whatever reason, it might be best to do as Max
>>>>> says, and cherry-pick whatever changes we want from upstream.
>>>>>
>>>>> To get back on thread, and reiterate: you're better off using the naemon
>>>>> livestatus fork with naemon.
>>>>>
>>>>> al
>>>>>
>>>>> On 29 Mar 2014 11:16, "Max Sikström" <max.sikstrom at op5.com> wrote:
>>>>>>
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> I've tried to keep up with the changes that have happened to livestatus
>>>>>> upstream. But it's quite hard to track, since livestatus is just a
>>>>>> subdirectory in the check_mk repository.
>>>>>>
>>>>>> As far as I can see, there are just a few changes that have landed in the
>>>>>> upstream livestatus since the fork:
>>>>>> - statehist table is added
>>>>>> - bugfixes with the log table
>>>>>> - fixes with livecheck, and later removal of the livecheck
>>>>>>
>>>>>> Since log handling in livestatus is really nasty to use, because its
>>>>>> memory usage only ever increases (livestatus never deallocates its
>>>>>> growing buffer: once it has parsed 1GB of logs, 1GB of memory stays
>>>>>> allocated per thread, afaik), I've assumed that check_mk was the only
>>>>>> system that really used that part.
>>>>>>
>>>>>>
>>>>>> I don't want to see naemon-livestatus as older, just as a little bit
>>>>>> different.
>>>>>>
>>>>>> The naemon fork of livestatus has taken a path through op5 before ending
>>>>>> up as the naemon fork.  During that time, some issues have been resolved:
>>>>>> - Add sorting (and pagination) support, and some bugfixes too. (Sort:
>>>>>>   column_name asc/desc, Offset: 80, Limit: 20)
>>>>>> - Regexp handles case sensitivity for unicode characters correctly (it's
>>>>>>   really new, so I'm not sure if it's in master yet. Just know that Anton
>>>>>>   Löfgren/catharsis has it in a branch right now)
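To illustrate the sorting and pagination headers mentioned above, here is a rough Python sketch that only builds the query text. The `Sort:` and `Offset:` header syntax is inferred from the parenthetical in the list above and has not been checked against naemon-livestatus itself; `build_paged_query` and its parameters are hypothetical names.

```python
def build_paged_query(table, columns, sort_column, descending=False,
                      page=0, page_size=20):
    """Build a livestatus query for one page of sorted results.

    Assumption: naemon-livestatus accepts 'Sort: <column> asc|desc' and
    'Offset: <n>' headers as described in this thread.
    """
    lines = [
        "GET %s" % table,
        "Columns: %s" % " ".join(columns),
        "Sort: %s %s" % (sort_column, "desc" if descending else "asc"),
        "Offset: %d" % (page * page_size),
        "Limit: %d" % page_size,
    ]
    # A blank line terminates the request.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # Fifth page (index 4) of 20 rows, which gives Offset: 80 and Limit: 20.
    print(build_paged_query("services", ["host_name", "description", "state"],
                            "description", page=4), end="")
```

With `page=4` and the default `page_size=20` this reproduces the `Offset: 80, Limit: 20` combination from the list above.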
>>>>>>
>>>>>> In the naemon fork, there are also a couple of bug fixes:
>>>>>> - Possible segfault due to races between threads when submitting commands.
>>>>>>   (Command processing in upstream is done in a worker thread, but
>>>>>>   naemon/nagios isn't thread safe itself, since it doesn't use threads)
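The race described above is the general problem of worker threads calling into code that is not thread safe. One common way to avoid it, sketched here in Python purely as an illustration (this is not taken from the naemon-livestatus source, and all names are hypothetical), is to have client threads only enqueue commands while a single thread drains the queue and executes them.

```python
import queue
import threading

# Thread-safe hand-off point between client threads and the single executor.
command_queue = queue.Queue()


def client_worker(client_id, count):
    """Simulated client thread: enqueue commands, never execute them directly."""
    for i in range(count):
        command_queue.put("client %d command %d" % (client_id, i))


def drain(handler):
    """Run on the single core thread: execute every queued command in turn."""
    processed = 0
    while True:
        try:
            command = command_queue.get_nowait()
        except queue.Empty:
            return processed
        handler(command)
        processed += 1


def run_demo(clients=4, per_client=10):
    """Start the clients, wait for them, then process everything on one thread."""
    threads = [threading.Thread(target=client_worker, args=(n, per_client))
               for n in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    executed = []
    drain(executed.append)
    return executed


if __name__ == "__main__":
    print(len(run_demo()))
```

Only `drain` ever touches the handler, so the handler itself needs no locking; the trade-off is that commands run slightly later than they arrive.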
>>>>>>
>>>>>> In short: naemon-livestatus and mk-livestatus have diverged, and by the
>>>>>> time it's practical to upstream changes, they probably will have diverged
>>>>>> even further.
>>>>>>
>>>>>>
>>>>>> So are there any specific features you need or bugs to resolve in
>>>>>> naemon-livestatus that are available in mk-livestatus? Because then, it's
>>>>>> probably quite easy to just port those specific ones.
>>>>>>
>>>>>> Best regards,
>>>>>> Max Sikström
>>>>>>
>>>>>> On 28 Mar 2014, at 19:58, Eron Nicholson <eron at basecamp.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>   I am attempting to use Naemon with Check_MK.  Check_MK released a
>>>>>>> new version of livestatus today (1.2.5i1) which supports Nagios 4.
>>>>>>> However, I am getting errors when trying to use it with Naemon :
>>>>>>>
>>>>>>> [1396026973] Error: Could not load module
>>>>>>> '/usr/lib/check_mk/livestatus.o' -> /usr/lib/check_mk/livestatus.o:
>>>>>>> undefined symbol: get_next_log_rotation_time
>>>>>>> [1396026973] Error: Failed to load module
>>>>>>> '/usr/lib/check_mk/livestatus.o'.
>>>>>>> [1396026973] Error: Module loading failed. Aborting.
>>>>>>>
>>>>>>> We have been having issues with the forked naemon version of
>>>>>>> livestatus crashing.  We push in a lot of passive services, and it
>>>>>>> seems that this is causing livestatus to crash.  The forked version is
>>>>>>> quite old.  I was wondering if there was a plan to update naemon's
>>>>>>> livestatus to a more recent version or if there was a plan to allow
>>>>>>> naemon to integrate with the latest version of livestatus.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Eron Nicholson
>>>>>>> Systems Administrator | Basecamp
>>>>>>
>>>>>>
>>>>>
>

