[naemon-dev] Naemon Livestatus update

Eron Nicholson eron at basecamp.com
Mon Mar 31 21:26:42 CEST 2014


Anton,
  We have produced a core dump.  I can share it privately with you and
other naemon devs.  Let me know how you would like me to send it to
you.

It is certainly possible to submit passive check results via
livestatus.  You just use the Nagios PROCESS_SERVICE_CHECK_RESULT
external command, like this:

COMMAND [1396293019] PROCESS_SERVICE_CHECK_RESULT;host-01;Service Name;0;OK:  0
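
If it helps to reproduce this, here is a minimal Python sketch of how such
a line could be built and written to the livestatus socket.  It is not our
actual submission code: the function names are made up for illustration,
and the socket path is assumed from the broker_module line quoted below.

```python
import socket
import time


def format_passive_result(host, service, return_code, plugin_output,
                          timestamp=None):
    """Build one livestatus COMMAND line for a passive service result.

    Hypothetical helper: the layout follows the example line above.
    """
    if timestamp is None:
        timestamp = int(time.time())
    # Semicolons separate the fields, so they cannot appear in the names.
    for field in (host, service):
        if ";" in field or "\n" in field:
            raise ValueError("host/service must not contain ';' or newlines")
    if "\n" in plugin_output:
        raise ValueError("plugin output must be a single line")
    return "COMMAND [{}] PROCESS_SERVICE_CHECK_RESULT;{};{};{};{}\n".format(
        timestamp, host, service, return_code, plugin_output)


def send_command(line, socket_path="/var/cache/naemon/live"):
    """Write one command line to the livestatus unix socket.

    The default socket_path is an assumption; adjust it to your setup.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(line.encode("utf-8"))
        # Close the write side so livestatus sees the end of the request.
        sock.shutdown(socket.SHUT_WR)
```

Calling send_command(format_passive_result("host-01", "Service Name", 0,
"OK:  0")) would submit a line like the one above with the current time.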

Thanks,

Eron Nicholson
Systems Administrator | Basecamp

On Mon, Mar 31, 2014 at 1:14 PM, Anton Löfgren <alofgren at op5.com> wrote:
> The easiest way to track down what causes the segfault would of course be a
> core dump or a gdb back trace or similar. Is that something you would be
> able to share?
>
> Aside from that, how exactly are you submitting passive check results via
> livestatus? Is that even possible?
>
> On 31 Mar 2014 18:49, "Eron Nicholson" <eron at basecamp.com> wrote:
>>
>> Hey all,
>>   Thanks for the responses and the info.  I appreciate that you guys
>> are responsive to these issues.  I also posted this to the check_mk
>> users list and haven't gotten any response yet (see
>> http://lists.mathias-kettner.de/pipermail/checkmk-en/2014-March/011881.html).
>>
>> Since we are looking to use both Naemon and Check_mk in our new
>> monitoring system, I would certainly prefer it if there was a single
>> supported livestatus version shared between the two projects.  We do
>> see some issues when trying to use the Check_MK UI with
>> naemon-livestatus, as they have added new columns :
>>
>> Primary - Livestatus error
>> Unhandled exception: 400: Table 'hosts' has no column
>> 'host_comments_with_extra_info'
>>
>> We have built our own UI and Thruk is also perfectly fine, so this
>> isn't really a big concern.  As long as the backends are compatible,
>> we should be fine with either version.
>>
>> The major issue with the current version of naemon-livestatus is that
>> it crashes after ~10 seconds in our environment.  As I mentioned
>> earlier, we have tons of passive services being sent in via livestatus
>> - both from the check_mk agent checks and our own custom checks.  If
>> I disable our custom checks, naemon-livestatus does not crash, so it
>> has something to do with the additional passive checks we are sending.
>>  I have enabled livestatus logging and debugging via :
>>
>> broker_module=/usr/lib/naemon/livestatus.o /var/cache/naemon/live
>> log_file=/var/log/naemon/livestatus.log debug=1
>>
>> And do not see any errors in the livestatus.log when the process dies.
>>  I do sometimes see segfault errors in the naemon.log :
>>
>> [1396281951] Caught SIGSEGV, shutting down...
>>
>>
>> We are very, very reliant on livestatus for both pushing in passive
>> service checks and pulling data for our UI.  So our (new) monitoring
>> system is basically unusable until we can get a livestatus that works
>> with naemon and doesn't crash.  Fortunately, we still have our nagios3
>> system up and working, so we have some time to try to figure out these
>> kinds of issues.
>>
>> I would love to help out in troubleshooting this problem.  Let me know
>> if there's a newer version of naemon-livestatus that I can try or if
>> you would like me to gather some more data on the crashes.
>>
>> Thanks,
>>
>> Eron Nicholson
>> Systems Administrator | Basecamp
>>
>>
>> On Sat, Mar 29, 2014 at 7:51 AM, Anton Löfgren <alofgren at op5.com> wrote:
>> > I don't want to derail this thread further than necessary, but I just
>> > thought I should mention that there are also a number of fixes available
>> > for the build system which I hope to get into naemon in the coming week,
>> > apart from the unicode stuff Max mentions. The upstream build system (at
>> > least what we have in the op5 fork) is a complete mess, which anyone who
>> > has been down that rabbit hole should be able to attest to.
>> >
>> > I also added a couple of test cases for said unicode stuff, which should
>> > make it easier to add new ones in the future.
>> >
>> > Anyway, is anyone talking to Kettner about this? Ideally, we'd be able to
>> > work towards a common goal. Although from what I've heard (though this may
>> > or may not be accurate), he's not particularly interested in at least some
>> > of the changes we've made.
>> >
>> > If that's not possible for whatever reason, it might be best to do as Max
>> > says, and cherry-pick whatever changes we want from upstream.
>> >
>> > To get back on thread, and reiterate: you're better off using the naemon
>> > livestatus fork with naemon.
>> >
>> > al
>> >
>> > On 29 Mar 2014 11:16, "Max Sikström" <max.sikstrom at op5.com> wrote:
>> >>
>> >> Hi!
>> >>
>> >> I've tried to keep up with the changes that have happened to livestatus
>> >> upstream. But it's quite hard to track, since livestatus is just a
>> >> subdirectory in the check_mk repository.
>> >>
>> >> As far as I can see, only a few changes have landed in upstream
>> >> livestatus since the fork:
>> >> - statehist table is added
>> >> - bugfixes with the log table
>> >> - fixes with livecheck, and later removal of the livecheck
>> >>
>> >> Since log handling in livestatus is really nasty to use, because memory
>> >> usage just keeps increasing (livestatus never deallocates its growing
>> >> buffer; once it has parsed 1GB of logs, 1GB of memory is held per thread,
>> >> afaik), I've assumed that check_mk was the only system that really used
>> >> that part.
>> >>
>> >>
>> >> I wouldn't say that naemon-livestatus is older, just a little bit
>> >> different.
>> >>
>> >> The naemon fork of livestatus has taken a path through op5 before ending
>> >> up as the naemon-fork.  During that time, some issues have been resolved:
>> >> - Add sorting (and pagination) support, and some bugfixes too. (Sort:
>> >> column_name asc/desc, Offset: 80, Limit: 20)
>> >> - Regexp handles case sensitivity for unicode characters correctly (it's
>> >> really new, so I'm not sure if it's in master yet. Just know that Anton
>> >> Löfgren/catharsis has it in a branch right now)
>> >>
>> >> In the naemon-fork, there are also a couple of bug fixes:
>> >> - Possible segfault due to races between threads when submitting commands.
>> >> (Command processing in upstream is done in a worker thread, but
>> >> naemon/nagios isn't thread safe itself, since it doesn't use threads)
>> >>
>> >> In short: naemon-livestatus and mk-livestatus have diverged, and before
>> >> it's practical to upstream changes, it probably will be too.
>> >>
>> >>
>> >> So are there any specific features you need or bugs to resolve in
>> >> naemon-livestatus that are available in mk-livestatus? Because then, it's
>> >> probably quite easy to just port those specific ones.
>> >>
>> >> Best regards,
>> >> Max Sikström
>> >>
>> >> On 28 Mar 2014, at 19:58, Eron Nicholson <eron at basecamp.com> wrote:
>> >>
>> >> > Hello,
>> >> >  I am attempting to use Naemon with Check_MK.  Check_MK released a
>> >> > new version of livestatus today (1.2.5i1) which supports Nagios 4.
>> >> > However, I am getting errors when trying to use it with Naemon :
>> >> >
>> >> > [1396026973] Error: Could not load module
>> >> > '/usr/lib/check_mk/livestatus.o' -> /usr/lib/check_mk/livestatus.o:
>> >> > undefined symbol: get_next_log_rotation_time
>> >> > [1396026973] Error: Failed to load module
>> >> > '/usr/lib/check_mk/livestatus.o'.
>> >> > [1396026973] Error: Module loading failed. Aborting.
>> >> >
>> >> > We have been having issues with the forked naemon version of
>> >> > livestatus crashing.  We push in a lot of passive services, and it
>> >> > seems that this is causing livestatus to crash.  The forked version is
>> >> > quite old.  I was wondering if there was a plan to update naemon's
>> >> > livestatus to a more recent version or if there was a plan to allow
>> >> > naemon to integrate with the latest version of livestatus.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Eron Nicholson
>> >> > Systems Administrator | Basecamp
>> >>
>> >

