[naemon-dev] Naemon Livestatus update

Max Sikström max.sikstrom at op5.com
Mon Mar 31 22:31:43 CEST 2014


Oh, that's a nasty column. It's not hard to port and it does what it should,
but it looks to me like a special case implemented for one purpose.

It shouldn't be hard to cherry-pick, since it's one simple commit (plus a
bugfix):

commit 6f2e6f830115a7790e213675b526451f034d2699
Author: Andreas Boesl <ab at mathias-kettner.de>
Date:   Thu Feb 7 11:05:24 2013 +0100

    livestatus: new comments_with_extra_info column in hosts/services table
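
To check whether a given livestatus build knows the column (before and after
the cherry-pick), something like the Python sketch below might do. It is only
a sketch: it assumes the socket path from the broker_module line quoted
further down (/var/cache/naemon/live) and the "ResponseHeader: fixed16" query
header, and the function names are made up for illustration.

```python
import socket
from typing import Tuple


def build_probe_query(table: str, column: str) -> str:
    """Build a Livestatus query asking for one row of `column` from `table`."""
    return (
        f"GET {table}\n"
        f"Columns: {column}\n"
        "Limit: 1\n"
        "ResponseHeader: fixed16\n"
        "\n"
    )


def parse_fixed16(header: bytes) -> Tuple[int, int]:
    """Split the 16-byte response header into (status code, body length)."""
    text = header.decode("ascii")
    return int(text[0:3]), int(text[4:15])


def has_column(socket_path: str, table: str, column: str) -> bool:
    """Return True if the query succeeds (status 200), False otherwise.

    socket_path is an assumption, e.g. /var/cache/naemon/live.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(build_probe_query(table, column).encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal the end of the query
        header = b""
        while len(header) < 16:
            chunk = sock.recv(16 - len(header))
            if not chunk:
                raise ConnectionError("short Livestatus response header")
            header += chunk
    status, _length = parse_fixed16(header)
    return status == 200
```

Calling has_column("/var/cache/naemon/live", "hosts",
"host_comments_with_extra_info") should then return False on a build that
answers with the 400 error quoted below, and True once the column is there.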


// Max S


On Mon, Mar 31, 2014 at 6:49 PM, Eron Nicholson <eron at basecamp.com> wrote:

> Hey all,
>   Thanks for the responses and the info.  I appreciate that you guys
> are responsive to these issues.  I also posted this to the check_mk
> users list and haven't gotten any response yet (see
> http://lists.mathias-kettner.de/pipermail/checkmk-en/2014-March/011881.html
> ).
>
> Since we are looking to use both Naemon and Check_mk in our new
> monitoring system, I would certainly prefer it if there was a single
> supported livestatus version shared between the two projects.  We do
> see some issues when trying to use the Check_MK UI with
> naemon-livestatus, as they have added new columns :
>
> Primary - Livestatus error
> Unhandled exception: 400: Table 'hosts' has no column
> 'host_comments_with_extra_info'
>
> We have built our own UI and Thruk is also perfectly fine, so this
> isn't really a big concern.  As long as the backends are compatible,
> we should be fine with either version.
>
> The major issue with the current version of naemon-livestatus is that
> it crashes after ~10 seconds in our environment.  As I mentioned
> earlier, we have tons of passive services being sent in via livestatus
> - both from the check_mk agent checks and our own custom checks.  If
> I disable our custom checks, naemon-livestatus will not crash, so it
> has something to do with the additional passive checks we are sending.
>  I have enabled livestatus logging and debugging via :
>
> broker_module=/usr/lib/naemon/livestatus.o /var/cache/naemon/live
> log_file=/var/log/naemon/livestatus.log debug=1
>
> And do not see any errors in the livestatus.log when the process dies.
>  I do sometimes see segfault errors in the naemon.log :
>
> [1396281951] Caught SIGSEGV, shutting down...
>
>
> We are very, very reliant on livestatus for both pushing in passive
> service checks and pulling data for our UI.  So our (new) monitoring
> system is basically unusable until we can get a livestatus that works
> with naemon and doesn't crash.  Fortunately, we still have our nagios3
> system up and working, so we have some time to try to figure out these
> kinds of issues.
>
> I would love to help out in troubleshooting this problem.  Let me know
> if there's a newer version of naemon-livestatus that I can try or if
> you would like me to gather some more data on the crashes.
>
> Thanks,
>
> Eron Nicholson
> Systems Administrator | Basecamp
>
>
> On Sat, Mar 29, 2014 at 7:51 AM, Anton Löfgren <alofgren at op5.com> wrote:
> > I don't want to derail this thread further than necessary, but I just
> > thought I should mention that there are also a number of fixes available
> > for the build system which I hope to get into naemon in the coming week,
> > apart from the unicode stuff Max mentions. The upstream build system (at
> > least what we have in the op5 fork) is a complete mess, which anyone who
> > has been down that rabbit hole should be able to attest to.
> >
> > I also added a couple of test cases for said unicode stuff, which should
> > make it easier to add new ones in the future.
> >
> > Anyway, is anyone talking to Kettner about this? Ideally, we'd be able to
> > work towards a common goal. Although from what I've heard (though this
> > may or may not be accurate), he's not particularly interested in at least
> > some of the changes we've made.
> >
> > If that's not possible for whatever reason, it might be best to do as Max
> > says, and cherry-pick whatever changes we want from upstream.
> >
> > To get back on thread, and reiterate: you're better off using the naemon
> > livestatus fork with naemon.
> >
> > al
> >
> > On 29 Mar 2014 11:16, "Max Sikström" <max.sikstrom at op5.com> wrote:
> >>
> >> Hi!
> >>
> >> I've tried to keep up with what changes have happened to livestatus
> >> upstream. But it's quite hard to track, since livestatus is just a
> >> subdirectory in the check_mk repository.
> >>
> >> As far as I can see, there are just a few new features and fixes in the
> >> upstream livestatus since the fork:
> >> - statehist table is added
> >> - bugfixes with the log table
> >> - fixes with livecheck, and later removal of the livecheck
> >>
> >> Since log handling in livestatus is really nasty to use, because its
> >> memory usage only ever increases (livestatus never deallocates its
> >> growing buffer; once it has parsed 1GB of logs, 1GB of memory is kept
> >> per thread, afaik), I've assumed that check_mk was the only system that
> >> really used that part.
> >>
> >>
> >> I don't see naemon-livestatus as older, just a little bit different.
> >>
> >> The naemon fork of livestatus has taken a path through op5 before ending
> >> up as the naemon-fork.  During that time, some issues have been resolved:
> >> - Add sorting (and pagination) support, and some bugfixes too. (Sort:
> >> column_name asc/desc, Offset: 80, Limit: 20)
> >> - Regexp handles case sensitivity for unicode characters correctly (it's
> >> really new, so I'm not sure if it's in master yet. Just know that Anton
> >> Löfgren/catharsis has it in a branch right now)
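
For anyone who wants to try the sorting and pagination headers listed above,
here is a minimal Python sketch that only builds the query text. It assumes
the header spellings given in the list (Sort, Offset, Limit); the function
name and the JSON output format are my own choices for illustration.

```python
from typing import Sequence


def build_page_query(
    table: str,
    columns: Sequence[str],
    sort_column: str,
    descending: bool = False,
    offset: int = 0,
    limit: int = 20,
) -> str:
    """Build a sorted, paginated Livestatus query for the naemon fork."""
    direction = "desc" if descending else "asc"
    lines = [
        f"GET {table}",
        "Columns: " + " ".join(columns),
        f"Sort: {sort_column} {direction}",  # header named in the list above
        f"Offset: {offset}",                 # rows to skip
        f"Limit: {limit}",                   # rows to return
        "OutputFormat: json",
    ]
    # A Livestatus query is terminated by an empty line.
    return "\n".join(lines) + "\n\n"


# Fifth page of 20 services, sorted by description.
query = build_page_query(
    "services", ["host_name", "description", "state"],
    sort_column="description", offset=80, limit=20,
)
```

The resulting string can be written to the livestatus socket as-is.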
> >>
> >> In the naemon-fork, there are also a couple of bug fixes:
> >> - Possible segfault due to races between threads when submitting
> >> commands. (Command processing in upstream is done in a worker thread,
> >> but naemon/nagios isn't thread safe itself, since it doesn't use threads)
> >>
> >> In short: naemon-livestatus and mk-livestatus have diverged, and by the
> >> time it's practical to upstream changes, they probably will have
> >> diverged further.
> >>
> >>
> >> So are there any specific features you need or bugs to resolve in
> >> naemon-livestatus that are available in mk-livestatus? Because then, it's
> >> probably quite easy to just port those specific ones.
> >>
> >> Best regards,
> >> Max Sikström
> >>
> >> On 28 Mar 2014, at 19:58, Eron Nicholson <eron at basecamp.com> wrote:
> >>
> >> > Hello,
> >> >  I am attempting to use Naemon with Check_MK.  Check_MK released a
> >> > new version of livestatus today (1.2.5i1) which supports Nagios 4.
> >> > However, I am getting errors when trying to use it with Naemon :
> >> >
> >> > [1396026973] Error: Could not load module
> >> > '/usr/lib/check_mk/livestatus.o' -> /usr/lib/check_mk/livestatus.o:
> >> > undefined symbol: get_next_log_rotation_time
> >> > [1396026973] Error: Failed to load module
> >> > '/usr/lib/check_mk/livestatus.o'.
> >> > [1396026973] Error: Module loading failed. Aborting.
> >> >
> >> > We have been having issues with the forked naemon version of
> >> > livestatus crashing.  We push in a lot of passive services, and it
> >> > seems that this is causing livestatus to crash.  The forked version is
> >> > quite old.  I was wondering if there was a plan to update naemon's
> >> > livestatus to a more recent version or if there was a plan to allow
> >> > naemon to integrate with the latest version of livestatus.
> >> >
> >> > Thanks,
> >> >
> >> > Eron Nicholson
> >> > Systems Administrator | Basecamp
> >>
> >
>