Problems with FreeBSD and Nagios

Jonathan Call jcall at verio.net
Tue Jun 19 17:02:34 CEST 2007



> -----Original Message-----
> From: nagios-users-bounces at lists.sourceforge.net
> [mailto:nagios-users-bounces at lists.sourceforge.net] On Behalf Of
> Michael W. Lucas
> Sent: Tuesday, June 19, 2007 5:16 AM
> To: Kyle Sexton
> Cc: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Problems with FreeBSD and Nagios
> 
> On Mon, Jun 18, 2007 at 06:42:18PM -0500, Kyle Sexton wrote:
> > On 12/14/06, Andreas Ericsson <ae at op5.se> wrote:
> > > Jonathan Call wrote:
> > > >
> > > > Given your ideas and some google work I seem to have found my problem:
> > > >
> > > > http://lists.freebsd.org/pipermail/freebsd-hackers/2005-August/013247.html
> > > >
> > > > Not a pretty discussion. :(
> > > >
> > >
> > > Nope. Definitely not.
> > >
> > > The problem for Nagios is that threading was added after the fact so
> > > nagios actually breaks some of the *strong* recommendations on what to
> > > do and what not to do in a threaded application after a fork().
> > >
> > > The problem for *BSD and their implementation of the thread
> > > library is that Nagios actually works everywhere but on *BSD, and it
> > > *often* works there too, but not always. This "often-but-not-always" is
> > > usually a sign of a broken implementation, although
> > > "often-but-not-always" is also exactly the kind of error you'll run
> > > into when you do what Nagios does post-fork().
> > >
> > > I don't know of any other program that has the same problem on *BSD, but
> > > it would be interesting to see if there's a common pattern so one can
> > > pinpoint exactly what causes the lock contention and races. It
> > > would, from a practical point of view, be best to patch it in the
> > > library, as that is a fix that would work for all possible future
> > > problems as well, although it's technically more correct to fix it in
> > > Nagios.
> > >
> > > Ugly discussion indeed.
> > >
> > >
> > > > I'll try using a non-SMP kernel to see if it might help. If it doesn't,
> > > > this pretty much renders Nagios useless on FreeBSD. (Which makes me
> > > > wonder why they even bother maintaining it in ports?)
> > > >
> > >
> > > Out of curiosity, do you use passive checks, active checks or a mix of
> > > both in your setup?
> >
> > Was there ever a solution found to this problem?

No.
I was forced to implement a distributed model and limit the service
checks to fewer than 1000 per server. Even then I still have to run a
cron job that checks for nagios children that are spinning on the CPU as
a result of this fork issue.
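
For anyone wanting to do something similar, here is a rough sketch of that
kind of cron check in Python. It is not the script I actually run; the
function names, the 90% threshold and the choice of ps columns are all
assumptions made for illustration. It only reports candidates and leaves
the decision to kill them to the caller.

```python
import subprocess


def find_spinning(ps_output, name="nagios", cpu_threshold=90.0, exclude=()):
    """Return (pid, %cpu) pairs for processes called `name` at or above cpu_threshold.

    `ps_output` is expected to have four columns: pid, ppid, %cpu, command name.
    `exclude` is an optional collection of PIDs to ignore (hypothetical use:
    the main nagios daemon).
    """
    hits = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) != 4:
            continue
        try:
            pid, pcpu = int(fields[0]), float(fields[2])
        except ValueError:
            continue  # header row or a line that does not parse
        if fields[3].strip() == name and pcpu >= cpu_threshold and pid not in exclude:
            hits.append((pid, pcpu))
    return hits


def snapshot():
    """Run ps once and return its text output (pid, ppid, %cpu, command name)."""
    return subprocess.run(
        ["ps", "-ax", "-o", "pid,ppid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
```

A cron entry would call find_spinning(snapshot()) and act on the result. A
single snapshot can also catch a check that is legitimately busy, so it is
probably safer to act only on PIDs that show up in two consecutive runs.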

I've found that somewhere past 1500 service checks there will be a
random weekly event that causes almost a hundred nagios checks to hit
this fork issue all at the same time and promptly tank the FreeBSD
server.

> 
> Skimming the (long) discussion thread, my first thought is to try
> libthr instead of libkse.  The discussion seems to be on 5.x, I'd
> definitely try libthr on 6.x.  Check libmap.conf for details.

Are you referring to this type of mapping within /etc/libmap.conf?

[/usr/local/bin/nagios]
     libpthread.so.2         libthr.so.2
     libpthread.so           libthr.so

If so I'd be willing to try it on my FreeBSD 6.2 server.
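
As a sanity check on a stanza like the one above, a small Python sketch
such as the following can list which mappings a given executable would
pick up. It is a deliberately simplified reader and not how rtld itself
parses the file: it only understands "#" comments, unconstrained lines and
exact-path [constraint] sections, and the function name mappings_for is
made up for this example.

```python
def mappings_for(conf_text, executable):
    """Return {origin: target} pairs that apply to `executable`.

    Simplified reading of a libmap.conf-style file: lines before any
    [section] are treated as global, and a section applies only when its
    name equals `executable` exactly. Section entries override global ones.
    """
    global_map, specific_map = {}, {}
    section = None  # None means the unconstrained part of the file
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        parts = line.split()
        if len(parts) != 2:
            continue  # not an "origin target" pair
        if section is None:
            global_map[parts[0]] = parts[1]
        elif section == executable:
            specific_map[parts[0]] = parts[1]
    merged = dict(global_map)
    merged.update(specific_map)
    return merged
```

Feeding it the contents of /etc/libmap.conf together with
/usr/local/bin/nagios should show libpthread.so.2 mapped to libthr.so.2 if
the stanza is written as intended.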

Jonathan
