Problems with FreeBSD and Nagios

Jonathan Call jcall at verio.net
Thu Dec 14 16:07:16 CET 2006


nagios# gdb --pid=$74056
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
"/var/spool/nagios/rw/" is not a core dump: File format not recognized
(gdb) bt
No stack.
(gdb)

Given your ideas and some google work I seem to have found my problem: 

http://lists.freebsd.org/pipermail/freebsd-hackers/2005-August/013247.html

Not a pretty discussion. :(

I'll try using a non-SMP kernel to see if it might help. If it doesn't, this
pretty much renders Nagios useless on FreeBSD. (Which makes me wonder
why they even bother maintaining it in ports?)
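
For anyone repeating the backtrace attempt: below is a minimal sketch of how
the spinning children could be picked out before attaching a debugger. It is
a starting point, not a verified procedure. It assumes a POSIX sh and the ps
keywords pid, ppid, pcpu and command. The helper name spinning_pids and the
20% threshold are made up for illustration, and /usr/local/bin/nagios is only
a guess at where the port installs the binary.

```shell
# Hypothetical helper: read "pid ppid pcpu command" rows on stdin and print
# the PIDs of nagios processes using more than the given %CPU (default 20).
spinning_pids() {
    awk -v t="${1:-20}" '$4 ~ /nagios/ && $3 + 0 > t + 0 { print $1 }'
}

# Example use (not run here; the binary path is an assumption):
#   ps -axo pid,ppid,pcpu,command | spinning_pids 20
#   gdb /usr/local/bin/nagios 94068     # then type "bt" at the (gdb) prompt
```

The "gdb program pid" form is the long-standing way to attach to a running
process and may behave differently from the --pid attempt shown above.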


> -----Original Message-----
> From: Andreas Ericsson [mailto:ae at op5.se]
> Sent: Thursday, December 14, 2006 2:26 AM
> To: Jonathan Call
> Cc: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Problems with FreeBSD and Nagios
> 
> Jonathan Call wrote:
> > I scanned the mailing list trying to find a solution for this. I found a
> > brief discussion where someone had the same problem but there was
> > nothing really discussed what was potentially wrong.
> >
> > My system:
> > Dual 2.8GHz P4 processors
> > 4GB of RAM
> > FreeBSD 6.1-RELEASE-p10
> >
> > Running processes:
> > Nagios 2.6 (installed from ports without embedded perl or nanosleep)
> > One mysqld process for the nagiosweb utility
> > A few NSCA daemon processes for passive checking
> > A backup tool daemon
> > Apache+modssl (latest from ports)
> > Basic FreeBSD services (sshd, sendmail, etc.)
> >
> > Problem:
> > Random service and host check control processes will lock up and 'spin'
> > on the CPU. This is really bad when a host check does it because it
> > brings all checks to a halt. It doesn't seem to even notice that all
> > checks have gone stale.
> >
> > It will look like this in top:
> >
> >   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> > 94068 nagios      1 116    0  7500K  6748K CPU2   0 727:37 30.15% nagios
> > 94082 nagios      1 116    0  7500K  6748K CPU2   0 734:28 32.55% nagios
> > 94104 nagios      1 116    0  7500K  6748K CPU2   0 845:21 37.42% nagios
> > 75338 nagios      5  20    0  7500K  6776K kserel 0  91:33  0.00% nagios
> >
> > In this example the main nagios pid is 75338. The hung service and/or
> > host processes are the other ones.
> >
> > The service checks are almost entirely custom scripts, but the host
> > check is a standard check_ping that comes with the nagios program.
> >
> > Any ideas on how to figure out which service or host check is hung? Or
> > how to deal with this problem at all?
> >
> 
> Host and service checks going into infinite loops wouldn't show up as
> Nagios processes in CPU spinlock, as the nagios check execution children
> just sit around and wait for the child to finish (or 60 seconds to pass
> in default config, before it kills it off).
> 
> You've found a bug in Nagios which most likely was either introduced in
> the port of it, or is a result of library differences between FreeBSD
> and Linux.
> 
> I wouldn't be all too surprised if it turns out that the FreeBSD pthread
> implementation disallows something that the Linux version allows. Note
> that this doesn't necessarily have to be a bug; Nagios doesn't use the
> pthread ABI in a way that is explicitly stated as safe, but the pthread
> implementations on Linux and most other unices are forgiving enough to
> make it work anyway.
> 
> It's also possible that this bug only triggers on dual-CPU systems with
> a particular library installed, as some kinds of timing and
> race conditions just don't happen on single-CPU systems.
> 
> What happens if you do
> 
> $ gdb --pid=$(pidof spinning-nagios-process)
> (gdb) bt
> 
> ?
> 
> --
> Andreas Ericsson                   andreas.ericsson at op5.se
> OP5 AB                             www.op5.se
> Tel: +46 8-230225                  Fax: +46 8-230231
