Nagios caught SIGSEGV but doesn't seem to shut down all the way

Chris Beattie cbeattie at geninfo.com
Fri Dec 26 18:16:21 CET 2008


Hello all,

 

I'm running Nagios 3.0.6 compiled from unmodified source on CentOS 5.2
x86_64.  I noticed that notifications stopped early this morning, and
the logs said Nagios caught SIGSEGV, and it was shutting down.  Nagios
doesn't appear to go all the way down, though.  All the CGIs still work,
but no checks are being performed.  There is a lock file, and nagios.cmd
still exists.  The first crash I saw happened after Nagios had been
running fine for a while, but the same thing happens, defunct processes
and all, if I issue a killall -SIGSEGV nagios command.  This is what I
got after I did the killall, then a service nagios start, then another
killall.

 

# ps -fC nagios
UID        PID  PPID  C STIME TTY          TIME CMD
nagios    1469     1  0 10:47 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    1470  1469  0 10:47 ?        00:00:00 [nagios] <defunct>
nagios    1918     1  6 10:51 ?        00:02:55 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   16350  1918  0 11:25 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   16351 16350  0 11:25 ?        00:00:00 [nagios] <defunct>
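
The half-dead state in the listing above shows up as nagios processes
marked <defunct>.  A minimal sketch for spotting that automatically,
assuming the procps ps shipped with CentOS (its "-o stat=" option prints
only the state column, where a leading Z means defunct).  The function
name count_defunct is made up for illustration and is not part of Nagios
or its plugins:

```shell
#!/bin/sh
# Hypothetical helper: read process state codes (one per line) on stdin
# and print how many of them are zombies, i.e. start with "Z".
count_defunct() {
    awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Usage (assumes procps ps):
#   ps -C nagios -o stat= | count_defunct
```

A non-zero count alongside a stale status.dat would be one hint that the
daemon caught a signal and never finished exiting.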

 

Thanks to Paul Weaver's suggestion earlier this month, I've got a
failover Nagios server running.  Once a minute, it checks the primary
server.  I didn't set the conditions for failing over correctly, so it
didn't take over in this case, though it sometimes does for a moment
when I restart the primary Nagios after I've updated its object
configuration files.  The output of its check_nagios command looks like
this after the primary Nagios gets a SIGSEGV:

 

# ./check_by_ssh -H primaryhostname --command='/usr/local/nagios/libexec/check_nagios --filename=/usr/local/nagios/var/status.dat --expires=60 --command=nagios'

NAGIOS OK: 3 processes, status log updated 228 seconds ago
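
The OK result above, with a status log 228 seconds old, is consistent
with --expires being counted in minutes rather than seconds, which is
how I read the check_nagios help text (worth confirming with
check_nagios --help on your version).  As a cross-check that works
purely in seconds, here is a minimal sketch.  It assumes GNU stat and
date; the function name status_freshness and the 120-second limit in the
usage line are illustrative only, and the return codes follow the usual
plugin convention (0 OK, 1 WARNING, 3 UNKNOWN):

```shell
#!/bin/sh
# Hypothetical helper: compare a file's age against a limit in seconds.
# Prints OK, WARNING, or UNKNOWN and returns the matching plugin code.
status_freshness() {
    file=$1
    max_age=$2
    # A missing file cannot be judged either way.
    [ -e "$file" ] || { echo "UNKNOWN"; return 3; }
    now=$(date +%s)
    mtime=$(stat -c %Y "$file")   # GNU stat: mtime as epoch seconds
    age=$((now - mtime))
    if [ "$age" -gt "$max_age" ]; then
        echo "WARNING"
        return 1
    fi
    echo "OK"
    return 0
}

# Usage:
#   status_freshness /usr/local/nagios/var/status.dat 120
```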

 

When I fixed the expiration, it gave me a warning state and I could've
failed over on that.  However, the way I did things, the failover server
thought everything was all right.  So, that's my problem to fix, but
shouldn't Nagios shut all the way down as well?

 

Thanks!

-Chris



