Segfault: I'm still dying here

karl.kornel at mindspeed.com
Thu Nov 23 00:04:10 CET 2006


I wish I could say that I had fixed the problem, but I think I've run
into the same problem, or at least a similar one.  I described everything
I've seen in my message of 17 August 2006, titled "BUG? Segfault &
coredump with scheduled downtime, downtime scheduled horked".

To cover the times when Nagios does decide to SEGFAULT, I have the 
following item in my Nagios user's crontab, running every 10 minutes:

        /etc/ha.d/resource.d/nagios live status >> /dev/null; if [ $? -eq 1 ]; then echo 'Nagios live stopped!  Restarting...'; sudo /etc/ha.d/resource.d/nagios live start; fi
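
For readers who would rather keep this logic in a script than in a long
crontab line, the same check can be sketched as below.  This is only an
illustration of the one-liner above, not the actual script used on the
system described here.  The variable names NAGIOS_INSTANCE and SUDO and
the function name check_and_restart are made up for the example, and
discarding stderr as well as stdout is an added assumption.

```shell
#!/bin/sh
# Sketch of the cron watchdog as a standalone script.
# NAGIOS_CTL defaults to the script path given in the post.
# NAGIOS_INSTANCE, SUDO and check_and_restart are hypothetical names
# introduced for this example only.
NAGIOS_CTL="${NAGIOS_CTL:-/etc/ha.d/resource.d/nagios}"
NAGIOS_INSTANCE="${NAGIOS_INSTANCE:-live}"
SUDO="${SUDO-sudo}"   # set SUDO="" if the job already has enough privileges

check_and_restart() {
    # Ask the init-style script for its status and discard the output
    # (assumption: stderr is discarded too, unlike the one-liner).
    "$NAGIOS_CTL" "$NAGIOS_INSTANCE" status > /dev/null 2>&1
    rc=$?

    # As in the one-liner, only exit status 1 is treated as "stopped".
    if [ "$rc" -eq 1 ]; then
        echo "Nagios $NAGIOS_INSTANCE stopped!  Restarting..."
        # $SUDO is left unquoted on purpose so an empty value disappears.
        $SUDO "$NAGIOS_CTL" "$NAGIOS_INSTANCE" start
        return $?
    fi
    return 0
}

check_and_restart
```

Saved under a name of your choosing, the script would be called from the
Nagios user's crontab on the same 10-minute schedule.  Any other exit
status from the status call is ignored, exactly as in the original line.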

/etc/ha.d/resource.d/nagios is a modified version of the standard Nagios
init script.  Unfortunately, this cron job does not cover the times when
Nagios doesn't SEGFAULT but still corrupts its internal downtime
information.  I've found that the best way of detecting this corruption is
to look at the event log:  when a scheduled downtime ends and other
services that are not supposed to be in downtime suddenly enter it, the
downtime data is corrupt.  At that point, asking for a safe Nagios restart
results in a restart followed by an immediate SEGFAULT; after I start
Nagios again, everything's fine.
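
The log check described above could be roughly automated along these
lines.  This is a sketch under assumptions, not a tested detector:  it
assumes log lines of the form "[epoch] SERVICE DOWNTIME ALERT:
host;service;STARTED; ..." and "[epoch] HOST DOWNTIME ALERT:
host;STOPPED; ...", and it flags any downtime start that follows the end
of a different object's downtime within a few seconds.  The function
name, the 5-second default window and the example log path are all
choices made for this example.  A legitimate downtime that happens to
start at the same moment would also be flagged, so the output is a hint
to go and look, nothing more.

```shell
#!/bin/sh
# find_suspect_downtimes LOGFILE [WINDOW_SECONDS]
# Prints every downtime STARTED entry that appears within WINDOW_SECONDS
# after a downtime STOPPED/CANCELLED entry for a different host or service.
# The log line layout is an assumption; adjust the patterns to your log.
find_suspect_downtimes() {
    awk -v window="${2:-5}" '
        /DOWNTIME ALERT: / {
            # $1 looks like "[1164236650]"; strip the brackets.
            ts = substr($1, 2, length($1) - 2) + 0

            # Keep only "host;STATE;..." or "host;service;STATE;...".
            rest = $0
            sub(/^.*DOWNTIME ALERT: /, "", rest)
            split(rest, f, ";")

            if (index($0, "SERVICE DOWNTIME ALERT: ") > 0) {
                name = f[1] ";" f[2]; state = f[3]
            } else {
                name = f[1]; state = f[2]
            }

            if (state == "STOPPED" || state == "CANCELLED") {
                # Remember the most recent downtime that ended.
                have_end = 1; end_ts = ts; end_name = name
            } else if (state == "STARTED" && have_end &&
                       ts - end_ts <= window && name != end_name) {
                # Another object entered downtime right after that end.
                print
            }
        }
    ' "$1"
}

# Example (the log path is an assumption about the local install):
#   find_suspect_downtimes /usr/local/nagios/var/nagios.log 5
if [ "$#" -gt 0 ]; then find_suspect_downtimes "$@"; fi
```

Any line it prints is a candidate for the corruption described above and
is worth checking against the downtimes that were actually scheduled.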

Anyway, the crashing can be detected and compensated for, but it would be
nice if this problem were fixed.  I can't find any place to report bugs
other than this list.  Good luck!

-- A. Karl Kornel, Mindspeed Technologies, Inc.
karl.kornel at mindspeed.com -- (949) 579-3503
"Remember the Rules: Separation & Optimization"

nagios-users-bounces at lists.sourceforge.net wrote on 11/20/2006 11:46:41 AM:

> Hi list
> 
> First, let me apologize for posting this same issue so many times, 
> but I really, really need to get it resolved and I'm really really 
> hoping someone can take a few minutes and help me out.
> 
> Nagios frequently segfaults when processing external commands. At
> first it looked as if cmd.cgi itself was hanging, but research found
> that Nagios had crashed and given up its end of the nagios.cmd named
> pipe, which is why cmd.cgi hangs.
> 
> This has been happening for years now, with every version of Nagios,
> on numerous machines (32-bit, 64-bit, mainframe...) and on different
> flavors of Linux, including SuSE 8, SLES9, and RH.
> 
> Doing a stack trace on the comatose Nagios v2.5 process, I get the 
> following backtrace:
> ============
> Detaching after fork from child process 19947.
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread -1208391264 (LWP 5431)]
> 0x0806c951 in hashfunc2 (name1=0x2 <Address 0x2 out of bounds>,
>     name2=0x80f <Address 0x80f out of bounds>, hashslots=1024) at utils.c:4285
> 4285                    for(i=0;i<strlen(name1);i++)
> (gdb)
> (gdb) bt
> #0  0x0806c951 in hashfunc2 (name1=0x2 <Address 0x2 out of bounds>,
>     name2=0x80f <Address 0x80f out of bounds>, hashslots=1024) at utils.c:4285
> #1  0x080768a1 in find_service (host_name=0x2 <Address 0x2 out of bounds>,
>     svc_desc=dwarf2_read_address: Corrupted DWARF expression.) at ../common/objects.c:5016
> #2  0x0808ef4b in handle_scheduled_downtime (temp_downtime=0x9f21c00) at ../common/downtime.c:311
> #3  0x08063454 in handle_timed_event (event=0x9fa2728) at events.c:1289
> #4  0x08063a9d in event_execution_loop () at events.c:964
> #5  0x0805394d in main (argc=3, argv=0xbff3f274) at nagios.c:710
> (gdb)
> 
> 
> ==========
> Please help!
> 
> 
> 
> - David Schlecht (dschl)
> 
> -----------------------
> The mailing list archive is found here:
> http://www.nagiosexchange.org/nagios-users.34.0.html
> 
> 
> 
> -------------------------------------------------------------------------
> Take Surveys. Earn Cash. Influence the Future of IT
> Join SourceForge.net's Techsay panel and you'll get the chance to share your
> opinions on IT & business topics through brief surveys - and earn cash
> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when 
> reporting any issue. 
> ::: Messages without supporting info will risk being sent to /dev/null
> 







