Unexplained nagios crashes

Duncan Ferguson duncan.ferguson at altinity.com
Mon Sep 17 11:48:12 CEST 2007


On 27 Aug 2007, at 12:19, Andreas Ericsson wrote:
>
> My guess would be that it's an off-by-one somewhere in the code that
> only triggers under some very special circumstances. Since it only
> happens at one customer site, something needs to be special about
> that customer.

Finally we think we have worked out what the problem is, after adding  
more debug output and waiting for the crash to happen again.

We traced the data corruption back to the portion of code that runs  
after a host check result arrives from a slave. The host check in  
question was:

coreserv5.main.internal;0;|

i.e. no output and no perf data, just a pipe symbol.  Check results  
like this came back very frequently but didn't always cause the  
crash.  The problem seems to be related to the use of strtok in  
commands.c when splitting the data apart.  We have patched the  
customer's code and are keeping a close eye on it (it hasn't crashed  
again yet), but it seems as though Ethan has already overhauled this  
area of the code in Nagios 3.
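
For anyone wondering why a lone pipe symbol is awkward for strtok, here is a minimal sketch. It is not the Nagios code and not our patch; the helper name split_check_result and the field layout (host;return code;output|perf data) are assumptions made for illustration. strtok skips leading delimiters, so when the remaining text is just "|" the call for the output field returns NULL instead of an empty string, and any caller that uses that pointer without checking it is in trouble.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only (not taken from commands.c).
 * Splits "host;code;output|perfdata" in place and returns how many fields
 * were actually found (0 to 4). Fields that are missing stay set to "",
 * so the caller never sees a NULL pointer. */
static int split_check_result(char *buf, const char **host, const char **code,
                              const char **output, const char **perf)
{
    char *tok;
    int found = 0;

    assert(buf != NULL);
    *host = *code = *output = *perf = "";

    if ((tok = strtok(buf, ";")) == NULL)
        return found;
    *host = tok;
    found++;

    if ((tok = strtok(NULL, ";")) == NULL)
        return found;
    *code = tok;
    found++;

    /* With "...;0;|" the remaining text is only the delimiter "|".
     * strtok skips it, hits the end of the string and returns NULL. */
    if ((tok = strtok(NULL, "|")) == NULL)
        return found;
    *output = tok;
    found++;

    if ((tok = strtok(NULL, "\n")) != NULL) {
        *perf = tok;
        found++;
    }
    return found;
}
```

Called on a writable copy of the result above, this sketch returns 2 with output and perf both left as empty strings. strtok also keeps hidden static state between calls, so strtok_r (where available) is usually the safer choice in code that may be re-entered.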

If anyone wants the patch then please let us know.

Thanks.

   Duncs

-- 
Duncan Ferguson

http://www.altinity.com
Tel: +44 (0)870 787 9243
Fax: +44 (0)845 280 1725
Skype: duncan_j_ferguson
MSN: duncan.ferguson at altinity.com


