SEGV in 2.0b2 (FreeBSD 4.10/200 hosts/330 active/300 passive) - repeatedly after 2-7 days running.

Ethan Galstad nagios at nagios.org
Mon Apr 4 06:03:25 CEST 2005


Thanks for the note, Stanley.  If you can manage to get a core file or 
track the problem down further, let me know.  I'm releasing 2.0b3 
tonight, so this probably won't be fixed until 2.0b4.


On 2 Apr 2005 at 19:53, Stanley Hopcroft wrote:

> Dear Folks,
> 
> I am writing to report what may be a problem with Nagios 2.0b2
> (embedded Perl, pthread lib, FreeBSD 4.10).
> 
> Nagios runs no more than 10 days before dying with a SEGV.
> 
> As in an earlier report of SEGVs ('coredumps in wobbly 
> networks'/Ericsson/24 Mar 2005), there _may_ be a pattern in the
> messages logged before the SEGV.
> 
> Exiting from scheduled downtime appears to be a health hazard.
> 
> In the last case,
> 
> Sat Apr 02 17:05:42 SERVICE DOWNTIME ALERT: foo:bar via the blurfl
> provider infrastructure;STOPPED; Service has exited from a period of
> scheduled downtime
> 
> Sat Apr 02 17:06:18 Auto-save of retention data completed successfully.
> 
> Sat Apr 02 18:07:33 Nagios 2.0b2 starting... (PID=97771)
> 
> tsitc> grep nagios /var/log/messages
> Apr  2 17:07:52 tsitc /kernel: pid 3400 (nagios), uid 1000: exited on
> signal 11
> 
> And the one before,
> 
> Tue Mar 29 06:20:58 SERVICE ALERT: nada;TEC CPU;WARNING;HARD;1;The
> percentage of CPU in idle state is low. This indicates high CPU
> overload. date: 03/29/2005 06:20:50 AM eventid: 1112041070 557
> modelname: DMXCpu name: total percidlecpu: 0 profilename:
> ITM.OS.Unix_Dev_Monitoring.itm#IPAustralia-region source: TMNT status:
> OPEN
> 
> Tue Mar 29 06:30:44 SERVICE DOWNTIME ALERT: yada;Standard host-centric
> checks;STOPPED; Service has exited from a period of scheduled downtime
> 
> Tue Mar 29 06:30:44 SERVICE DOWNTIME ALERT: wurfl;COMS ad-hoc 
> check;STOPPED; Service has exited from a period of scheduled downtime
> 
> Tue Mar 29 06:30:44 HOST DOWNTIME ALERT: yada;STOPPED; Host has exited
> from a period of scheduled downtime
> 
> Tue Mar 29 09:11:27 Nagios 2.0b2 starting... (PID=5473)
> 
> tsitc> grep nagios /var/log/messages
> Mar 29 06:30:44 tsitc /kernel: pid 31467 (nagios), uid 1000: exited on
> signal 11
> 
> Obviously it is easy to check whether scheduling downtime is causal; I
> will give it a go and watch.
> 
> No core file.
> 
> Yours sincerely.
> 
>  -- 
> Stanley Hopcroft
> 
> IP Australia
> Ph: (02) 6283 3189  Fax: (02) 6281 1353
> PO Box 200 Woden  ACT 2606
> http://www.ipaustralia.gov.au
> 



Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org


