Nagios 3.0 hanging (10/22 CVS)

Andreas Ericsson ae at op5.se
Wed Oct 24 02:18:58 CEST 2007


Shad L. Lords wrote:
>>> I tried the latest CVS yesterday and started a strace from the very
>>> beginning.  It took less than 5 hours for it to stop processing checks.
>>> I've uploaded the compressed strace for anyone that is interested.  It is
>>> about 5 MB in size.  You might be able to get more information out of it
>>> if you can see what leads up to the issue.
>>>
>> The first ENOMEM appears at 13:32:39. The first fd leak seems to appear at
>> 13:32:41.
>> It seems my first conclusion was in error. The ENOMEMs aren't the result
>> of fd leaks; it's the other way around. Or rather, they're separate bugs,
>> but Nagios does something wrong in the ENOMEM path of fork().
>>
>> Valgrind should be able to give a few hints. If you've got time to run
>> Nagios under it on your system, it would most likely be very valuable.
> 
> Not sure how to do this, but I've got the time and willingness to learn.
> Just point me at some documentation and I'll plug away at it. I've done a
> little googling and have run the program with this for tonight:
> 
> valgrind --leak-check=yes --time-stamp=yes \
>     --log-file=/tmp/trace/nagios-valgrind nagios /etc/nagios/nagios.cfg
> 

That looks about right.

> I'll make the results available tomorrow.  If you would like other options 
> please let me know which ones.
> 

Well, if this doesn't turn anything up, we might have to hack up half a
garbage collector.

Out of curiosity, did you compile Nagios with embedded perl support? If so, try
re-compiling without it once the valgrind test has run its course and see if it
still crashes. Embedded perl is known to leak, but I haven't seen it leak enough
to bring Nagios to its knees within the space of a few hours. It usually takes
at least a week, and restarting every other night is normally enough to take
care of that particular problem.
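
To illustrate the kind of bug suspected in the ENOMEM path of fork(), here is a
minimal sketch. It is not taken from the Nagios source: run_check() and its
parameters are hypothetical, and it assumes checks are run through a pipe to a
child process. The point is only that when fork() fails, both pipe ends are
still open in the parent and must be closed, or two fds leak per failed check.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not Nagios code): run cmd in a child process and
 * collect its stdout into buf. Returns 0 on success, -1 on error. */
int run_check(const char *cmd, char *buf, size_t buflen)
{
    int fds[2];
    int status;
    pid_t pid;
    ssize_t n;
    size_t used = 0;

    if (buf == NULL || buflen == 0)
        return -1;
    buf[0] = '\0';

    if (pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        /* fork() failed, e.g. with ENOMEM or EAGAIN. Both pipe ends are
         * still open in this process, so close them before returning.
         * Skipping this is what leaks fds on every failed fork(). */
        int saved_errno = errno;
        close(fds[0]);
        close(fds[1]);
        errno = saved_errno;
        return -1;
    }

    if (pid == 0) {
        /* Child: send stdout into the pipe, then exec the command. */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* only reached if execl() failed */
    }

    /* Parent: close the write end, read until EOF or the buffer is full. */
    close(fds[1]);
    while (used < buflen - 1) {
        n = read(fds[0], buf + used, buflen - 1 - used);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;
        used += (size_t)n;
    }
    buf[used] = '\0';
    close(fds[0]);

    /* Reap the child so it does not linger as a zombie. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return 0;
}
```

Calling run_check("echo ok", buf, sizeof buf) should leave the process with the
same set of open fds it started with, whether or not fork() succeeded.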

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
