Nagios 3.0 hanging (10/19 CVS)
Andreas Ericsson
ae at op5.se
Mon Oct 22 17:00:49 CEST 2007
Shad L. Lords wrote:
> I've had a few instances where nagios will be running but will fail to run
> checks or process anything. I noticed it this morning and did a quick
> strace of the process to see what it was trying to do (see below). I hope
> this will be of use to someone.
>
It is indeed. Thanks a lot.
> open("/var/spool/nagios", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1
> EMFILE (Too many open files)
> open("/var/log/nagios/nagios.log", O_RDWR|O_CREAT|O_APPEND|O_LARGEFILE,
> 0666) = -1 EMFILE (Too many open files)
Here is the primary symptom of the problem, methinks. EMFILE is a pretty
unusual error. There are probably some (or a lot of) codepaths in Nagios
where the check result files aren't closed properly, so descriptors leak
until the process hits its open-file limit, leading to all sorts of
weird errors ...
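
One way to confirm a descriptor leak like this is to count the open descriptors before and after a suspect codepath. The following is only a minimal sketch: count_open_fds() is a hypothetical helper, not part of Nagios, and the 65536 scan cap is an arbitrary assumption.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper (not a Nagios function): count the descriptors
 * currently open in this process by probing every possible slot with
 * fcntl(F_GETFD). Returns -1 if the descriptor limit cannot be read. */
static long count_open_fds(void)
{
    struct rlimit rl;
    long count = 0;
    rlim_t max, fd;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    /* Cap the scan in case the soft limit is unlimited or very large
     * (the cap itself is an arbitrary assumption). */
    max = rl.rlim_cur;
    if (max == RLIM_INFINITY || max > 65536)
        max = 65536;

    for (fd = 0; fd < max; fd++) {
        /* F_GETFD fails with EBADF only when the slot is not open. */
        if (fcntl((int)fd, F_GETFD) != -1 || errno != EBADF)
            count++;
    }
    return count;
}
```

If the count keeps growing across repeated runs of the same codepath, something on that path is opening a file without closing it.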
> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> child_tidptr=0xb7fe3708) = -1 ENOMEM (Cannot allocate memory)
... and eventually it runs into the good ole ENOMEM. I'm guessing this
happens because the scheduling queue keeps filling up more or less
indefinitely, and the child processes keep stacking up as well.
Personally, I think that in the absence of garbage collectors to run,
the only sane thing to do on ENOMEM is to die as gracefully as possible
with a loud, loud error message in the logs, possibly leaving a core
dump. kill(0, SIGSEGV) can accomplish that last thing (a pid of 0
signals every process in the caller's process group, so it takes the
children down too).
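
A minimal sketch of that "die loudly" idea, under a few assumptions: die_loudly() and fork_or_die() are hypothetical names rather than Nagios functions, the message goes to stderr instead of the Nagios log, and abort() is swapped in for kill(0, SIGSEGV) so that only the calling process is signalled. Whether a core file actually gets written still depends on the core size limit (ulimit -c).

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: print a loud error, then abort so the kernel
 * can write a core dump if the core size limit allows it. */
static void die_loudly(const char *what, int err)
{
    fprintf(stderr, "FATAL: %s failed: %s; aborting\n", what, strerror(err));
    fflush(stderr);
    abort(); /* raises SIGABRT in the calling process only */
}

/* Hypothetical wrapper: fork, but give up immediately on ENOMEM
 * instead of letting the scheduling queue keep growing. Other
 * failures are returned to the caller as -1 with errno preserved. */
static pid_t fork_or_die(void)
{
    pid_t pid = fork();

    if (pid == -1) {
        int saved = errno;
        if (saved == ENOMEM)
            die_loudly("fork", saved);
        errno = saved;
    }
    return pid;
}
```

The caller still has to handle the other fork() failures, such as EAGAIN, for example by backing off and retrying later.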
I won't have time to dig into this until tomorrow, but with Ethan
blazing through the codebase he'd probably have it fixed before me
anyway. :)
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231