Nagios 3.0 hanging (10/19 CVS)

Andreas Ericsson ae at op5.se
Mon Oct 22 17:00:49 CEST 2007


Shad L. Lords wrote:
> I've had a few instances where nagios will be running but will fail to run 
> checks or process anything.  I noticed it this morning and did a quick 
> strace of the process to see what it was trying to do (see below).  I hope 
> this will be of use to someone.
> 

It is indeed. Thanks a lot.

> open("/var/spool/nagios", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 
> EMFILE (Too many open files)
> open("/var/log/nagios/nagios.log", O_RDWR|O_CREAT|O_APPEND|O_LARGEFILE, 
> 0666) = -1 EMFILE (Too many open files)


Here is the primary symptom of the problem, methinks. EMFILE is a pretty 
unusual error: it means the process has hit its limit on open file 
descriptors. There are probably some (or a lot of) code paths in Nagios 
where the check result files aren't closed properly, leading to all 
sorts of weird errors ...

> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
> child_tidptr=0xb7fe3708) = -1 ENOMEM (Cannot allocate memory)

... and eventually it runs into the good ole ENOMEM. I'm guessing this 
happens because the scheduling queue keeps filling up more or less 
indefinitely, and the child processes keep stacking up as well.
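
To illustrate the kind of descriptor leak suspected above, here is a 
minimal sketch in C of reading a result file so that every exit path 
after a successful open() passes through close(). It is not taken from 
the Nagios source: the function name read_result_file and its buffer 
handling are assumptions made purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not Nagios code): reads up to buflen - 1 bytes of
 * the file at path into buf and NUL-terminates it. Returns 0 on success,
 * -1 on error. Once open() has succeeded, every return goes through the
 * "out" label, so a read error cannot leak the descriptor. */
static int read_result_file(const char *path, char *buf, size_t buflen)
{
    int fd;
    int rc = -1;
    ssize_t n;

    assert(path != NULL && buf != NULL && buflen > 0);

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        /* EMFILE would show up here once the descriptor table is full */
        fprintf(stderr, "open(%s): %s\n", path, strerror(errno));
        return -1; /* nothing to close yet */
    }

    n = read(fd, buf, buflen - 1);
    if (n < 0) {
        fprintf(stderr, "read(%s): %s\n", path, strerror(errno));
        goto out;
    }
    buf[n] = '\0';
    rc = 0;

out:
    close(fd); /* single cleanup point for all paths after open() */
    return rc;
}
```

Had the early-error branch returned without reaching close(), each 
failed read would cost one descriptor, and a long-running daemon would 
eventually see EMFILE on every open(), much like the strace output 
quoted above.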

Personally, I think the only sane thing to do when you get ENOMEM, in 
the absence of garbage collectors to run, is to just die as gracefully 
as possible with a loud, loud error message in the logs, possibly 
leaving a core dump. kill(getpid(), SIGSEGV) or abort() can accomplish 
that last thing.
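
A rough sketch of that "die loudly" policy in C might look like the 
following. The names is_out_of_memory, die_loudly, fork_or_die and 
run_trivial_child are hypothetical and do not come from Nagios. The 
sketch uses abort(), which raises SIGABRT in the calling process only; 
kill() with a pid of 0 would instead signal every process in the 
caller's process group. Whether a core file is actually written depends 
on the core size limit in effect.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <syslog.h>
#include <unistd.h>

/* Hypothetical predicate: is this errno the out-of-memory condition? */
static int is_out_of_memory(int err)
{
    return err == ENOMEM;
}

/* Hypothetical helper: log as loudly as possible, then abort().
 * abort() raises SIGABRT, whose default action terminates the process
 * and may leave a core dump. This function does not return. */
static void die_loudly(const char *what, int err)
{
    syslog(LOG_CRIT, "%s failed: %s; aborting", what, strerror(err));
    fprintf(stderr, "%s failed: %s; aborting\n", what, strerror(err));
    abort();
}

/* fork() wrapper: ENOMEM is treated as fatal, any other error is
 * handed back to the caller as -1 with errno left untouched. */
static pid_t fork_or_die(void)
{
    pid_t pid = fork();

    if (pid < 0 && is_out_of_memory(errno))
        die_loudly("fork", errno);
    return pid;
}

/* Usage sketch: run a child that exits at once and return its exit
 * status, or -1 if the fork or the wait failed. */
static int run_trivial_child(void)
{
    int status = 0;
    pid_t pid = fork_or_die();

    if (pid < 0)
        return -1;  /* some other fork() error, e.g. EAGAIN */
    if (pid == 0)
        _exit(0);   /* child: leave without flushing stdio buffers */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Funnelling the fatal case through one function keeps the log message 
and the exit behaviour consistent wherever ENOMEM turns up.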

I won't have time to dig into this until tomorrow, but with Ethan 
blazing through the codebase he'd probably have it fixed before me 
anyway. :)

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
