Possible bug in NSCA

Andreas Ericsson ae at op5.se
Fri Sep 23 09:08:37 CEST 2005


Chris Wilson wrote:
> Hi all,
> 
> I think I may have found a bug in NSCA. I don't know where to report it,
> but the copyright appears to be Ethan Galstad, so I hope someone here
> can help me.
> 
> I just discovered that NSCA on our main nagios server has been spinning
> and eating CPU for the last week. Strace shows this, over and over:
> 
> 
>>rt_sigaction(SIGPIPE, {SIG_DFL}, NULL, 8) = 0
>>close(4)                                = -1 EBADF (Bad file descriptor)
>>accept(4, 0, NULL)                      = -1 EBADF (Bad file descriptor)
>>time([1127431587])                      = 1127431587
>>rt_sigaction(SIGPIPE, {0xa21aa0, [], SA_RESTORER, 0x990f48}, {SIG_DFL}, 8) = 0
>>send(5, "<27>Sep 23 00:26:27 nsca[4425]: Network server accept failure (9: Bad file descriptor)", 86, 0 <unfinished ...>
> 
> 
> lsof shows that fd 4 is not open.
> 
> Looking back in the logs, I can see when this started:
> 
> 
>>Sep 15 23:52:11 dev nsca[4425]: Network server accept failure (10: No child processes)
>>Sep 15 23:52:11 dev nsca[4425]: Network server accept failure (9: Bad file descriptor)
>>Sep 15 23:52:41 dev last message repeated 1299103 times
> 
> 
> I can't see any other suspicious messages in the logs around that time.
> 
> I have no idea what caused the first error (no child processes), but the
> result seems inappropriate. It appears that nsca handles this error as
> follows, in accept_connection():
> 
> 
>>        /* wait for a connection request */
>>        while(1){
>>                new_sd=accept(sock,0,0);
>>                ...
>>        }
>>
>>        if(new_sd<0){
>>                ...
>>                syslog(LOG_ERR,"Network server accept failure (%d: %s)",errno,strerror(errno));
>>
>>                /* close socket prior to exiting */
>>                close(sock);
>>                return;
>>                }
> 
> 
> But nsca does not exit: accept_connection is called in an infinite loop,
> and keeps trying to accept() on a socket that's now closed. 
> 
> This seems to be bad behaviour, but I'm not sure what the correct
> behaviour would be. Any ideas?

After
new_sd = accept(sock, 0, 0)
you should add
if(new_sd == -1 && errno == EBADF) {
	sock = setup_socket();
}

Where setup_socket() is an imaginary function that calls socket(), 
possibly setsockopt(), bind() and listen(), in that order.
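
As a rough sketch of what such a setup_socket() might look like (the port
parameter, the SO_REUSEADDR option and the backlog of 5 are my own
assumptions for illustration, not taken from the nsca source):

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on the given port.
 * Returns the new descriptor, or -1 with errno set on failure. */
int setup_socket(unsigned short port)
{
	struct sockaddr_in addr;
	int flag = 1;
	int saved_errno;
	int sd;

	sd = socket(AF_INET, SOCK_STREAM, 0);
	if (sd < 0)
		return -1;

	/* assumption: let the daemon rebind quickly after a restart */
	if (setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, &flag, sizeof(flag)) < 0)
		goto fail;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);

	if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto fail;

	/* assumption: a small backlog is enough for this sketch */
	if (listen(sd, 5) < 0)
		goto fail;

	return sd;

fail:
	/* close() may clobber errno, so preserve the original error */
	saved_errno = errno;
	close(sd);
	errno = saved_errno;
	return -1;
}
```

The caller still has to check for a -1 return, since bind() can fail if
something else has grabbed the port in the meantime.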

A cleaner solution is to have nsca exit if it can't obtain the socket, 
since there's no real reason to think it should be able to obtain one later.
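
A minimal sketch of that exit-on-failure approach follows. The function
names accept_error_is_fatal() and accept_or_exit() are made up for this
example, and which errno values count as "the listening socket is gone"
is my own guess, so treat it as an illustration rather than a patch
against nsca:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <syslog.h>

/* Hypothetical helper: returns 1 if the error means the listening
 * socket itself is unusable, 0 otherwise. */
int accept_error_is_fatal(int err)
{
	switch (err) {
	case EBADF:      /* descriptor is closed or was never valid */
	case ENOTSOCK:   /* descriptor is not a socket */
	case EINVAL:     /* socket is not listening */
	case EOPNOTSUPP: /* socket type cannot accept connections */
		return 1;
	default:
		return 0;
	}
}

/* Hypothetical wrapper: returns a connected descriptor, or -1 on a
 * non-fatal error. Exits the process if the listening socket is unusable,
 * instead of letting the caller spin on a dead descriptor. */
int accept_or_exit(int sock)
{
	for (;;) {
		int new_sd = accept(sock, NULL, NULL);
		int err;

		if (new_sd >= 0)
			return new_sd;

		err = errno;

		/* interrupted call or aborted connection: just try again */
		if (err == EINTR || err == ECONNABORTED)
			continue;

		syslog(LOG_ERR, "Network server accept failure (%d: %s)",
		       err, strerror(err));

		if (accept_error_is_fatal(err))
			exit(EXIT_FAILURE);

		return -1;
	}
}
```

The important difference from the quoted code is that the listening
socket is never closed on a path that leads back to another accept().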

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users