Possible bug in NSCA

Chris Wilson chris at aidworld.org
Fri Sep 23 01:36:26 CEST 2005


Hi all,

I think I may have found a bug in NSCA. I don't know where to report it,
but the copyright holder appears to be Ethan Galstad, so I hope someone
here can help me.

I just discovered that NSCA on our main Nagios server has been spinning
and eating CPU for the last week. strace shows this, over and over:

> rt_sigaction(SIGPIPE, {SIG_DFL}, NULL, 8) = 0
> close(4)                                = -1 EBADF (Bad file descriptor)
> accept(4, 0, NULL)                      = -1 EBADF (Bad file descriptor)
> time([1127431587])                      = 1127431587
> rt_sigaction(SIGPIPE, {0xa21aa0, [], SA_RESTORER, 0x990f48}, {SIG_DFL}, 8) = 0
> send(5, "<27>Sep 23 00:26:27 nsca[4425]: Network server accept failure (9: Bad file descriptor)", 86, 0 <unfinished ...>

lsof shows that fd 4 is not open.

Looking back in the logs, I can see when this started:

> Sep 15 23:52:11 dev nsca[4425]: Network server accept failure (10: No child processes)
> Sep 15 23:52:11 dev nsca[4425]: Network server accept failure (9: Bad file descriptor)
> Sep 15 23:52:41 dev last message repeated 1299103 times

I can't see any other suspicious messages in the logs around that time.

I have no idea what caused the first error (no child processes), but the
result seems inappropriate. It appears that nsca handles this error as
follows, in accept_connection():

>         /* wait for a connection request */
>         while(1){
>                 new_sd=accept(sock,0,0);
>                 ...
>         }
>
>         if(new_sd<0){
>                 ...
>                 syslog(LOG_ERR,"Network server accept failure (%d: %s)",errno,strerror(errno));
> 
>                 /* close socket prior to exiting */
>                 close(sock);
>                 return;
>                 }

But nsca does not exit: accept_connection() is called in an infinite loop,
so it keeps calling accept() on a socket that is now closed, and logs the
same failure each time.

This seems to be bad behaviour, but I'm not sure what the correct
behaviour would be. Any ideas?
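
One possibility, as an untested sketch rather than a patch against the
real NSCA source: retry accept() only for errors that are expected to
clear on their own, and report anything else to the caller without
closing the listening socket. The names accept_error_is_transient() and
accept_or_fail() are made up for illustration, and the sketch assumes a
blocking listening socket.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: 1 if the accept() error should clear on retry.
 * Assumes a blocking socket, so EAGAIN is deliberately not listed
 * (retrying it in a tight loop would spin). */
static int accept_error_is_transient(int err)
{
    switch (err) {
    case EINTR:         /* interrupted by a signal */
    case ECONNABORTED:  /* peer gave up before we accepted */
        return 1;
    default:            /* EBADF, ENOTSOCK, EINVAL, ... */
        return 0;
    }
}

/* Hypothetical wrapper: returns the connected descriptor, or -1 with
 * errno set on a non-transient error. It never closes listen_sd, so
 * the caller decides whether to exit or rebuild the socket. */
static int accept_or_fail(int listen_sd)
{
    for (;;) {
        int new_sd = accept(listen_sd, NULL, NULL);
        if (new_sd >= 0)
            return new_sd;
        if (!accept_error_is_transient(errno))
            return -1;
        /* transient error: try again */
    }
}
```

With something like this, the outer loop could log the failure once and
then exit (or re-create the listening socket) when accept_or_fail()
returns -1, instead of calling accept() forever on a dead descriptor.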

Cheers, Chris.
-- 
(aidworld) chris wilson | chief engineer (chris at aidworld.org)


