Make sockets non-blocking

Stephen Gran steve at lobefin.net
Thu May 6 00:32:17 CEST 2010


On Wed, May 05, 2010 at 10:54:48PM +0200, Andreas Ericsson said:
> On 05/05/2010 03:13 PM, Stephen Gran wrote:
> > On Sat, May 01, 2010 at 05:03:59PM +0100, Stephen Gran said:
> >> Hi there,
> >>
> >> We use NDO for network communication with a custom bit of perl to pass
> >> status updates around.  Recently, we've seen that a network flap can
> >> make ndo hang the entire nagios process, which is possibly imperfect :)
> >>
> >> I think I've tracked it down to the write() call in io.c when sending
> >> the actual update to the remote server.  The attached patch is a
> >> relatively naive attempt at making this write nonblocking for network
> >> sockets.
> >>
> >> This is a patch against the CVS - if you prefer a git-am style patch,
> >> that's fine.  I tried to clone the git off of sourceforge this morning,
> >> and got an empty repo.  If there's a better place to clone from, let me
> >> know and I'll fix it up for that.
> > 
> > So, it turned out my initial attempt to keep the patch small had some
> > limitations.  Working patch attached.
> > 
> > To recap, the main problem is that I/O operations are blocking.  This is
> > less important to local file or unix sockets, but can block the main
> > nagios process when the I/O operations are tcp based.
> > 
> 
> It will block on unix sockets too, in case the reader goes to lunch so
> the socket buffer fills up.

Good point.  Well, it will be easy enough to add the logic there as
well, if someone wants to.

> > My first attempt merely marked the socket as non-blocking, and added
> > the optional return code to the list handled in the error path.  What I
> > found during testing was that this had a few problems.
> > 
> > First, the error path adds the return of write() to tbytes.  If write
> > returns -1, tbytes was being decremented, resulting in an infinite loop
> > because the loop termination condition became unsatisfiable.  Even when
> > correcting the return to 0 before addition, there was still no loop
> > termination condition when write could not succeed.  I've hackishly
> > corrected this with a hardcoded maximum number of loops.
> > 
> > To make things a little nicer, we don't even want to enter the write()
> > loop if we know we can't write().  We do this with a zero second select()
> > to check if we can write before entering the loop.  This is admittedly
> > racy, but I'd frankly rather return early than block the nagios main
> > process.
> 
> Humm. I've solved this exact problem in Merlin with a 100 millisecond
> timeout and, failing writability on the socket, just referring to a
> binary backlog which stashes events for me until I need them.
> 
> The binlog api is ridiculously simple and very easy to work with, and
> it's separated to its own source + header file. You might want to grab
> that instead of hacking around with blocking calls and possibly partial
> writes inside the module (which the reader then has to deal with).

Yes, scrapping the entire thing and redoing it did occur to me, but I
thought it might not be the best "hello world" patch on the project
mailing list :)

I wasn't too worried about dealing with message queueing, as NDO already
does that part fairly well (as I'm sure you're aware).  I think that
short term, the simplest logic may be to scrap the while loop altogether
and just mark the message as failed if you get a partial write.  That
way the recipient can ditch it without worrying about reassembling
partial messages, and NDO can deal with it with it's usual retry logic.

That too felt like too big a change for my first patch, but that is the
way I was leaning while looking at it.

> > Back to socket creation.  We could mark the socket as non-blocking
> > after connect() returns, so that we know that we have a valid fd before
> > carrying on.  The problem with this is that the default connect() timeout
> > on the Redhat 5 machine I tested this on is 3 minutes.  That is, in my
> > opinion, again too long a time to block the main nagios process for.
> 
> Definitely. You want to set it to non-blocking, fire off the connect()
> and then check if it's writable to see if the connection succeeded. The
> polling can be done later in a scheduled event of its own, since the
> call will return immediately on a non-blocking socket and therefore
> most likely won't have time to even reach the remote end before you
> poll it otherwise.
> 
> > What I've done instead is to mark the socket as non-blocking before
> > calling connect().  In the connect() routine, if connect() sets errno
> > to EINPROGRESS, we select() on the socket for 15 seconds to see if it
> > succeeds.  If it does not, we enter the normal error path for connect()
> > failures.
> 
> 15 seconds sounds like a lot imo.

It arguably is.  Since this patch makes it a compiled in timeout, and
not a nice configurable one, I wanted one that even people connecting
their monitoring framework over pieces of wet string could live with.
It is definitely less than optimal, but it is still an improvement over
the current arrangement.  I wanted to make a new configuration option
for this, but again, felt that was possibly a bit intrusive.

> > Arguably my choices of 10 loops for termination in write() and 15
> > seconds for connect() are not right for everyone.  They could be moved
> > to configuration options, or they could be taken from existing options,
> > or something.  At this point, it Works For Me(TM), which is good enough
> > at this point.  If people have any specific objections, I would of course
> > be happy to tidy it up for them.
> 
> The connection stuff is totally bearable, but you could easily reduce it
> to 2 seconds and still have it Work For You without the massive amounts
> of latency it would mean for people who have forgotten to start the
> remote end.

Agreed, but again, see above.  That I felt this way while writing it
doesn't mean you have to take it as is, of course.  I'd be perfectly
happy if you wanted the bulk of the patch but thought an 11ms timeout
was better (although I'd be slightly worried about your sanity :) ).

> The write stuff seems iffier. Although it'll probably work just fine in
> practice I get the feeling that bugs in the area will be ridiculously
> hard to track down.

I think ditching the while loop in favor of a straightforward single
write attempt would make it much easier both to maintain and debug,
agreed.

Cheers,
-- 
 --------------------------------------------------------------------------
|  Stephen Gran                  | Q: Why don't lawyers go to the beach?   |
|  steve at lobefin.net             | A: The cats keep trying to bury them.   |
|  http://www.lobefin.net/~steve |                                         |
 --------------------------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20100505/99753ef4/attachment.sig>
-------------- next part --------------
------------------------------------------------------------------------------
-------------- next part --------------
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel


More information about the Developers mailing list