Setting a Socket Connection Timeout in Ruby

Ruby’s default socket connection timeout can be too long in many situations (e.g. when feedback needs to be provided to a user who has just entered a host name to connect to). On my Mountain Lion laptop, a connection attempt to an unused IP address takes close to 75 seconds to fail:

irb> require 'socket'
irb> require 'benchmark'
irb> puts Benchmark.measure { TCPSocket.new("192.178.1.2", 9090) rescue nil }
  0.000000   0.000000   0.000000 ( 75.451505)

Unfortunately there’s no obvious, built-in way of specifying a connection timeout when using a class like @TCPSocket@. So if you do need to break out of opening a socket connection before the default timeout is reached, you need to code the handling for a shorter timeout yourself. There are a few ways you can go about this, but the solution that I believe is the clear winner is to use the Socket class’ @connect_nonblock@ method paired with @IO.select@.

I went down a few wrong paths before I understood enough to realize that this was the right way to go. To help you avoid the same fate, I will lay out one of the wrong ways and provide a code example of how to do it the right way.

h2. Wrong – Timeout::timeout

As suggested by this “StackOverflow answer”:http://stackoverflow.com/questions/12734811/what-is-the-standard-socket-timeoutseconds-in-ruby-socket-programming, you can wrap the connect logic in a @Timeout::timeout@ block. This seems pretty straightforward, but a little bit of research on using @Timeout::timeout@ indicates that this probably isn’t a good idea. For example, if you look closely at one of the 0-vote answers in that same StackOverflow question, “sylvain.joyeux”:http://stackoverflow.com/users/709627/sylvain-joyeux comments on why this should not be done:

bq. This is ruby-implementation specific (I am pretty sure that it might not work on Jruby / rubinius). Moreover, using Timeout.timeout is in general a bad idea. Performance: Timeout.timeout works by spawning a thread (costly). Functionality: Timeout.timeout depends on the blocking call being interruptible. Fragile: it works by raising an exception in the block, which can happen at any place. For instance, it could raise just after the socket got constructed, which would lead to a leak (the socket would be closed only when the GC got started)

And a CoderWall article on why “Ruby timeouts are dangerous”:https://coderwall.com/p/1novga talks about some of the race conditions you can get into when using @Timeout::timeout@.
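For reference, the pattern being warned against looks roughly like the sketch below. The method name @connect_with_timeout_block@ is hypothetical and is only here to make the discouraged approach concrete; it is not a recommendation.

```ruby
require 'socket'
require 'timeout'

# Discouraged pattern, shown for comparison only.
# connect_with_timeout_block is a hypothetical name used for illustration.
def connect_with_timeout_block(host, port, seconds = 5)
  # Timeout.timeout raises Timeout::Error inside the block if it runs longer
  # than `seconds`. The exception can arrive at any point in the block,
  # which is the source of the fragility described above.
  Timeout.timeout(seconds) { TCPSocket.new(host, port) }
end
```

When the host is reachable this returns a connected @TCPSocket@. The trouble only shows up when the timeout fires at an awkward moment.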

h2. Right – Non-Blocking

Combining @connect_nonblock@ with @IO.select@ avoids all of the concerns that come along with @Timeout.timeout@.

require 'socket'

def connect(host, port, timeout = 5)

  # Convert the passed host into structures the non-blocking calls
  # can deal with
  addr = Socket.getaddrinfo(host, nil)
  sockaddr = Socket.pack_sockaddr_in(port, addr[0][3])

  Socket.new(Socket.const_get(addr[0][0]), Socket::SOCK_STREAM, 0).tap do |socket|
    socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)

    begin
      # Initiate the socket connection in the background. If it doesn't fail
      # immediately, it will raise an IO::WaitWritable (Errno::EINPROGRESS)
      # indicating the connection is in progress.
      socket.connect_nonblock(sockaddr)

    rescue IO::WaitWritable
      # IO.select will block until the socket is writable or the timeout
      # is exceeded - whichever comes first.
      if IO.select(nil, [socket], nil, timeout)
        begin
          # Verify there is now a good connection
          socket.connect_nonblock(sockaddr)
        rescue Errno::EISCONN
          # Good news everybody, the socket is connected!
        rescue
          # An unexpected exception was raised - the connection is no good.
          socket.close
          raise
        end
      else
        # IO.select returns nil when the socket is not ready before timeout 
        # seconds have elapsed
        socket.close
        raise "Connection timeout"
      end
    end
  end
end

In two of the error cases, you will notice that I closed the socket before raising an exception. This isn’t strictly necessary, as a Socket instance will auto-close by default when it is finalized. But I wanted to make sure that resources are being freed up as soon as possible, and I prefer being more explicit about cleaning up.
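The timeout handling in @connect@ hinges on the two possible return values of @IO.select@. The sketch below shows them in isolation, using a pipe in place of a socket (an assumption made purely so the example runs without a network). The method name @select_outcomes@ is hypothetical.

```ruby
# select_outcomes is a hypothetical helper used only for illustration.
# A pipe stands in for the socket so no network access is needed.
def select_outcomes(timeout = 0.1)
  reader, writer = IO.pipe

  # Nothing has been written yet, so the read end is not ready.
  # IO.select gives up after `timeout` seconds and returns nil.
  before = IO.select([reader], nil, nil, timeout)

  # Once data is available, IO.select returns three arrays
  # (readable, writable, errored) containing the ready IO objects.
  writer.write("x")
  after = IO.select([reader], nil, nil, timeout)

  [before, after]
ensure
  reader.close if reader
  writer.close if writer
end
```

In @connect@ the socket is passed in the second (writable) position instead, since a socket becomes writable once the connection attempt has finished.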

You can find other examples of using non-blocking calls with timeouts in the “resilient_socket library”:https://github.com/ClarityServices/resilient_socket/blob/master/lib/resilient_socket/tcp_client.rb and the “Evernote SDK for Ruby”:https://github.com/evernote/evernote-sdk-ruby/blob/master/lib/thrift/transport/socket.rb.

h2. An Alternative

Mike Perham wrote about “Socket timeouts in Ruby”:http://www.mikeperham.com/2009/03/15/socket-timeouts-in-ruby/ a few years back. In his article he shows how to set the low-level @SO_RCVTIMEO@ and @SO_SNDTIMEO@ timeouts on the socket before connecting. I did not have any luck with this technique in JRuby 1.7.0, but it might be an option for others.
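A minimal sketch of that technique follows. The helper names @timeval@ and @socket_with_io_timeouts@ are hypothetical, and the @"l_2"@ packing assumes the platform's @struct timeval@ is laid out as two native longs. Whether the options are honored, and whether @SO_SNDTIMEO@ also bounds the connect call, depends on the platform and the Ruby implementation.

```ruby
require 'socket'

# Hypothetical helper: pack a number of seconds into a struct timeval.
# Assumes the platform lays out struct timeval as two native longs.
def timeval(seconds)
  secs  = Integer(seconds)
  usecs = Integer((seconds - secs) * 1_000_000)
  [secs, usecs].pack("l_2")
end

# Hypothetical helper: create an IPv4 TCP socket with the low-level
# receive and send timeouts set before any connect call is made.
def socket_with_io_timeouts(timeout = 5)
  Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0).tap do |socket|
    socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_RCVTIMEO, timeval(timeout))
    socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_SNDTIMEO, timeval(timeout))
  end
end
```

The returned socket is not yet connected. The caller would still call @connect@ on it with a packed socket address.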

Conversation
  • First off, great article! Thanks for sharing.

    However, it’s worth pointing out that the recommended solution can actually be pretty bad, because Ruby’s non-blocking IO methods blow away Ruby’s global method cache. This approach can hurt the performance of your entire Ruby runtime, and for larger-scale applications it could be deadly.

    In Ruby 2.1.0 this will become a little less painful with the introduction of the class level method cache, but I think avoiding the non-blocking IO methods if you can do so is still going to be generally preferable.

    Timeout is definitely the fast lane to having a bad time with sockets (especially under JRuby), but IO#select has its drawbacks as well. IO#select will fail once you have more than 1024 file descriptors open in the process. For folks running things like large Redis clusters, this severely limits the possible size of their connection pool per node.

    Unfortunately, Ruby doesn’t implement IO#poll (we need to fix this!!) which can handle more than that many FDs so there’s not really a good solution for this yet.

    The best solution is really what Mike Perham suggested: using the C APIs and setting SO_RCVTIMEO and SO_SNDTIMEO on the Socket directly. I’ve had fairly good success with this myself, but JRuby has had a bumpy past with its IO support. The latest 1.7.x versions of JRuby seem to honor these socket options, though, and this solution is my recommendation as well.

    • Patrick Bacon says:

      Thanks for the excellent feedback Brandon.

      I was not aware of the global method cache problem with non-blocking IO. Do you know if that applies to JRuby as well?

      It looks like I’m going to have to give SO_RCVTIMEO and SO_SNDTIMEO another try in JRuby. :-)

      Thanks again!

  • JRuby actually has both per-callsite method caching and the same hierarchical class-level method caching that you’ll see in Ruby 2.1.0. In fact, the algorithm James Golick used for adding this to MRI was derived from Charles Nutter and JRuby’s implementation.

    The effects of the non-blocking IO methods in JRuby are definitely less severe, just as they will be in Ruby 2.1.0, but as I mentioned before, it’s probably still worth avoiding them if you can.

    This area of the Ruby standard library needs some more love. We’d have a better, more reliable solution for this (and much more) if Ruby actually implemented poll. A lot of the async/event-driven libraries in Ruby actually implemented poll themselves to work around issues like this.

  • Also, FYI: Despite IO#poll() not existing, IO#select() does use poll under the hood for some scenarios in Ruby 1.9+

    http://bugs.ruby-lang.org/issues/4531

  • Damian Nowak says:

    The provided solution isn’t complete. What if Socket.getaddrinfo takes 30 seconds to finish?

    • Patrick Bacon says:

      Good catch Damian. I never ran into it but it does look like if the Socket.getaddrinfo call gets hung up it would block my connect method, regardless of the specified timeout.

      I’m not familiar with anything, but I wonder if there might be an alternate way of resolving the address that’s non-blocking? I’ll have to look into it sometime.

  • xiewenwei says:

    It is simpler to use Socket.tcp.
    Socket.tcp supports a connect_timeout option.
    See https://docs.ruby-lang.org/en/2.1.0/Socket.html#method-c-tcp

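The @Socket.tcp@ suggestion from the comments can be sketched as follows. The wrapper name @connect_with_socket_tcp@ is hypothetical. @connect_timeout@ is the option named in the linked documentation, so this assumes a Ruby version that supports it. As far as I know, a connect timeout typically surfaces as @Errno::ETIMEDOUT@, but treat that as an assumption to verify on your Ruby version.

```ruby
require 'socket'

# connect_with_socket_tcp is a hypothetical wrapper used for illustration.
# Socket.tcp resolves the host and connects, giving up on the connect step
# after `timeout` seconds.
def connect_with_socket_tcp(host, port, timeout = 5)
  Socket.tcp(host, port, connect_timeout: timeout)
end
```

The caller is responsible for closing the returned socket, just as with the @connect@ method above.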