I ran into an issue where I was unable to run more than one system test at a time without the second (and every subsequent test) failing. These particular tests involved starting an HTTP server to act as the external API that the application would interact with during the course of the tests. The problem was that after the first test completed, the HTTP server for the next test would fail to start with an “address already in use” error.
Running netstat after my tests revealed that the server port, 8085, was in a TIME_WAIT state:
> netstat -an | grep 8085
tcp4 0 0 127.0.0.1.8085 127.0.0.1.58244 TIME_WAIT
I’ve done enough network troubleshooting over the years to be somewhat familiar with TIME_WAIT. However, I needed to dig into it again to understand why the server port in my test suite was ending up in a TIME_WAIT state and why that was preventing other tests from running properly. I’ll share my findings in this post.
Address Already in Use
The output of the netstat command above shows that the TCP connection from localhost 8085 to localhost 58244 is in a TIME_WAIT state. On my laptop, which is running macOS 10.13, it would stay in this state for 30 seconds before clearing. During those 30 seconds, any attempt to start a server listening on port 8085 would fail because the port was considered in use until the TIME_WAIT state cleared.
Being in a TIME_WAIT state, as explained in “TCP: About FIN_WAIT_2, TIME_WAIT and CLOSE_WAIT”, means:
.. that from the local end-point point of view, the connection is closed but we’re still waiting before accepting a new connection in order to prevent delayed duplicate packets from the previous connection from being accepted by the new connection.
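The failure can be reproduced outside of any test framework. The following is a minimal Python sketch, not the .NET code behind the tests in this post. The function name is made up for illustration, the port is chosen by the OS rather than fixed at 8085, and no socket options are set. The server side closes the connection first and then tries to bind the same port again. Whether that final bind fails, and for how long, depends on the operating system.

```python
import errno
import socket


def rebind_after_server_close() -> bool:
    """Return True if the port can be bound again right after the server closes first."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
    listener.listen(1)
    port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    conn, _ = listener.accept()

    conn.close()      # server side closes first (the "active close")
    client.recv(1)    # returns b"" once the server's FIN arrives
    client.close()    # client closes second (the "passive close")
    listener.close()

    # Try to listen on the same port again, as the next test would.
    retry = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        retry.bind(("127.0.0.1", port))
        return True
    except OSError as exc:
        if exc.errno != errno.EADDRINUSE:
            raise
        return False  # "address already in use"
    finally:
        retry.close()


if __name__ == "__main__":
    print("rebind succeeded:", rebind_after_server_close())
```

On a system that behaves like the laptop described above, the rebind would be expected to fail until the TIME_WAIT entry for the server port clears.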
Active Close Gets the TIME_WAIT
The HTTP server in my test couldn’t start listening on port 8085 while there was a socket in the TIME_WAIT state. So how was it ending up in that state?
… it’s the final state that the peer that initiates the “active close” ends up in and this can be either the client or the server.
This meant that my HTTP server must have been initiating the close, since the server port was the one that was ending up in TIME_WAIT.
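Reading which side holds the TIME_WAIT comes down to checking which column of the netstat line contains the server port: the local address column means the server side, the foreign address column means the client side. A small Python sketch of that check follows. The helper names are hypothetical, and the parsing assumes the six-column layout seen in this post (protocol, two queue counters, local address, foreign address, state), with the port after the last `.` or `:` of each address.

```python
import re
from typing import Optional


def port_of(endpoint: str) -> int:
    """Extract the port from an endpoint such as 127.0.0.1.8085 or 127.0.0.1:8085."""
    return int(re.split(r"[.:]", endpoint)[-1])


def time_wait_side(netstat_line: str, server_port: int) -> Optional[str]:
    """Return "server" or "client" for a TIME_WAIT line, or None for any other line."""
    fields = netstat_line.split()
    if len(fields) < 6 or fields[5] != "TIME_WAIT":
        return None
    local_port = port_of(fields[3])
    foreign_port = port_of(fields[4])
    if local_port == server_port:
        return "server"  # the server end performed the active close
    if foreign_port == server_port:
        return "client"  # the client end performed the active close
    return None


if __name__ == "__main__":
    line = "tcp4 0 0 127.0.0.1.8085 127.0.0.1.58244 TIME_WAIT"
    print(time_wait_side(line, 8085))  # prints "server"
```

Applied to the netstat line from the failing test run, the server port sits in the local address column, which is what points to the server as the side that closed first.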
The solution to my problem was to make sure the client was the one that actively closed the connection first. As long as the client actively closed the connection before the end of the test (when the server would be shut down), the client’s port would get the TIME_WAIT instead:
> netstat -an | grep 8085
tcp4 0 0 127.0.0.1.60382 127.0.0.1.8085 TIME_WAIT
tcp4 0 0 127.0.0.1.60381 127.0.0.1.8085 TIME_WAIT
tcp4 0 0 127.0.0.1.60383 127.0.0.1.8085 TIME_WAIT
This meant that the server port was no longer in use. When the next test started, the server could start listening on port 8085 without getting an error.
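The same ordering can be sketched with plain sockets. This is an illustrative Python example under stated assumptions, not the .NET fix itself: the function name is hypothetical, the port is picked by the OS, and the short sleep is only a cautious allowance for the kernel to finish the close handshake. The client closes first, the server waits until it sees the end of the stream and then closes, and only then is the port bound again.

```python
import errno
import socket
import time


def rebind_after_client_close() -> bool:
    """Return True if the server port can be bound again after the client closes first."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
    listener.listen(1)
    port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    conn, _ = listener.accept()
    conn.settimeout(5)

    client.close()    # client closes first, so its ephemeral port takes the TIME_WAIT
    conn.recv(1)      # returns b"" once the client's FIN arrives
    conn.close()      # server side closes second
    listener.close()
    time.sleep(0.1)   # small allowance for the kernel to finish the close handshake

    # Try to listen on the same port again, as the next test would.
    retry = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        retry.bind(("127.0.0.1", port))
        return True
    except OSError as exc:
        if exc.errno != errno.EADDRINUSE:
            raise
        return False
    finally:
        retry.close()


if __name__ == "__main__":
    print("rebind succeeded:", rebind_after_client_close())
```

The important detail is the ordering: the server waits for the client's close to arrive before closing its own end, so the lingering TIME_WAIT entry belongs to the client's ephemeral port rather than to the server port.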
Figuring out how to do that turned out to be a little tricky in the .NET environment where I was running my tests (using the System.Net.Http.HttpClient class), and I hope to write a follow-up post about that experience someday soon.