Blocked by Unifi – Troubleshooting a Home Networking Issue

My current project uses Azure DevOps for source control. I noticed that sometimes, when I was on my home network, git operations would hang and then fail. The issue was very intermittent – if I waited long enough, things would eventually start working again. But it was incredibly frustrating when I was trying to get work done.

After dealing with this for a while, I finally decided to investigate what was going on. I’ll walk through the networking tools I use to diagnose issues like this, and the systematic process I followed to track down the root cause.

Git

I’m using ssh for connecting to git, so the first thing I wanted to see was more logging about the ssh connection to Azure DevOps. It’s possible to alter the ssh command that git uses by setting an environment variable.


GIT_SSH_COMMAND="ssh -vvv" git pull
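
As an aside, the same override can be made persistent for a single repository with git's core.sshCommand setting (git 2.10+), instead of prefixing every command with the environment variable. A quick sketch against a throwaway repo:

```shell
# Persist a custom ssh command for one repository (git 2.10+).
# Using a throwaway repo here so nothing real is touched.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config core.sshCommand "ssh -vvv"
git -C "$repo" config core.sshCommand   # prints: ssh -vvv
```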

The first time I ran this, everything worked. I tried it again – it worked again. But the third time it hung, and then eventually timed out:


ssh_dispatch_run_fatal: Connection to 20.37.158.9 port 22: Operation timed out
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Once this happened, it would stop being able to connect at all, hanging at:


debug1: Connecting to ssh.dev.azure.com port 22.

If I waited a while, the git pull would work – but only once or twice. Then it would fail somewhere in the middle of the operation, and after that it wouldn’t even be able to connect.

ssh

Since I’m using ssh (instead of https) for connecting with git, the next step was to try connecting directly using ssh.


  ssh -vvvT git@ssh.dev.azure.com

I saw the same behavior as with git – sometimes it would fail, but if I waited and retried a few times it would eventually work, only to start hanging again after a few successful attempts.
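
Each hung attempt ties up the terminal for the full TCP timeout, so two tricks help when retrying by hand gets tedious: ssh’s -o ConnectTimeout caps how long an attempt can hang, and a small retry wrapper automates the repetition. The retry function below is my own sketch, not part of ssh:

```shell
# retry N CMD...: run CMD up to N times, pausing briefly between
# attempts, until it succeeds. Returns 0 on success, 1 otherwise.
# RETRY_DELAY (seconds between attempts) defaults to 1.
retry() {
  n=$1; shift
  i=1
  while [ "$i" -le "$n" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "${RETRY_DELAY:-1}"
  done
  return 1
}

# Usage: keep trying the Azure DevOps ssh check, giving each
# attempt at most 5 seconds to connect:
# retry 5 ssh -o ConnectTimeout=5 -T git@ssh.dev.azure.com
```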

telnet/netcat

The next step was to try connecting directly to the port using telnet or netcat. I can never remember the right arguments for netcat, so I usually just use telnet.


  telnet ssh.dev.azure.com 22

I wasn’t able to glean much from this. After a failed ssh attempt, I’d try telnet and it wouldn’t be able to connect either. If I tried again a few times, it would often work – at which point I’d try ssh again and see the same behavior described above.

Just to try something different, I looked up the arguments for netcat and tried it out:


  nc -vz -w 3 ssh.dev.azure.com 22

The “-w 3” sets a 3-second timeout for the connection attempt – helpful when it’s hanging on most attempts.
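
Running netcat in a loop makes the intermittent pattern much easier to see than one-off attempts. Here’s a small sketch (probe_loop is my own wrapper, and the PROBE_CMD override exists only so the loop can be exercised without touching the network):

```shell
# probe_loop HOST PORT N: attempt a TCP connection N times and
# summarize how many attempts succeeded.
probe_loop() {
  host=$1; port=$2; attempts=${3:-5}
  ok=0; i=1
  while [ "$i" -le "$attempts" ]; do
    if ${PROBE_CMD:-nc -z -w 3} "$host" "$port" 2>/dev/null; then
      echo "attempt $i: open"
      ok=$((ok + 1))
    else
      echo "attempt $i: closed or timed out"
    fi
    i=$((i + 1))
  done
  echo "$ok/$attempts succeeded"
}

# Usage:
# probe_loop ssh.dev.azure.com 22 10
```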

dig

You know what they say about networking issues: “It’s always DNS.” And since I run a local DNS server on my home network, it very often IS DNS for me at home.


  dig ssh.dev.azure.com

The output of dig has an ANSWER SECTION that includes the resolved IP address for the specified host:


;; ANSWER SECTION:
ssh.dev.azure.com.	3553	IN	CNAME	tfsprod.trafficmanager.net.
tfsprod.trafficmanager.net. 253	IN	CNAME	tfsprodcus6.centralus.cloudapp.azure.com.
tfsprodcus6.centralus.cloudapp.azure.com. 10 IN	A 20.37.158.23

If I ran it again I’d see something like:


;; ANSWER SECTION:
ssh.dev.azure.com.	3551	IN	CNAME	tfsprod.trafficmanager.net.
tfsprod.trafficmanager.net. 251	IN	CNAME	tfsprodcus6.centralus.cloudapp.azure.com.
tfsprodcus6.centralus.cloudapp.azure.com. 8 IN A 20.37.158.23

The last line (the one with the IP address) has a number before the “IN A” – that’s the TTL (time to live) for that record. It’s set to only 10 seconds here, which means I could get a different IP address back if I waited long enough.
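
If you don’t want to eyeball the full output, the answer section is easy to parse mechanically. A small sketch (a_record_ttl is my own helper) that prints the TTL of the last A record in dig-style output:

```shell
# a_record_ttl: read dig output on stdin and print the TTL of the
# last A record (answer fields: name, TTL, class, type, address).
a_record_ttl() {
  awk '$3 == "IN" && $4 == "A" { ttl = $2 } END { print ttl }'
}

# Usage:
# dig ssh.dev.azure.com | a_record_ttl
```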

Given that ssh was failing to connect after a few successful attempts – which might be around the 10-second mark – I wondered if the rotating IP addresses might be the cause of the problem. I also wondered if Azure might have some kind of rate limiting on a firewall.
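
One way to chase that hypothesis is to resolve and probe in a single step, so each success or failure can be pinned to a specific IP address. A sketch (the function and its RESOLVE_CMD/PROBE_CMD overrides are mine, purely for illustration):

```shell
# probe_current_ip HOST [PORT]: resolve HOST to an address and test
# TCP connectivity to that specific address, printing the result.
probe_current_ip() {
  host=$1; port=${2:-22}
  # dig +short prints the CNAME chain and then the A record;
  # keep the first line that looks like a bare IPv4 address.
  ip=$(${RESOLVE_CMD:-dig +short} "$host" | awk '/^[0-9.]+$/ { print; exit }')
  if ${PROBE_CMD:-nc -z -w 3} "$ip" "$port" 2>/dev/null; then
    echo "$ip open"
  else
    echo "$ip blocked or unreachable"
  fi
}

# Usage:
# probe_current_ip ssh.dev.azure.com 22
```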

Just to see if anything looked different with a different name server I tried:


  dig @8.8.8.8 ssh.dev.azure.com

But that didn’t reveal anything different or useful.

VPN

I couldn’t recall ever seeing this connection issue at work, so to confirm that it was specific to my home network, I connected to the Atomic Object VPN and ran the nc probe in a loop, along these lines:


  while true; do nc -vz -w 3 ssh.dev.azure.com 22; sleep 5; done

While connected to the VPN, the nc command would succeed every single time:


20.37.158.2 22 (ssh) open
20.37.158.2 22 (ssh) open
20.37.158.47 22 (ssh) open
20.37.158.47 22 (ssh) open
20.37.158.47 22 (ssh) open

But if I disconnected from the VPN, it might fail immediately, or after a few successful attempts.

That eliminated any ideas about rate limiting on the Azure side and confirmed that the issue was specific to my home network.

Re-Assess the Situation

At this point I knew:

  • The issue was very likely specific to my home network
  • It wasn’t an SSH key issue (I was able to connect and authenticate – just not every time)
  • DNS resolution was working properly, but the IP addresses rotated frequently
  • When the connection wasn’t working, it didn’t matter what tool I was using to connect
  • If I got lucky and it worked, it would stop working after a few back-to-back attempts

I did some searching and couldn’t find any concrete examples of others having this exact problem. I was out of ideas, and the only suspects that hadn’t been ruled out were:

  • My internal router
  • My AT&T gateway
  • My AT&T service

Unifi Gateway

My home network sits behind a Unifi Dream Machine Pro. I knew that I had some security measures enabled, but I thought those had more to do with preventing inbound intrusions so it seemed unlikely they’d play a role in this outbound connection issue.

I was wrong.

When I finally tracked down the screen with the log of security “Threats” (a screen that seems to change locations with every update), the page was full of entries originating from my laptop, with destination IPs that I recognized from all of my “dig” attempts, along with this Signature name:


ET SCAN Potential SSH Scan OUTBOUND

My router thought my git activity was some kind of SSH scan, and after a few attempts it was blocking my laptop from connecting to the current ssh.dev.azure.com IP address! I don’t know how long the block lasts, but thanks to the short DNS TTL I was getting a new IP address every 10 seconds or so, and eventually I would get lucky and hit an IP that wasn’t blocked yet. But then it would start blocking that new IP after a few more attempts.

Unifi allows you to “Suppress Signature for Source IP”, which I selected for my laptop. And voila! The problem was solved!

Conclusion

You can use a variety of simple command line tools to test connectivity, and by methodically ruling out potential culprits it’s possible to track down the root cause of most networking issues.

Conversation
  • cc8cd24e-c7d2-491e-9b79-7a829df3435b says:

    > You know what they say about networking issues: “It’s always DNS. Then it’s the firewall.”
    FTFY.

    Some unsolicited advice: demote that shiny paperweight to an AP & get an OpenWrt box for your gateway (eg: FriendlyElec NanoPi series). There’s a reason UBNT used to build on OWRT for CPEs if not Debian (‘EdgeOS’). If you can handle the CLI you’ll love the freedom that comes with UCI & busybox (or `apk add bash` if you can’t grok `ash`). There are Ansible playbooks available, too.

    My apologies you’re an Azure sysadmin; get well soon.
