Hosting your AWS infrastructure in a more protected environment can strengthen its security. That’s what an Amazon Virtual Private Cloud (VPC) is for. It creates an isolated slice of the cloud in which you control and manage your own network.
But it also means you’re now required to manage that network and to deal with the occasional quirks and limitations that come up when a VPC is combined with other AWS services. The following are a few considerations to keep in mind when using an AWS VPC.
1. You’re Now a Network Administrator
When was the last time you needed to work with route tables? Network address translation (NAT) tables? Allocating IP address ranges to subnets? Managing ingress and egress access to the internet?
It can be a bit daunting to step into a VPC setup if you haven’t done much network administration. And even if you’re familiar with it, there is quite a bit of additional configuration to track and manage.
Good tools make it manageable. (You are using CloudFormation, Terraform, or something similar, right?) And the AWS console at least makes it reasonable to walk through the different parts of the configuration.
You’ll want to familiarize yourself with the AWS documentation about VPCs, subnets, security groups, route tables, etc. It’s pretty helpful.
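To make the subnet allocation question a little more concrete, here is a minimal sketch using Python’s standard ipaddress module. The 10.0.0.0/16 range and the plan_subnets helper are assumptions made up for illustration, not part of our actual setup. The sketch also subtracts the five addresses that AWS reserves in every subnet.

```python
import ipaddress

# Hypothetical VPC range for illustration; substitute your own CIDR block.
VPC_CIDR = "10.0.0.0/16"
# AWS reserves the first four addresses and the last address in every subnet.
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc_cidr, new_prefix, count):
    """Return the first `count` equally sized subnets carved from the VPC range.

    Each entry is (subnet CIDR, addresses left after the AWS reservation).
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    plan = []
    for subnet in vpc.subnets(new_prefix=new_prefix):
        if len(plan) == count:
            break
        usable = subnet.num_addresses - AWS_RESERVED_PER_SUBNET
        plan.append((str(subnet), usable))
    return plan


if __name__ == "__main__":
    for cidr, usable in plan_subnets(VPC_CIDR, new_prefix=24, count=4):
        print(f"{cidr}: {usable} usable addresses")
```

Planning the ranges up front like this makes it easier to keep them consistent once they are written into a CloudFormation or Terraform template.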
2. Not All AWS Services Enjoy the Same Level of Support
AWS introduced VPCs in 2009, targeted at creating a private network for a set of EC2 instances. AWS has since added VPC support to many other services, but not all of them enjoy the same level of support as the original EC2 service.
A few of the restrictions we’ve run into include:
- Lambda functions can be connected to your VPC to access other private resources, but VPC-connected functions incur additional cold start time. They use an elastic network interface (ENI) in the VPC per parallel invocation, and that setup takes extra time, up to 10 seconds.
- You can create a private API using API Gateway, but you have to live with the domain name generated by AWS. This was a non-starter for us on one project because other services outside our control needed to know the API’s hostname and we weren’t confident that the generated domain name wouldn’t change if we tore down and re-created our API.
- Amazon Aurora databases require a VPC that spans two availability zones, and they require that the subnet group used by the database contain only one subnet per availability zone. Other RDS instances aren’t as restrictive regarding the subnet group requirement, but it’s probably a good practice anyway.
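The Aurora subnet group rule described above is easy to check before you deploy. The following is a minimal sketch in plain Python. The check_subnet_group helper and the subnet IDs are hypothetical, and the mapping of subnets to availability zones is assumed to come from your own template or inventory.

```python
from collections import Counter


def check_subnet_group(subnet_to_az):
    """Return a list of problems for a subnet group given as {subnet_id: az}.

    Encodes the two rules from the text: span at least two availability
    zones, and use only one subnet per availability zone.
    """
    problems = []
    per_az = Counter(subnet_to_az.values())
    if len(per_az) < 2:
        problems.append("subnet group must span at least two availability zones")
    for az, subnet_count in sorted(per_az.items()):
        if subnet_count > 1:
            problems.append(f"{az} has {subnet_count} subnets; expected exactly one")
    return problems


if __name__ == "__main__":
    # Hypothetical subnet IDs and zones, for illustration only.
    group = {
        "subnet-aaa": "us-east-1a",
        "subnet-bbb": "us-east-1a",
        "subnet-ccc": "us-east-1b",
    }
    for problem in check_subnet_group(group):
        print(problem)
```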
3. You’ll Need Troubleshooting Tools
When you lock down access and have multiple subnets, it gets more complicated to figure out why things go wrong. Here are some tools that helped us get on the right path.
One of them is a test EC2 instance inside the VPC. Ours is in a fixed subnet, but you could easily stand up instances in different subnets to test both the source and the target of a troublesome connection.
VPC Flow Logs
Flow logs are the best way to get visibility into existing traffic inside your VPC. They help identify problems with security group access rules or network ACLs. Unfortunately, they’re less helpful for routing problems because they don’t show all traffic; things like DHCP traffic and queries to the Amazon DNS server are excluded.
To make sense of your data:
- Look for REJECT records.
- Search for the source and destination IP addresses.
- Check the destination port to make sure you’re finding records for the services you care about.
The AWS News Blog has a guide on how to enable flow logs.
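The three steps for making sense of flow log data can be sketched in a few lines of Python. This assumes records in the default (version 2) flow log format, where each line holds 14 space-separated fields. The find_rejects helper and the sample records are made up for illustration; in practice the lines would come from CloudWatch Logs or S3.

```python
# Field order of the default (version 2) flow log record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()

# Made-up sample records for illustration.
SAMPLE = [
    "2 123456789010 eni-0abc 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-0abc 172.31.9.69 172.31.9.12 49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
]


def parse_record(line):
    """Split one record into a dict, or return None if the shape is unexpected."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def find_rejects(lines, srcaddr=None, dstaddr=None, dstport=None):
    """Return REJECT records, optionally narrowed by source, destination, and port."""
    matches = []
    for line in lines:
        record = parse_record(line)
        if record is None or record["action"] != "REJECT":
            continue
        if srcaddr is not None and record["srcaddr"] != srcaddr:
            continue
        if dstaddr is not None and record["dstaddr"] != dstaddr:
            continue
        if dstport is not None and record["dstport"] != str(dstport):
            continue
        matches.append(record)
    return matches


if __name__ == "__main__":
    for record in find_rejects(SAMPLE, dstport=3389):
        print(record["srcaddr"], "->", record["dstaddr"], "port", record["dstport"])
```

If you have configured a custom log format, the field list needs to be adjusted to match it.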
A Linux EC2 Instance in the VPC
Sometimes you just need someone on the inside. We stood up a simple Amazon Linux EC2 instance and opened up SSH to the host. We use it to launch other tools, including:
ping HOSTNAME or ping IP_ADDRESS
- This is simple to use but may result in misleading failures.
- If ping works, the host is reachable.
- If ping fails, the host may not be reachable or may be configured to not respond to pings (fairly common), so failure doesn’t mean much.
telnet HOSTNAME PORT or telnet IP_ADDRESS PORT
- This can attempt to connect to the same host and port used by the target service, improving your ability to prove that it can or can’t be reached.
- If it exits immediately with a failure, you know it can’t connect.
- If it opens the connection and waits for your input, it can connect.
- hostname and port – If it fails while using a hostname, the cause could be DNS resolution rather than connectivity (see below).
- IP address and port – Using an IP address takes DNS resolution out of the picture. Failure could still mean routing problems or security group configuration problems.
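If telnet isn’t installed on the instance, the same check can be approximated with Python’s standard socket module. This is a minimal sketch, and the check_tcp helper name and its return values are our own invention. It separates DNS failures from connection failures, mirroring the hostname versus IP address distinction above.

```python
import socket


def check_tcp(host, port, timeout=3.0):
    """Try to open a TCP connection, like `telnet HOST PORT`.

    Returns "connected", "dns_failure", or "connect_failure".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.gaierror:
        # The hostname could not be resolved; retry with an IP address
        # to take DNS out of the picture.
        return "dns_failure"
    except OSError:
        # Refused or timed out: think routing, security groups, or ACLs.
        return "connect_failure"


if __name__ == "__main__":
    # Loopback port 22 is only a placeholder target for illustration.
    print(check_tcp("127.0.0.1", 22, timeout=1.0))
```

A timeout often points at a security group or routing problem, since dropped packets get no reply, while an immediate refusal usually means the host was reached but nothing is listening on that port.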
nslookup / dig
- nslookup HOSTNAME shows information about how a DNS lookup resolved for a given hostname.
- dig HOSTNAME shows more information about how a DNS lookup resolved for a given hostname.
- dig +short HOSTNAME prints only the answer for a hostname, typically just its IP addresses.
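For a scripted lookup from the test instance, something close to dig +short can be sketched with the standard socket module. The resolve helper is a hypothetical name. Note that it goes through the system resolver (including /etc/hosts), so results can differ from dig, which queries DNS servers directly.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses for a hostname.

    Returns an empty list when resolution fails.
    """
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address is the first element of sockaddr.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print(resolve("localhost"))
```

An empty result for a private hostname is a hint to look at the VPC’s DNS settings before blaming routing or security groups.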
4. Temporarily Relax Security Group Rules
If you want to prove that security group rules are the problem, relax the rules. Allow all ports inbound and outbound. This is especially helpful in classifying a problem as a security group problem or a routing problem.
After you relax the rules, try walking through your problem again, revisiting the troubleshooting tools we reviewed above. If your problem vanishes, then congratulations! You’ve identified a problem with your security group configuration. You can start turning the restrictions back on one by one until the problem reappears; the restriction you restored last is the likely culprit.
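The process of restoring restrictions one at a time can be sketched as a small loop. Everything here is hypothetical: the find_breaking_restriction helper, the restriction labels, and the connection_works callback, which stands in for applying the rules and re-running whatever connectivity test you used above.

```python
def find_breaking_restriction(restrictions, connection_works):
    """Re-apply restrictions one at a time and return the first that breaks things.

    `connection_works(active)` is a hypothetical callback: it should apply the
    list of active restrictions and report whether the connection still works.
    Returns None if every restriction can be restored without a failure.
    """
    active = []
    for restriction in restrictions:
        active.append(restriction)
        if not connection_works(active):
            return restriction
    return None


if __name__ == "__main__":
    # Simulated environment for illustration: the database rule is the culprit.
    rules = ["limit ssh to bastion", "limit db port to app subnet", "limit egress to 443"]

    def simulated_check(active):
        return "limit db port to app subnet" not in active

    print(find_breaking_restriction(rules, simulated_check))
```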
An Amazon Virtual Private Cloud (VPC) can provide a useful way to group resources together in a more protected environment. But as with most security measures, it comes with its own particular management needs.
I’ve found VPCs helpful when we need to connect to a client’s on-prem resources via Direct Connect or when there is a set of resources that are worth hiding behind the enhanced privacy that the VPC provides. They have been worthwhile in those scenarios and are much easier to work with when you know the limitations and have the right set of tools.