I recently worked on a feature that led me to learn about an interesting piece of the modern internet backbone: the Public Suffix List (PSL). Knowing more about the PSL isn’t going to make you a 10x developer. In fact, you’ll likely never use it in your day-to-day development. However, sometimes it’s fun to get up close and personal with the surprisingly simple and seemingly hacked-together pieces that form the technologies we rely on every day.
Consider the basics of cookies. If you’re developing “my-site.com,” you know that you can set cookies for the “my-site.com” domain. This lets every session on that domain, from that specific device, share data. You may also be aware that related subdomains can share cookies. For example, “auth.my-site.com” and “shop.my-site.com” can share data through a cookie on the “my-site.com” domain.
But have you ever considered setting a cookie for a top-level domain like “.com”? Probably not. It just sounds wrong. The Public Suffix List is what prevents actions like this. Without the PSL, sites could easily track your activity across the entirety of the “.com” domain space!
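To make that rule concrete, here is a minimal Python sketch of the check a browser might perform before accepting a cookie. The function names and the three-entry PUBLIC_SUFFIXES set are my own stand-ins for illustration, not real browser code, and the logic is deliberately simplified.

```python
# Hypothetical, tiny stand-in for the real list, which has thousands of rules.
PUBLIC_SUFFIXES = {"com", "org", "github.io"}


def domain_matches(host: str, cookie_domain: str) -> bool:
    """True if host equals cookie_domain or is a subdomain of it."""
    return host == cookie_domain or host.endswith("." + cookie_domain)


def may_set_cookie(host: str, cookie_domain: str) -> bool:
    """Decide whether a page on `host` may set a cookie for `cookie_domain`."""
    host = host.lower().rstrip(".")
    # A leading dot (".my-site.com") is tolerated and ignored.
    cookie_domain = cookie_domain.lower().strip(".")
    if cookie_domain in PUBLIC_SUFFIXES:
        # A cookie scoped to a public suffix would be shared by unrelated sites.
        return False
    return domain_matches(host, cookie_domain)


print(may_set_cookie("auth.my-site.com", "my-site.com"))  # allowed
print(may_set_cookie("my-site.com", ".com"))              # rejected
```

Real browsers handle more edge cases than this, but the core idea is the same: a cookie domain that is itself a public suffix gets rejected.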
Public Suffix List: Origins
Mozilla introduced the idea of the Public Suffix List in 2005. Over the next several years it was standardized and incorporated into all major browsers. Its initial purpose was to give browsers a consistent way to restrict cookies in cases like the one above. However, developers have adopted the list for several other uses that we’ll discuss later.
Today, the list is still maintained by volunteers at Mozilla. They have a site where you can read more and submit requests to amend the list. The full list is version controlled in a GitHub repository that typically receives a few new commits per week.
The list itself is not complicated, just very long. Each line of the list (excluding comments) is a rule describing a public suffix like com, org, or any other top-level domain (TLD). Rules allow wildcards like *.example.com. This rule would indicate that every third-level domain under example.com, such as bar.example.com, is public (i.e., cookies cannot be set for it). However, in that example, foo.bar.example.com would be private and available for cookies.
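Because the format is so simple, reading it takes only a few lines. The sketch below parses a small, made-up sample rather than the real file. SAMPLE and parse_rules are names I invented for illustration. It assumes the list's conventions that comment lines start with // and that a rule ends at the first whitespace.

```python
# Made-up sample in the same shape as the real list.
SAMPLE = """\
// Comment lines start with two slashes.
com
org

*.example.com
"""


def parse_rules(text: str) -> list[str]:
    """Return the rules in `text`, skipping comments and blank lines."""
    rules = []
    for line in text.splitlines():
        parts = line.split()  # a rule ends at the first whitespace
        if not parts or parts[0].startswith("//"):
            continue
        rules.append(parts[0].lower())
    return rules


print(parse_rules(SAMPLE))
```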
Rules can also have exceptions. For example, !baz.example.com would allow baz.example.com to bypass the previously mentioned wildcard rule. Interestingly, very few exception rules exist at this time, and most of them fall under Japan’s country-code TLD (.jp). At the time of writing this post, the PSL contains 9,197 unique rules!
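Here is a rough sketch of how wildcard and exception rules could be applied to a hostname. It follows the matching approach as I understand it: an exception rule wins over everything else, otherwise the matching rule with the most labels wins, and a host that matches nothing falls back to an implicit * rule. RULES holds the hypothetical example.com rules from this post, not real list entries, and the function names are my own.

```python
# Hypothetical rules taken from the examples in this post.
RULES = ["com", "*.example.com", "!baz.example.com"]


def rule_matches(rule_labels: list[str], host_labels: list[str]) -> bool:
    """Compare labels right to left; '*' matches any single label."""
    if len(rule_labels) > len(host_labels):
        return False
    for rule_label, host_label in zip(reversed(rule_labels), reversed(host_labels)):
        if rule_label != "*" and rule_label != host_label:
            return False
    return True


def public_suffix(host: str, rules: list[str] = RULES) -> str:
    """Return the public suffix of `host` under the given rules."""
    host_labels = host.lower().rstrip(".").split(".")
    best = ["*"]  # implicit fallback rule when nothing else matches
    is_exception = False
    for rule in rules:
        labels = rule.lstrip("!").split(".")
        if not rule_matches(labels, host_labels):
            continue
        if rule.startswith("!"):
            # An exception rule beats every other matching rule.
            best, is_exception = labels, True
            break
        if len(labels) > len(best):
            best = labels
    if is_exception:
        best = best[1:]  # drop the leftmost label of the exception rule
    return ".".join(host_labels[-len(best):])


for name in ["my-site.com", "foo.bar.example.com", "baz.example.com"]:
    print(name, "->", public_suffix(name))
```

With these rules, foo.bar.example.com falls under the public suffix bar.example.com, while the exception pulls baz.example.com back under example.com, so cookies could be set for it.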
Incidentally, I stumbled across the PSL when I was trying to use cookies on an AWS Elastic Beanstalk application. It had a URL like myapp.us-east-2.elasticbeanstalk.com. Sure enough, the list contains an entry for us-east-2.elasticbeanstalk.com as well as the many other regions for Elastic Beanstalk. A handful of other AWS services have entries in the list too. I thought it was interesting that github.io makes an appearance, and I was surprised by how few Google-related entries there are.
Public Suffix List: Other Uses
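Cookies aren’t the only interesting use of the Public Suffix List. Email systems use the list to protect against spoofing attacks; DMARC, for example, relies on it to work out which organization a sending domain belongs to. Browsers use the list to determine the interesting parts of a site’s URL, such as which portion of the hostname to highlight in the address bar.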
Browsers also use the PSL to determine when two sites are likely owned and operated by the same organization. For example, most internet-savvy people would probably assume that the same people run “auth.example.com” and “shop.example.com.” At the same time, they’d also know that “my-site.github.io” and “other-site.github.io” probably don’t have the same owner. No algorithm could determine the difference without first consulting some authority like the PSL.
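The comparison above can be sketched by reducing each hostname to its registrable domain (the public suffix plus one more label) and checking whether the two results agree. This is a simplified illustration: PUBLIC_SUFFIXES is a tiny stand-in set with no wildcard or exception handling, and registrable_domain and same_site are names I made up.

```python
from typing import Optional

# Tiny stand-in for the real list; no wildcard or exception rules here.
PUBLIC_SUFFIXES = {"com", "io", "github.io"}


def registrable_domain(host: str) -> Optional[str]:
    """Return the public suffix plus one label, or None if there is none."""
    labels = host.lower().rstrip(".").split(".")
    # Start with the whole host so the longest matching suffix wins.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            if i == 0:
                return None  # the host is itself a public suffix
            return ".".join(labels[i - 1:])
    return None


def same_site(host_a: str, host_b: str) -> bool:
    """True if both hosts share the same registrable domain."""
    domain_a = registrable_domain(host_a)
    return domain_a is not None and domain_a == registrable_domain(host_b)


print(same_site("auth.example.com", "shop.example.com"))
print(same_site("my-site.github.io", "other-site.github.io"))
```

The first pair reduces to example.com on both sides, so the hosts are treated as one site. The second pair reduces to my-site.github.io and other-site.github.io, which differ only because github.io is on the list.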
The internet is an amazing place. I find it fascinating that a piece of modern browser security is a simple text file stored in a public repository operated by volunteers.