The cache is a fundamental part of modern system architecture. Most software architectures feature caching heavily, especially ones designed to serve web requests. Without caching, services like Netflix, Facebook, and Google would be astronomically expensive and slow. But caching isn’t easy or free. Plus, the price for bad caching decisions isn’t paid steadily over time, but all at once in acute disaster scenarios.
The Importance of Caching
A lot has been written about the successful use of caching, and about its difficulties in theory, but not enough about the practical failure modes. In practice, engineers will often put a cache in place without giving it a second thought. But what are some good arguments for _not_ adding a cache? After all, how does that famous quote go? There are two hard problems in computer science, and one of them is cache invalidation.
Caching is fundamental.
Caching is fundamental to computation. In algorithm design, there is always some tradeoff between space and time complexity. You’re likely reading this on a machine whose CPU devotes 30-50% of its die area to cache. Countless caches live in the networking equipment between you and where this website lives.
And the operating system you’re using is caching the memory and disk accesses your browser makes. Of course, your browser has pulled this blog post from a CDN and stored it in a local cache. Even your brain stores everything you read in a sort of cache and decides later whether to keep or discard the information. You really can’t get away from it; it’s just too useful a tool. In your own designs, your only choice will be how to use a cache, not whether to use one.
But problematic.
But when caching goes bad, it gets really, horribly, catastrophically bad. All the systems I mentioned above have had their share of caching-related bugs with massive consequences for the engineers and companies behind them (and, of course, a brain caching problem is most tragic of all). I’ll just talk about the three most common ways that I’ve seen caching cause problems, but you can be sure there are more. And I’ll focus on the web application angle, but you can be sure engineers in other specialties have their own war stories.
Caching and the Invalidation Problem
Caching is an easy way to reduce request latency, but at the cost of introducing eventual consistency and complexity. If an API call takes a long time, it can be very tempting to put a cache in front of it. As soon as that happens, you introduce the problem of keeping that cache up to date.
A Complex Problem.
There are many cache invalidation techniques, but all of them involve some window of data staleness and inconsistency. And invalidation is famously a very complex problem.
In a web application, it’s easy (and common!) to take this to an extreme and have layers of caching. You can use Redis to cache database queries or API calls on the server, proxy web requests through a CDN, and cache some items locally in your website state (via static site generation in NextJS, for example). Everything will load faster. However, keeping the data in your web application in sync with what is actually in your database becomes quite difficult. Invalidating the data in one cache layer and re-pulling it from the layer above means updates can take an immense amount of time to reach the user.
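To make the staleness window concrete, here is a minimal sketch of the server-side layer, assuming the ioredis client and a hypothetical queryWidgetFromDb helper standing in for the slow database query. Every entry may be up to one TTL out of date, and each additional caching layer stacks its own window on top of this one.

```typescript
import Redis from "ioredis";

const redis = new Redis(); // assumes a Redis instance on localhost:6379

// Hypothetical stand-in for the slow database query being cached.
async function queryWidgetFromDb(id: string): Promise<{ id: string; name: string }> {
  // ...expensive query...
  return { id, name: "example widget" };
}

// Cache-aside with a TTL: every entry may be up to CACHE_TTL_SECONDS stale,
// and each additional caching layer adds its own staleness window on top.
const CACHE_TTL_SECONDS = 60;

async function getWidget(id: string) {
  const key = `widget:${id}`;
  const cached = await redis.get(key);
  if (cached) {
    return JSON.parse(cached); // may be up to a minute out of date
  }
  const widget = await queryWidgetFromDb(id);
  await redis.set(key, JSON.stringify(widget), "EX", CACHE_TTL_SECONDS);
  return widget;
}
```

Shortening the TTL narrows the window but sends more traffic to the database; explicit invalidation on writes narrows it further but only for the layers you control.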
Internal Inconsistency
Let’s also consider how caching can make an application internally inconsistent. By this, I mean that different parts of an application can present conflicting data. As an example, take a web application that sells widgets. Your application may have a widget search page and a widget detail page. Both can be served from a cache; in this example, let’s say the widget search hits queries cached in Redis while the widget detail page comes from a CDN. If you change the name of a widget, it may change on the search page but not on the detail page (or vice versa), creating a very confusing experience for the user.
Another dimension of caching inconsistency appears when we move caches closer to the user. These are now independent distributed caches, living in or near each user’s computer, pulled from the same sources but not always synchronized in their contents. For example, a CDN often involves multiple PoPs (Points of Presence), cache instances located closer to the user. If a user in Los Angeles sends a link for a widget to a user in New York, they are not guaranteed to see the same thing when viewing the same widget at the same time. And if they contact support at your organization about this, the person handling the ticket may see yet another version of that widget page!
Cache as Single Point of Failure
One great thing about caching is that it can save a lot of cost when data doesn’t change very quickly. A few megabytes of cache can save on the cost of spinning up an expensive API server. The traffic served by the API can easily be orders of magnitude lower than the traffic served from the cache, which uses far fewer expensive resources.
The Thundering Herd
But if a large amount of traffic one day misses the cache and hits the API, we have what’s called the “thundering herd”. The cache no longer holds back the torrent of requests now hitting your API. Since the back end is under-provisioned for the new load, it can quickly become overwhelmed and stop serving any requests at all. Users start seeing errors in some very popular parts of the application, creating an issue with a huge impact. At the same time, if the rest of the cache holds data that is expiring or growing stale, traffic to the API only increases as that data has to be re-fetched too.
In this situation, you’re forced to either scale up to serve all of that traffic (a massive expense), limit the load on the API somehow (throttling, turning off features, shedding load with circuit breakers, etc.) while the cache repopulates, or find a way to repopulate the cache faster. Often, scaling up isn’t an option: standing up 100 or 1,000 times more servers on short notice may simply not be possible even if an organization can afford it. And any solution that involves refilling the cache takes time, since it means getting the backend healthy again and writing results back to the cache, which can take many hours. Hours of “all hands on deck” mean angry and stressed-out management and frustrated users filing support requests.
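As one illustration of limiting load while the cache refills, here is a minimal load-shedding sketch. It caps how many cache misses may be in flight against the backend at once; the names here (fetchFromApi, an in-memory Map standing in for a real cache client, and the limit of 20) are hypothetical, not a recommendation.

```typescript
// Cap how many concurrent cache misses are allowed through to the backend
// while the cache repopulates; everything over the cap is shed immediately.
const MAX_CONCURRENT_MISSES = 20;
let inFlight = 0;

async function getWithShedding(
  key: string,
  fetchFromApi: (k: string) => Promise<string>,
  cache: Map<string, string>
): Promise<string> {
  const cached = cache.get(key);
  if (cached !== undefined) return cached; // cache hit: backend untouched

  if (inFlight >= MAX_CONCURRENT_MISSES) {
    // Shed load instead of letting the full thundering herd through.
    throw new Error("backend overloaded, try again shortly");
  }

  inFlight += 1;
  try {
    const value = await fetchFromApi(key);
    cache.set(key, value); // repopulate so later requests become hits again
    return value;
  } finally {
    inFlight -= 1;
  }
}
```

The requests that get shed still fail, but the backend keeps serving something and the cache gets a chance to fill back up, which is usually the lesser evil.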
A Moral Hazard
One other way that caches can make a situation worse is by acting as a sort of “moral hazard” when it comes to performance. If some API call is incredibly inefficient, but invoked very infrequently thanks to caching, there is a tendency to ignore it in favor of other priorities. It’s easy to overlook a performance issue when it affects just 0.1% or 0.01% of your traffic. But when the “thundering herd” finally arrives, all of these performance problems occur in much greater numbers and amplify how quickly your backend crumbles.
Caching and Feature Development
Caching tends to complicate feature development and performance tuning of applications. It mostly complicates things by introducing a lag between cause and effect.
Example
For example, when a client gets an incorrect response from the API, that response could have been generated some time ago. It may have been generated incorrectly because of a bug in an older version of the API, a performance issue at the time it was generated, or because the data in some database was different at the time.
Now, let’s say you change the shape of your data (say, the keys in a JSON response object). Your front end has to simultaneously handle the new response schema plus whatever may still be cached.
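Here is a small sketch of what that looks like on the client, with hypothetical field names (an old widget_name key versus a new name key); the point is only that the front end has to tolerate whatever older responses are still sitting in a cache.

```typescript
interface Widget {
  id: string;
  name: string;
}

// Accept both the new shape and whatever the old API may have left in caches.
function parseWidget(raw: { id: string; name?: string; widget_name?: string }): Widget {
  return {
    id: raw.id,
    // New responses use `name`; cached responses from the old API may still
    // carry `widget_name`, so fall back to it until those entries expire.
    name: raw.name ?? raw.widget_name ?? "",
  };
}
```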
Consistency Issues
Furthermore, caching adds another dimension to every value returned from it: how old that value is. When two pieces of data depend on each other in the database, the cache can silently break that relationship. Anecdotally, I see this happen often with profile pictures. On many websites, you can update your profile picture and see it change, but as soon as you navigate to another part of the site, you’ll see the old one. Before, you could update a field in the database, and the database would take care of making sure all future queries reflected that change. Now, data consistency is entirely on you.
As mentioned earlier, caching introduces a “moral hazard” that lets development teams avoid fixing issues in the API. It changes the distribution of response times from your API, which is typically long-tailed, making it fatter at the low end and much skinnier at the high end. That’s because it speeds up every response that hits the cache, which ends up being a high percentage of all requests. Since so few requests reach the API and trigger the performance issue, it’s harder to catch when it happens, and in a resource-constrained organization you tend to end up with a small cadre of users who perpetually experience mysterious performance issues that are hard to diagnose. Many fixes get pushed out, but the rarity of these issues and the difficulty of reproducing them locally often keep them around for a long time.
In Practice
So, should you avoid caching? That’s certainly not something I would recommend. Clearly, all kinds of mature systems have incorporated caching successfully. Try to understand what is “good enough” for your users and incorporate caching only when you risk not meeting that requirement. Don’t hyper-focus on performance. The longer you hold off on adding a cache to your architecture, the better you will understand the data access patterns of your application, and the better your caching strategy will be. Rethinking and redeploying a caching architecture that is already in place and causing issues at scale is much harder than putting one in place right before you need it. And being able to focus more fully on product and feature development that whole time is likely much better for whatever it is you are building.