Caching vs. Multiplexing – An Apples & Oranges Analogy

Applications with high performance requirements generally need ways to shed load from overwhelmed services. There are a couple of common ways to approach this, but which one is best for your current project?

  • Caching is a well understood and traditional way to scale a system, despite cache invalidation being one of the two hard things in computer science.
  • Multiplexing is another option that is often used in signal processing, but less so in software engineering. The advent of Facebook’s DataLoader makes multiplexing a viable alternative to improve performance.

To illustrate the differences, imagine a local grocer with a couple of employees (connection pool) and curb-side pickup. Customers can drive up, hand over a grocery list (request), and leave with their groceries (data). It’s pretty convenient and works great when customers arrive more slowly than employees can serve them. But curb-side pickup is popular, and customers are now lining up around the block (requests coming in faster than they can be processed)!

Caching

The store could implement a policy to bring a bin of apples to the curb if a customer asks for them (caching) and no bin has been set out yet (cache miss); subsequent customers wanting an apple can simply grab from the bin (cache hit). Of course, the produce will go bad, so the store also needs a policy for when to refresh the bin (cache invalidation).


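// Return an apple of the given cultivar, consulting the cache first.
function getApple(cultivar) {
  const cachedApple = cache.getApple(cultivar);
  if (cachedApple) {
    return cachedApple; // cache hit
  }
  // Cache miss: fetch from the database and remember the result.
  const apple = database.getApple(cultivar);
  cache.setApple(cultivar, apple);
  return apple;
}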

getApple("honeycrisp");

Although the store needs to make sure they have a good refresh policy on a per-fruit basis, the general practice can easily extend to include oranges.
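The refresh policy can be sketched as a time-to-live (TTL) on each bin. This is only a minimal illustration: the in-memory `database` object, the `createCachedGetApple` helper, and the `ttlMs` and `now` parameters are all hypothetical names invented for the example, and a real cache would more likely live in a dedicated store.

```javascript
// Hypothetical in-memory "database" that counts how often it is queried.
const database = {
  calls: 0,
  getApple(cultivar) {
    this.calls += 1;
    return { cultivar, fetchNumber: this.calls };
  },
};

// Build a cached getter. `ttlMs` is the refresh policy; `now` is injectable
// so the expiry logic can be exercised without waiting.
function createCachedGetApple(ttlMs, now = Date.now) {
  const bins = new Map(); // cultivar -> { apple, expiresAt }

  return function getApple(cultivar) {
    const bin = bins.get(cultivar);
    if (bin && bin.expiresAt > now()) {
      return bin.apple; // cache hit: grab from the bin at the curb
    }
    // Cache miss or stale bin: fetch fresh produce and restart the timer.
    const apple = database.getApple(cultivar);
    bins.set(cultivar, { apple, expiresAt: now() + ttlMs });
    return apple;
  };
}

const getApple = createCachedGetApple(60_000); // refresh bins every minute
getApple("honeycrisp"); // miss: goes to the database
getApple("honeycrisp"); // hit: served from the bin
```

Choosing `ttlMs` is the per-fruit judgment call: too long and customers get stale produce, too short and the bin rarely saves a trip.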

Multiplexing

The store could also train employees to serve multiple customers by combining grocery lists (multiplexing), picking up all the items (batch loading function), and then correctly giving customers their produce (de-multiplexing).


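const appleLoader = new DataLoader(async cultivars => {
  // One query for every requested cultivar (multiplexing).
  const apples = await database.getApples(cultivars);
  // Group the results by cultivar.
  const applesByCultivar = {};
  for (const apple of apples) {
    (applesByCultivar[apple.cultivar] ||= []).push(apple);
  }
  // Return one entry per key, in key order (de-multiplexing).
  return cultivars.map(cultivar => applesByCultivar[cultivar] || []);
});

appleLoader.load("honeycrisp");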

This makes each employee more efficient, but it requires more sophistication: the batching logic is specific to each kind of request, so fetching various oranges may look very different from fetching apples.

Note that Facebook’s DataLoader handles the multiplexing and de-multiplexing, but the batch loading function is custom and should be thoroughly tested; proper multiplexed database queries can be quite tricky to write.
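To make the division of labor concrete, here is a simplified stand-in for the multiplexing and de-multiplexing steps. It is not DataLoader's actual implementation; `createBatchLoader` and the fake batch function are hypothetical, and the sketch assumes that all keys requested within one tick of the event loop belong to the same batch.

```javascript
// Simplified batching loader: collects keys for one tick, then calls batchFn once.
function createBatchLoader(batchFn) {
  let queue = []; // pending { key, resolve, reject } entries

  async function flush() {
    const batch = queue;
    queue = [];
    try {
      // Multiplex: a single call covers every key collected during this tick.
      const values = await batchFn(batch.map((entry) => entry.key));
      // De-multiplex: values must come back in the same order as the keys.
      batch.forEach((entry, i) => entry.resolve(values[i]));
    } catch (error) {
      batch.forEach((entry) => entry.reject(error));
    }
  }

  return {
    load(key) {
      return new Promise((resolve, reject) => {
        // The first key of a new batch schedules the flush.
        if (queue.length === 0) queueMicrotask(flush);
        queue.push({ key, resolve, reject });
      });
    },
  };
}

// Hypothetical batch function standing in for a multi-key database query.
const batchCalls = [];
const appleLoader = createBatchLoader(async (cultivars) => {
  batchCalls.push(cultivars);
  return cultivars.map((cultivar) => ({ cultivar }));
});

// Two customers arrive in the same tick, so both keys travel in one batch.
Promise.all([appleLoader.load("honeycrisp"), appleLoader.load("fuji")]).then(
  (apples) => console.log(apples.length, batchCalls.length) // prints "2 1"
);
```

Everything inside `createBatchLoader` is generic; the part that needs careful testing is the batch function, which must return exactly one value per key, in key order.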

Which Should You Choose?

As is generally the case, deciding whether you should cache or multiplex depends on your situation. Nothing prevents you from using both, but there are trade-offs to consider.

Caching has an advantage when there is a limited variety of data being requested; a high rate of cache hits improves performance dramatically. Caching can also be implemented generically and adapted to a variety of requests with little effort.

When the data is varied enough that cache misses are the norm, multiplexing can be more effective since each connection is more efficient.
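A toy model can illustrate the trade-off by counting database round trips. The sketch below makes strong simplifying assumptions: requests arrive in groups ("ticks"), the cache never expires, a batched query costs the same as a single-key query, and the workloads are invented for the example.

```javascript
// Each workload is a list of ticks; each tick lists the cultivars requested together.

// With a cache, only the first request for each cultivar reaches the database.
function tripsWithCache(ticks) {
  const seen = new Set();
  let trips = 0;
  for (const tick of ticks) {
    for (const cultivar of tick) {
      if (!seen.has(cultivar)) {
        seen.add(cultivar); // cache miss
        trips += 1;
      }
    }
  }
  return trips;
}

// With multiplexing, each non-empty tick costs one combined query.
function tripsWithBatching(ticks) {
  return ticks.filter((tick) => tick.length > 0).length;
}

const lowVariety = [
  ["fuji", "fuji", "gala"],
  ["gala", "fuji", "fuji"],
  ["fuji", "gala", "gala"],
];
const highVariety = [
  ["fuji", "gala", "pink lady"],
  ["honeycrisp", "empire", "jazz"],
  ["braeburn", "cortland", "mcintosh"],
];

console.log(tripsWithCache(lowVariety), tripsWithBatching(lowVariety)); // 2 3
console.log(tripsWithCache(highVariety), tripsWithBatching(highVariety)); // 9 3
```

With only two cultivars in play, the cache needs fewer trips than batching; with nine distinct cultivars, every cached request is a miss and batching comes out ahead.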

Given the variety of requests Facebook receives, multiplexing is an understandably powerful performance multiplier for their servers. If your data characteristics are more limited, as is generally the case, it may make more sense to start with caching.