Caching

Definition of cache

Any hardware or software component that stores data so that future requests for that data can be served faster. The stored data could be the result of an earlier computation or a copy of data held elsewhere in slower-to-access storage.

Cache hit and miss

If the result of a call is returned from the cache, it's called a cache hit. If the result doesn't exist in the cache and the application needs to reach out to the authoritative store, it's called a cache miss.
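
Below is a minimal sketch of this flow, assuming a plain in-memory map as the cache and a hypothetical loadFromDatabase() call as the authority (both are illustrative stand-ins, not a specific library):

```java
import java.util.HashMap;
import java.util.Map;

public class HitMissExample {
    // In-memory stand-in for a cache; loadFromDatabase() is a hypothetical authority call.
    private static final Map<String, String> cache = new HashMap<>();

    static String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value;              // cache hit: served without touching the authority
        }
        value = loadFromDatabase(key); // cache miss: fall back to the authoritative store
        cache.put(key, value);         // populate the cache for future requests
        return value;
    }

    private static String loadFromDatabase(String key) {
        return "value-for-" + key;     // placeholder for the slower authoritative lookup
    }
}
```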

Authoritative data v/s cached data

An authoritative data store is the System of Record (SoR): it always contains the latest state for a resource or entity. A cached data store, on the other hand, could hold a subset, the full set, or joins of multiple data sources, and is used to serve the same (or augmented) data faster.

A cached data store is therefore never an authoritative data store and should not be treated as one.

Properties of a cache

  1. Short-lived "copy" of data that is otherwise stored in one or more authoritative data store(s).
  2. Typically faster to access than the authoritative store(s).
  3. Typically able to support higher throughput.

Locality of reference

Comes from the principle of locality, which is the tendency of a processor to access the same set of memory locations repeatedly over a short period of time. The same tendency shows up in applications: a subset of the data is often accessed far more than the rest and thus makes sense to cache. Such access patterns can exhibit:

  1. Temporal locality - the same subset of data is accessed far more than the rest over a short period of time, and thus makes sense to cache.
  2. Spatial locality - data that is stored physically close to data already requested is likely to be requested next. In other words, data stored close to recently accessed data (or instructions) has a higher chance of being accessed soon.

Apache Hadoop is a great example of how this can be realized in a distributed system. Taking its example:

  1. Local data locality - where data and computation logic exist on the same node of the cluster, providing the lowest latency and best performance.
  2. Intra-rack data locality - where data and computation logic do not exist on the same node but on different nodes in the same rack of the cluster. A rack is a physical arrangement of nodes in a cluster where the nodes in a rack are connected to a common network switch.
  3. Inter-rack data locality - where data and computation logic exist across different racks in the cluster. This has the highest latency and lowest performance.

Other, non-product-specific examples in distributed systems could be:

  1. Temporal locality could be used for a menu whose entries come from a database: menu items are relatively non-volatile and frequently accessed, so a cache would be beneficial.
  2. Spatial locality could be used for a paged API response: when page 1 with records 1-10 is fetched, it's very likely that page 2 with records 11-20 will be fetched next, so page 2 could be pre-fetched and cached, as in the sketch below.
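
A sketch of the paging example, assuming a hypothetical PagePrefetcher with a fetchPageFromDatabase() helper; when page N is requested, page N+1 is warmed in the background:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class PagePrefetcher {
    // Cached pages keyed by page number; fetchPageFromDatabase() is a hypothetical slow call.
    private final Map<Integer, List<String>> pageCache = new ConcurrentHashMap<>();

    List<String> getPage(int pageNumber) {
        List<String> page = pageCache.computeIfAbsent(pageNumber, this::fetchPageFromDatabase);
        // Spatial locality: the next page is likely to be requested soon, so warm it asynchronously.
        CompletableFuture.runAsync(() ->
                pageCache.computeIfAbsent(pageNumber + 1, this::fetchPageFromDatabase));
        return page;
    }

    private List<String> fetchPageFromDatabase(int pageNumber) {
        // Placeholder for the real paged database query.
        return List.of("record-" + (pageNumber * 10 - 9) + " .. record-" + (pageNumber * 10));
    }
}
```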

Where to put a cache

This is a huge point of discussion in any system design interview. Too many caches across different layers of the stack can lead to data inconsistencies if one (or more) of those caches holds stale data. Too few (or no) caches can mean higher latencies and lower throughput. So where should the cache go?

Let's take a "typical web" of microservices: Microservice 1 serves user requests, owns a data store of its own, and calls a downstream dependency, Microservice 1.1, which owns its own data store as well.

Assuming we want better latencies from Microservice 1, there are a few opportunities to cache here:

  1. At Microservice 1
     1.1 Cache only its own data
     1.2 Cache only its dependency's data
     1.3 Cache both its own and its dependency's data
  2. At Microservice 1.1
     2.1 Cache its own data

Now let's say we want the best latencies, so we choose 1.3 and 2.1:

  1. If the cache for Microservice 1.1 is not invalidated properly, the state returned to the user could differ from the state Microservice 1.1 actually holds.
  2. If either of the caches at Microservice 1 (for its own data or its dependency's data) is not invalidated properly, the service could return a state different from what the underlying data store contains.
  3. If Microservice 1 joins data between its own store and that of Microservice 1.1, stale entries in either cache could result in invalid aggregates being returned to the user.

Now imagine this happening across tens of microservices owned by different teams.

The purpose of this exercise is to illustrate that caches are not a silver-bullet solution for speeding things up, and "where to put a cache" is often not easily answered. The best path forward is to profile and benchmark, identify the "slow" points in the call stack, and evaluate whether it makes sense to add a cache there. Other considerations are:

  1. Access patterns - if the call we are trying to cache pulls from rapidly changing (live) dependencies, caching it might not make much sense because there will be a lot of cache misses. And if it's hard to recognize any locality of reference, a cache would be of no use at all.
  2. Cache TTL - if the desired TTL (time to live) of our cache entries is forever, that's not a cache, it's a database. We need to make sure we are using the cache as a short-lived, temporal data store.

Common caching patterns used in microservices

Application level caches

Caches that live at the application level; the caches in the previous section are examples of those. Application-level caches could be of two types:

  1. On-host caches - caches that are local to each node. Usually cheaper to implement and maintain, but they perform poorly when there are multiple nodes and load is distributed randomly across them. An example is Guava Cache (see the sketch after this list).
  2. Distributed caches - hosted caches that can be scaled independently and return the cached value regardless of the node they are called from. Examples are Redis and Memcached.
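
As a concrete on-host example, here is a short sketch using Guava's CacheBuilder; the size and TTL values are arbitrary, and loadFromDependency() is a hypothetical stand-in for the real downstream call:

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

public class OnHostCacheExample {
    // Each node holds its own copy; entries expire so the cache stays a short-lived copy of the authority.
    private final LoadingCache<String, String> cache = CacheBuilder.newBuilder()
            .maximumSize(10_000)                      // eviction bound (arbitrary)
            .expireAfterWrite(5, TimeUnit.MINUTES)    // TTL (arbitrary)
            .build(new CacheLoader<String, String>() {
                @Override
                public String load(String key) {
                    return loadFromDependency(key);   // invoked only on a cache miss
                }
            });

    String get(String key) {
        return cache.getUnchecked(key);
    }

    private String loadFromDependency(String key) {
        return "value-for-" + key; // placeholder for the slower authoritative call
    }
}
```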

Content Delivery Networks (CDN)

Typically used to serve static content (CSS, JS, images, etc.) by caching the content at edge locations, preferably in a geography closest to the user. Example: AWS CloudFront.

A CDN usually follows one of these two delivery methods:

  1. Push - whenever static content is updated or new content is created, it is pushed to all edge locations.
  2. Pull - when a request for static content comes in and the content is not available at the edge, the edge pulls it from the authority (origin) and caches it.

It's a tradeoff to decide which one to use when. For instance, if there are millions of static assets that almost never change, push might work better because we wouldn't want to expire and re-populate all that content periodically. However, if the site is seldom visited from a given geography, it wouldn't make sense to push the content to that geography; a pull model might work better in that case.

Primed cache

Best for read-only data that can be predicted to be requested; the cache is populated beforehand.
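
A sketch of priming, assuming hypothetical listMenuIds()/loadMenuItem() authority calls (reusing the menu example from earlier); the cache is filled once at startup, before it serves any reads:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PrimedCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Called once at application startup, before any traffic arrives.
    void prime() {
        for (String id : listMenuIds()) {
            cache.put(id, loadMenuItem(id));
        }
    }

    String get(String id) {
        return cache.get(id); // expected to always be a hit once primed
    }

    // Hypothetical authority calls.
    private List<String> listMenuIds() { return List.of("coffee", "tea", "scone"); }
    private String loadMenuItem(String id) { return "details-for-" + id; }
}
```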

On-demand cache

Best for data that cannot be predicted to be requested; the cache is populated only when there is demand for a block of data.
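
This is essentially the cache-aside pattern shown earlier. A sketch using Guava's Cache.get(key, loader), where loadFromAuthority() is a hypothetical stand-in; the loader runs only when the entry is actually requested:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

public class OnDemandCache {
    private final Cache<String, String> cache = CacheBuilder.newBuilder()
            .expireAfterWrite(10, TimeUnit.MINUTES) // arbitrary TTL
            .build();

    String get(String key) throws ExecutionException {
        // The loader is invoked only on a miss, i.e. when there is actual demand for the entry.
        return cache.get(key, () -> loadFromAuthority(key));
    }

    private String loadFromAuthority(String key) {
        return "value-for-" + key; // placeholder for the slower authoritative call
    }
}
```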

Cache Invalidation v/s Eviction

There are two concepts for removing cache entries:

  1. Invalidation - when a cache entry still might be used but is no longer valid (e.g. there's an updated state), the process of removing that entry is called invalidation.
  2. Eviction - when a cache entry is no longer relevant, it is removed to free space for other entries.
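
A short sketch contrasting the two with Guava (the onEntityUpdated() flow is an illustrative assumption): invalidate(key) removes an entry because its value is stale, while the maximumSize bound lets the cache evict entries on its own just to free space:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class InvalidationVsEviction {
    // maximumSize(100) lets the cache evict entries on its own to free space.
    private final Cache<String, String> cache = CacheBuilder.newBuilder()
            .maximumSize(100)
            .build();

    void onEntityUpdated(String key, String newValue) {
        cache.invalidate(key);     // invalidation: the cached copy is stale because the authority changed
        cache.put(key, newValue);  // optionally re-populate with the fresh value
    }
}
```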

Cache invalidation strategies

Write-through cache

Writes go to the cache and the authoritative store synchronously, so both stay consistent, at the cost of higher write latency.

Write-around cache

Writes go directly to the authoritative store and bypass the cache; the cached copy is only (re)populated on a later read, so the first read after a write is slower.

Write-back cache

Writes go to the cache first and are acknowledged immediately; the authoritative store is updated asynchronously later, which gives low write latency but risks losing data if the cache fails before flushing.

Which strategy to use when

It depends on the application's needs. The following matrix gives an idea of how to map the application's needs to a strategy.

| Strategy | Data loss probability | Write latency | Read latency |
| --- | --- | --- | --- |
| Write-through | Low | High | Low |
| Write-around | Low | Low | High |
| Write-back | High | Low | Low |
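
A rough sketch of how the three strategies differ on the write path (writeToDatabase() and the async flush are illustrative assumptions, not a specific library), mirroring the tradeoffs in the matrix above:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class WriteStrategies {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Write-through: cache and authority are updated synchronously, so writes are slower
    // but the chance of losing acknowledged data is low and reads hit the cache.
    void writeThrough(String key, String value) {
        cache.put(key, value);
        writeToDatabase(key, value);
    }

    // Write-around: skip the cache entirely; writes are fast and safe, but the next read
    // of this key misses and must go to the authority.
    void writeAround(String key, String value) {
        writeToDatabase(key, value);
    }

    // Write-back: acknowledge after updating the cache and flush to the authority later;
    // writes and reads are fast, but anything not yet flushed is lost if the node dies.
    void writeBack(String key, String value) {
        cache.put(key, value);
        CompletableFuture.runAsync(() -> writeToDatabase(key, value));
    }

    private void writeToDatabase(String key, String value) {
        // Placeholder for the slower authoritative write.
    }
}
```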

Eviction policies

| Policy | Description |
| --- | --- |
| First In First Out (FIFO) | Evicts the block that was added first, regardless of how often it was accessed |
| Last In First Out (LIFO) | Evicts the block that was added most recently |
| Least Recently Used (LRU) | Evicts the block that was accessed least recently |
| Time Aware Least Recently Used (TLRU) | Evicts blocks with a small Time To Use (TTU) that are also less recently used |
| Most Recently Used (MRU) | Evicts the block that was accessed most recently (the opposite of LRU) |
| Least Frequently Used (LFU) | Evicts the block that is accessed least frequently |
| Least Frequently Recently Used (LFRU) | Keeps a privileged partition for popular blocks; evicts from the unprivileged partition and promotes popular blocks into the privileged one |
| Random Replacement (RR) | Evicts a randomly chosen block |
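
As an illustration of one policy from the table, here is a minimal LRU cache built on Java's LinkedHashMap in access order (the capacity of 3 is arbitrary):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true keeps entries ordered by most recent access
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry once over capacity
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(3);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3");
        cache.get("a");      // touch "a" so it becomes the most recently used entry
        cache.put("d", "4"); // evicts "b", the least recently used entry
        System.out.println(cache.keySet()); // prints [c, a, d]
    }
}
```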

Return to Index