Design a Distributed Cache System
Problem Statement
Design a distributed caching system that can serve data with low latency across multiple geographic locations, handling cache misses, replication, and scaling to support billions of records. Think of Redis or Memcached deployed as a managed service that sits in front of databases and APIs to deliver microsecond-to-millisecond latency reads. Users read and write key-value entries with optional TTLs, and the cache offloads backend systems while keeping latency predictable.
Interviewers ask this because it bundles core distributed systems skills: sharding and client routing, consistency versus freshness, replication and failover, multi-region design, hot-key contention, and cache invalidation. You are expected to reason about tradeoffs (cache-aside vs write-through), manage thundering herds, and describe concrete mechanisms that keep the system fast and correct under failures at global scale.
Key Requirements
Functional
- Key-value operations -- clients can set, get, and delete entries with optional per-key TTLs
- Low-latency reads -- data is served from the nearest available cache node or region with sub-millisecond to single-digit millisecond latency
- Eviction policies -- configurable eviction behavior (LRU, LFU) to stay within memory limits
- Cache invalidation -- precise invalidation (single key) or broad invalidation (namespace/tag) when underlying data changes
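To make the eviction requirement concrete, here is a minimal LRU sketch built on Python's `OrderedDict`; a production cache would use an intrusive doubly linked list with per-shard locks, but the eviction logic is the same idea:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: at capacity, evicts the least-recently-used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self.store:
            return None  # cache miss
        self.store.move_to_end(key)  # mark as most recently used
        return self.store[key]

    def set(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
```

An LFU variant would additionally track access counts per key; the interface stays the same.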
Non-Functional
- Scalability -- support millions of operations per second with petabyte-scale data distributed across thousands of nodes
- Reliability -- tolerate individual node failures and regional outages without data loss or extended downtime; automatic failover
- Latency -- p99 read latency under 1ms for cache hits; p99 write latency under 5ms
- Consistency -- eventual consistency for cross-region replication; read-after-write consistency within a region
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Sharding and Data Distribution
How you partition data across cache nodes determines scalability, load balance, and resilience to node changes. Naive approaches such as modulo hashing remap most keys whenever the node count changes, causing massive cache churn during scale events.
Hints to consider:
- Use consistent hashing with virtual nodes to distribute keys evenly and minimize remapping when nodes join or leave
- Assign each key to a primary node and one or more replicas for fault tolerance
- Implement client-side or proxy-side routing that maintains a ring topology map, updated via a coordination service
- Handle key migration gracefully: during rebalancing, route reads to both old and new owners to avoid cache misses
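The first two hints can be sketched with a simple consistent-hash ring. This is a hypothetical illustration (class and method names are invented, and MD5 stands in for whatever hash the real system uses); each physical node is placed on the ring at many virtual points so that removing a node only remaps the keys it owned:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Each physical node gets `vnodes` points for even key distribution.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get_node(self, key):
        if not self.ring:
            return None
        h = self._hash(key)
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[idx][1]
```

Replicas fall out naturally: the next N distinct nodes clockwise from the primary hold the copies. The key property is that removing a node leaves every other key's mapping untouched.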
2. Cache Coherence and Invalidation
Stale data is the most common problem with caching. Interviewers want concrete mechanisms for keeping cache entries fresh.
Hints to consider:
- Implement cache-aside (lazy loading) as the default pattern: read from cache, on miss read from DB and populate cache
- Support write-through or write-behind for workloads that require immediate consistency
- Use TTL jitter (randomize expiration times) to prevent synchronized mass expiration that causes thundering herds
- Propagate invalidations across regions using a pub/sub message bus (e.g., Kafka) for near-real-time coherence
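The cache-aside pattern plus TTL jitter can be sketched in a few lines. This assumes hypothetical dict-like `cache` and `db` interfaces; a real client would talk to Redis/Memcached and a database driver instead:

```python
import random
import time

def get_with_cache_aside(cache, db, key, base_ttl=300):
    """Cache-aside read: serve from cache; on miss, load from the backing
    store and populate the cache. TTL jitter (±10%) spreads expirations so
    entries written together don't all expire in the same instant."""
    entry = cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:
            return value  # fresh cache hit
    value = db[key]  # cache miss (or expired): read from the backing store
    ttl = base_ttl * random.uniform(0.9, 1.1)  # jittered expiration
    cache[key] = (value, time.time() + ttl)
    return value
```

Write-through would update `db` and `cache` together on the write path; write-behind would acknowledge the cache write and flush to `db` asynchronously.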
3. Handling Hot Keys and Thundering Herds
Popular keys and synchronized expirations create load spikes that can cascade to the backend. Interviewers look for proactive mitigation strategies.
Hints to consider:
- Implement single-flight (request coalescing) on cache misses: only one request fetches from the backend while others wait
- Replicate extremely hot keys across multiple nodes so reads are distributed
- Use negative caching (cache misses with short TTLs) to prevent repeated expensive lookups for non-existent keys
- Apply circuit breakers to protect the backend when cache miss rates spike
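Single-flight coalescing from the first hint can be sketched like this (a hypothetical in-process version; Go's `golang.org/x/sync/singleflight` is the well-known library form of the same idea):

```python
import threading

class SingleFlight:
    """Request coalescing: concurrent misses for the same key share one
    backend fetch instead of stampeding the database (illustrative sketch;
    results are kept indefinitely for simplicity)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> Event guarding an in-progress fetch
        self._results = {}

    def do(self, key, fetch):
        with self._lock:
            event = self._inflight.get(key)
            if event is None:
                # No fetch in progress: this caller becomes the leader.
                event = threading.Event()
                self._inflight[key] = event
                leader = True
            else:
                leader = False
        if leader:
            try:
                self._results[key] = fetch()
            finally:
                event.set()  # wake waiting followers
                with self._lock:
                    del self._inflight[key]
        else:
            event.wait()  # follower: wait for the leader's result
        return self._results[key]
```

Under a thundering herd, one request hits the backend per key while the rest wait on the event, turning N backend calls into one.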
4. Replication and Failover
Single-instance shards are single points of failure. Interviewers expect a replication strategy with automated recovery.
Hints to consider:
- Deploy primary-replica pairs for each shard, with synchronous replication within a region and asynchronous cross-region
- Use health checks and leader election (via ZooKeeper or etcd) for automatic failover when a primary fails
- Handle split-brain scenarios with fencing tokens to prevent stale primaries from accepting writes
- Design replica catch-up after failover to minimize data loss and restore full redundancy quickly
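The fencing-token hint can be made concrete with a tiny sketch. Assume the coordination service (ZooKeeper/etcd) hands each newly elected primary a monotonically increasing epoch number; the storage side then rejects writes carrying a stale token, so a deposed primary that still thinks it is the leader cannot corrupt data (interfaces here are invented for illustration):

```python
class FencedStore:
    """Replica-side write path that enforces fencing tokens.
    `token` is the epoch issued by the coordination service at election."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:
            # A newer primary has been elected: reject the stale writer.
            raise RuntimeError(f"stale fencing token {token}")
        self.highest_token = token
        self.data[key] = value
```

Once any write with epoch N has been accepted, the old primary's epoch N-1 writes are refused everywhere, which is exactly the split-brain protection the hint describes.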
Suggested Approach
Step 1: Clarify Requirements
Confirm scope with your interviewer. Ask about expected data volume (total keys, average value size), operation rates (reads/writes per second), geographic distribution requirements, and consistency expectations. Determine if the cache is for a specific use case (session store, database cache, API response cache) or a general-purpose system. Clarify whether multi-tenancy is required and what eviction behavior is expected.