Design a Distributed Cache System
Problem Statement
Design a distributed caching system that can serve data with low latency across multiple geographic locations, handling cache misses, replication, and scaling to support billions of records. Think of Redis or Memcached deployed as a fleet across regions.
A distributed cache sits in front of your databases and APIs to serve data with microseconds-to-milliseconds latency. Users read and write simple records by key, with optional time-to-live (TTL), and the cache offloads the backend while keeping latency predictable. Interviewers ask this because it bundles core distributed systems skills: sharding and client routing, consistency versus freshness, replication and failover, multi-region design, hot-key contention, and the realities of cache invalidation.
Key Requirements
Functional
- Get, set, delete -- users can set, get, and delete key-value entries with optional per-key TTLs
- Low-latency reads -- users can read data with consistently low latency from the nearest available cache node or region
- Eviction policies -- users can configure eviction behavior (LRU or LFU) to stay within memory limits
- Invalidation -- users can invalidate cached data precisely (single key) or broadly (namespace/tag) when underlying data changes
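The functional surface above is small enough to sketch. Here is a minimal single-node cache combining LRU eviction with optional per-key TTL (an illustrative sketch only -- a production node would add locking, memory accounting, and background expiry):

```python
import time
from collections import OrderedDict

class LRUTTLCache:
    """Single-node cache sketch: LRU eviction plus optional per-key TTL."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # key -> (value, expires_at or None)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self.store[key]          # lazily expire on read
            return None
        self.store.move_to_end(key)      # mark as most recently used
        return value

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self.store[key] = (value, expires_at)
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

    def delete(self, key):
        self.store.pop(key, None)
```

Swapping LRU for LFU only changes the eviction bookkeeping; the get/set/delete surface stays the same.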
Non-Functional
- Scalability -- support millions of keys per node, hundreds of nodes across regions, and billions of total cached entries
- Reliability -- tolerate individual node failures and network partitions with automatic failover and minimal data loss
- Latency -- sub-millisecond reads for local cache hits; single-digit millisecond reads for cross-node lookups
- Consistency -- eventual consistency across regions is acceptable; support configurable write policies (write-through, write-behind, cache-aside)
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Sharding and Data Distribution
Interviewers want to see how you partition data across cache nodes to balance load and handle node additions and removals gracefully.
Hints to consider:
- Use consistent hashing with virtual nodes to distribute keys evenly and minimize redistribution on topology changes
- Explain why modulo-based sharding (hash % N) is problematic: it remaps almost all keys when nodes are added or removed
- Discuss client-side routing versus proxy-based routing and their latency and complexity tradeoffs
- Consider how to handle data migration during rebalancing without causing cache stampedes
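To make the consistent-hashing point concrete, here is a sketch of a ring with virtual nodes (the vnode count and MD5 hash are illustrative choices, not requirements):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing with virtual nodes: each physical node owns
    many points on the ring, so load spreads evenly and removing a node
    only remaps the keys it owned."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#vnode{i}"), node))

    def remove_node(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get_node(self, key):
        if not self.ring:
            raise KeyError("empty ring")
        h = self._hash(key)
        # first ring point clockwise from the key's hash, wrapping around
        idx = bisect.bisect(self.ring, (h, chr(0x10FFFF)))
        return self.ring[idx % len(self.ring)][1]
```

Contrast this with `hash(key) % N`: there, removing one of N nodes remaps roughly (N-1)/N of all keys; here, only the departed node's share moves.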
2. Cache Invalidation and Coherence
Cache invalidation is famously one of the two hard problems in computer science. Interviewers probe how you keep cached data fresh and handle the various failure modes.
Hints to consider:
- Specify the write policy: cache-aside (application manages cache), write-through (cache updates on write), or write-behind (async persistence)
- Use TTL with jitter to prevent synchronized expiration storms (thundering herd)
- Implement request coalescing (single-flight pattern) so only one backend fetch occurs per cache miss even under high concurrency
- Use a pub/sub channel to broadcast invalidation events across nodes and regions
3. Replication and Failover
Single-instance shards create single points of failure. Interviewers expect a concrete plan for replicas, health checks, and promotion.
Hints to consider:
- Deploy primary-replica pairs for each shard with asynchronous replication for read scaling
- Implement health checks and automatic leader election (via a coordination service) when the primary fails
- Discuss the read-after-write consistency challenge: a write to the primary may not be visible on replicas immediately
- Consider how replicas catch up after failover and handle potential data loss from async replication lag
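One common answer to the read-after-write problem is lag-aware routing: reads for a key written within the assumed replication-lag window go to the primary, and older keys can be served by any replica. A sketch, with dicts standing in for cache nodes and the lag window as a labeled assumption:

```python
import time

class ReplicaRouter:
    """Route reads to avoid stale replicas after a recent write.
    `lag_window` is an assumed upper bound on async replication lag."""

    def __init__(self, primary, replicas, lag_window=0.5):
        self.primary = primary
        self.replicas = replicas
        self.lag_window = lag_window
        self.last_write = {}  # key -> timestamp of last write seen here

    def write(self, key, value):
        self.primary[key] = value
        self.last_write[key] = time.monotonic()
        # real systems replicate asynchronously in the background;
        # we copy inline only to keep the sketch self-contained
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        written = self.last_write.get(key)
        if written is not None and time.monotonic() - written < self.lag_window:
            return self.primary.get(key)  # fresh write: replica may be stale
        replica = self.replicas[hash(key) % len(self.replicas)]
        return replica.get(key)
```

This only gives read-your-writes for the client that tracked the write; cross-client freshness needs replica acknowledgements or version checks.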
4. Hot Key Mitigation
A small number of extremely popular keys can overwhelm a single shard. Interviewers assess your understanding of this common production issue.
Hints to consider:
- Detect hot keys via request rate monitoring and replicate them to multiple shards for load distribution
- Implement local (in-process) caching on application servers with short TTLs for the hottest keys
- Use negative caching to prevent repeated backend fetches for keys that do not exist
- Apply rate limiting or circuit breaking to protect the backend during cache miss storms
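The "replicate hot keys to multiple shards" hint is often implemented with key aliasing: once a key is flagged hot, readers pick one of N suffixed copies at random (each alias hashes to a likely-different shard), while writers must update all copies. A sketch, with the fanout and helper names as illustrative choices:

```python
import random

HOT_KEY_FANOUT = 8  # illustrative: number of aliased copies per hot key

def hot_key_read_alias(key, hot_keys):
    """For a detected hot key, spread reads across N aliases;
    ordinary keys pass through unchanged."""
    if key in hot_keys:
        return f"{key}#{random.randrange(HOT_KEY_FANOUT)}"
    return key

def hot_key_write_aliases(key, hot_keys):
    """Writes (and invalidations) must touch every copy of a hot key."""
    if key in hot_keys:
        return [f"{key}#{i}" for i in range(HOT_KEY_FANOUT)]
    return [key]
```

The tradeoff is explicit: read load on the hot shard drops by roughly the fanout factor, at the cost of N-times write and invalidation work for those keys.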