Design a Distributed Cache System
Problem Statement
Design a distributed caching system that can serve data with low latency across multiple geographic locations, handling cache misses, replication, and scaling to support billions of records. Think of Redis or Memcached deployed as a fleet across regions.
A distributed cache sits in front of your databases and APIs to serve data with microseconds-to-milliseconds latency. Users read and write simple records by key, with optional time-to-live (TTL), and the cache offloads the backend while keeping latency predictable. Interviewers ask this because it bundles core distributed systems skills: sharding and client routing, consistency versus freshness, replication and failover, multi-region design, hot-key contention, and the realities of cache invalidation.
Key Requirements
Functional
- Get, set, delete -- users can set, get, and delete key-value entries with optional per-key TTLs
- Low-latency reads -- users can read data with consistently low latency from the nearest available cache node or region
- Eviction policies -- users can configure eviction behavior (LRU or LFU) to stay within memory limits
- Invalidation -- users can invalidate cached data precisely (single key) or broadly (namespace/tag) when underlying data changes
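The functional surface above is small enough to sketch. Here is a minimal single-node cache combining LRU eviction with optional per-key TTL (an illustrative sketch only -- a production node would add locking, memory accounting, and background expiry):

```python
import time
from collections import OrderedDict

class LRUTTLCache:
    """Single-node cache sketch: LRU eviction plus optional per-key TTL."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # key -> (value, expires_at or None)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self.store[key]          # lazily expire on read
            return None
        self.store.move_to_end(key)      # mark as most recently used
        return value

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self.store[key] = (value, expires_at)
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

    def delete(self, key):
        self.store.pop(key, None)
```

Swapping LRU for LFU only changes the eviction bookkeeping; the get/set/delete surface stays the same.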
Non-Functional
- Scalability -- support millions of keys per node, hundreds of nodes across regions, and billions of total cached entries
- Reliability -- tolerate individual node failures and network partitions with automatic failover and minimal data loss
- Latency -- sub-millisecond reads for local cache hits; single-digit millisecond reads for cross-node lookups
- Consistency -- eventual consistency across regions is acceptable; support configurable write policies (write-through, write-behind, cache-aside)
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Sharding and Data Distribution
Interviewers want to see how you partition data across cache nodes to balance load and handle node additions and removals gracefully.
Hints to consider:
- Use consistent hashing with virtual nodes to distribute keys evenly and minimize redistribution on topology changes
- Explain why modulo-based sharding (hash % N) is problematic: it remaps almost all keys when nodes are added or removed
- Discuss client-side routing versus proxy-based routing and their latency and complexity tradeoffs
- Consider how to handle data migration during rebalancing without causing cache stampedes
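To make the consistent-hashing point concrete, here is a sketch of a ring with virtual nodes (the vnode count and MD5 hash are illustrative choices, not requirements):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing with virtual nodes: each physical node owns
    many points on the ring, so load spreads evenly and removing a node
    only remaps the keys it owned."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#vnode{i}"), node))

    def remove_node(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get_node(self, key):
        if not self.ring:
            raise KeyError("empty ring")
        h = self._hash(key)
        # first ring point clockwise from the key's hash, wrapping around
        idx = bisect.bisect(self.ring, (h, chr(0x10FFFF)))
        return self.ring[idx % len(self.ring)][1]
```

Contrast this with `hash(key) % N`: there, removing one of N nodes remaps roughly (N-1)/N of all keys; here, only the departed node's share moves.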
2. Cache Invalidation and Coherence
Cache invalidation is famously one of the two hard problems in computer science. Interviewers probe how you keep cached data fresh and handle the various failure modes.
Hints to consider:
- Specify the write policy: cache-aside (application manages cache), write-through (cache updates on write), or write-behind (async persistence)
- Use TTL with jitter to prevent synchronized expiration storms (thundering herd)
- Implement request coalescing (single-flight pattern) so only one backend fetch occurs per cache miss even under high concurrency
- Use a pub/sub channel to broadcast invalidation events across nodes and regions
3. Replication and Failover
Single-instance shards create single points of failure. Interviewers expect a concrete plan for replicas, health checks, and promotion.
Hints to consider:
- Deploy primary-replica pairs for each shard with asynchronous replication for read scaling
- Implement health checks and automatic leader election (via a coordination service) when the primary fails
- Discuss the read-after-write consistency challenge: a write to the primary may not be visible on replicas immediately
- Consider how replicas catch up after failover and handle potential data loss from async replication lag
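One common answer to the read-after-write problem is lag-aware routing: reads for a key written within the assumed replication-lag window go to the primary, and older keys can be served by any replica. A sketch, with dicts standing in for cache nodes and the lag window as a labeled assumption:

```python
import time

class ReplicaRouter:
    """Route reads to avoid stale replicas after a recent write.
    `lag_window` is an assumed upper bound on async replication lag."""

    def __init__(self, primary, replicas, lag_window=0.5):
        self.primary = primary
        self.replicas = replicas
        self.lag_window = lag_window
        self.last_write = {}  # key -> timestamp of last write seen here

    def write(self, key, value):
        self.primary[key] = value
        self.last_write[key] = time.monotonic()
        # real systems replicate asynchronously in the background;
        # we copy inline only to keep the sketch self-contained
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        written = self.last_write.get(key)
        if written is not None and time.monotonic() - written < self.lag_window:
            return self.primary.get(key)  # fresh write: replica may be stale
        replica = self.replicas[hash(key) % len(self.replicas)]
        return replica.get(key)
```

This only gives read-your-writes for the client that tracked the write; cross-client freshness needs replica acknowledgements or version checks.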
4. Hot Key Mitigation
A small number of extremely popular keys can overwhelm a single shard. Interviewers assess your understanding of this common production issue.
Hints to consider:
- Detect hot keys via request rate monitoring and replicate them to multiple shards for load distribution
- Implement local (in-process) caching on application servers with short TTLs for the hottest keys
- Use negative caching to prevent repeated backend fetches for keys that do not exist
- Apply rate limiting or circuit breaking to protect the backend during cache miss storms
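The "replicate hot keys to multiple shards" hint is often implemented with key aliasing: once a key is flagged hot, readers pick one of N suffixed copies at random (each alias hashes to a likely-different shard), while writers must update all copies. A sketch, with the fanout and helper names as illustrative choices:

```python
import random

HOT_KEY_FANOUT = 8  # illustrative: number of aliased copies per hot key

def hot_key_read_alias(key, hot_keys):
    """For a detected hot key, spread reads across N aliases;
    ordinary keys pass through unchanged."""
    if key in hot_keys:
        return f"{key}#{random.randrange(HOT_KEY_FANOUT)}"
    return key

def hot_key_write_aliases(key, hot_keys):
    """Writes (and invalidations) must touch every copy of a hot key."""
    if key in hot_keys:
        return [f"{key}#{i}" for i in range(HOT_KEY_FANOUT)]
    return [key]
```

The tradeoff is explicit: read load on the hot shard drops by roughly the fanout factor, at the cost of N-times write and invalidation work for those keys.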