Design an in-memory key-value store like Redis that supports fast data operations with key expiration, starting with a single-node implementation and then expanding to a distributed system.
Redis is an in-memory key-value data store used for ultra-fast reads/writes, atomic counters, and caching with key expiration. Think of it as a blazingly fast dictionary that also supports features like time-to-live (TTL), simple transactions, and replication. Many large-scale systems rely on Redis-style primitives to keep latency low and throughput high.
Interviewers ask you to design Redis to test your ability to make disciplined data structure choices, reason about thread safety and contention, and balance latency against durability and memory constraints. After the single-node design, they typically push into sharding, replication, and failover to see if you can evolve a clean, high-performance core into a fault-tolerant distributed system.
Based on real interview experiences, these are the areas interviewers probe most deeply:
The foundational challenge is choosing a concurrency model that delivers sub-millisecond latency without sacrificing throughput. Interviewers want to see whether you understand why Redis chose a single-threaded event loop and the tradeoffs involved.
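To make the single-threaded tradeoff concrete, here is a minimal sketch (a hypothetical `MiniStore`, not Redis internals) of an event loop built on the standard `selectors` module. Because one thread multiplexes all client sockets and executes every command sequentially, the shared hash table needs no locks and each command appears atomic to clients:

```python
import selectors
import socket

class MiniStore:
    def __init__(self):
        self.data = {}
        self.sel = selectors.DefaultSelector()

    def serve_conn(self, conn):
        conn.setblocking(False)
        self.sel.register(conn, selectors.EVENT_READ)

    def run_once(self, timeout=0.1):
        # One loop iteration: handle every readable client in turn. A single
        # thread runs all commands, so access to self.data needs no locking.
        for key, _ in self.sel.select(timeout):
            conn = key.fileobj
            raw = conn.recv(1024)
            if not raw:
                self.sel.unregister(conn)
                conn.close()
                continue
            parts = raw.decode().strip().split()
            if parts[0] == "SET":
                self.data[parts[1]] = parts[2]
                conn.send(b"+OK\n")
            elif parts[0] == "GET":
                value = self.data.get(parts[1], "(nil)")
                conn.send(value.encode() + b"\n")

# usage with a local socketpair standing in for a TCP client
store = MiniStore()
server_end, client_end = socket.socketpair()
store.serve_conn(server_end)
client_end.send(b"SET x 42\n")
store.run_once()
print(client_end.recv(64))  # b'+OK\n'
```

The tradeoff the interviewer is probing: this model eliminates lock contention and context switches, but a single slow command blocks every client, which is why expensive operations must be bounded or pushed to background threads.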
Naive expiration approaches like full keyspace scans or a single global min-heap create latency spikes. Interviewers expect a combined strategy that keeps P99 stable under load.
An in-memory store risks total data loss on crash. Interviewers expect you to discuss persistence strategies and their impact on the hot path.
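An append-only file (AOF) is the usual answer here: log every write command, and on restart replay the log to rebuild state. The sketch below (a hypothetical `AOFWriter`, not Redis source) shows how the fsync policy sets the durability/latency tradeoff on the hot path: `"always"` fsyncs per write, while the default only flushes to the OS, roughly Redis's `everysec` behavior without the background timer:

```python
import os
import tempfile

class AOFWriter:
    def __init__(self, path, fsync_policy="everysec"):
        self.path = path
        self.f = open(path, "ab")
        self.policy = fsync_policy

    def append(self, *args):
        # Log the command before acknowledging the client.
        self.f.write(" ".join(args).encode() + b"\n")
        self.f.flush()
        if self.policy == "always":
            os.fsync(self.f.fileno())  # one syscall per write: durable, slower

    def replay(self, store):
        # On restart, re-apply logged commands to rebuild in-memory state.
        with open(self.path, "rb") as r:
            for line in r:
                cmd, key, *rest = line.decode().split()
                if cmd == "SET":
                    store[key] = rest[0]
                elif cmd == "DEL":
                    store.pop(key, None)

# usage: log writes, then rebuild state as a restart would
path = os.path.join(tempfile.mkdtemp(), "appendonly.aof")
aof = AOFWriter(path, fsync_policy="always")
aof.append("SET", "user:1", "alice")
aof.append("DEL", "user:1")
aof.append("SET", "user:2", "bob")
state = {}
aof.replay(state)
print(state)  # {'user:2': 'bob'}
```

The log grows without bound under this scheme, which motivates periodic compaction (AOF rewrite) or pairing it with point-in-time RDB snapshots.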
Scaling beyond a single node requires partitioning the keyspace and replicating data for fault tolerance. Interviewers probe the details of slot mapping, rebalancing, and leader election.
Start by confirming the scope. Ask whether the system needs to support only simple key-value operations or also complex data types like lists, sets, and sorted sets. Clarify the expected data size per key, total keyspace size, and read/write ratio. Determine durability requirements: is some data loss acceptable on crash, or must every write be persisted? Ask about multi-tenancy, access control, and whether clients need pub/sub or transaction support. Confirm whether the distributed design should prioritize availability or consistency during network partitions.
Sketch a single-node architecture first: a network layer using epoll/kqueue for multiplexed I/O, a command parser, a single-threaded event loop that processes commands against an in-memory hash table, and background threads for persistence (RDB snapshots, AOF rewriting). Then expand to a cluster: show multiple nodes each owning a subset of hash slots, with clients using a slot-aware routing library. Add replica nodes that follow their primaries via replication streams. Include a cluster bus for gossip-based failure detection and slot metadata propagation. Show a ZooKeeper or Raft-based coordinator for authoritative slot assignments and failover orchestration.
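The slot mapping itself is small enough to sketch. Redis Cluster hashes each key with CRC16 (the XMODEM variant) modulo 16384 slots, and hash tags like `{user1000}` restrict hashing to the tagged substring so related keys land on the same slot; the function names below are illustrative:

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16-CCITT (XMODEM variant: poly 0x1021, init 0x0000), the checksum
    # Redis Cluster uses for key-to-slot mapping.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    # Hash tags: if the key contains a non-empty {...}, only the content
    # between the first '{' and the next '}' is hashed, so clients can
    # force related keys (e.g. for multi-key ops) onto one slot.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384

print(key_slot(b"user:{1000}:profile") == key_slot(b"user:{1000}:cart"))  # True
```

A slot-aware client library caches the slot-to-node map and refreshes it when a node replies with a MOVED redirection, which is also how rebalancing becomes visible to clients.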
Walk through the critical path of a SET command with TTL. The client sends the command to the node that owns the key (determined by hashing the key to a slot). The event loop reads the command from the socket buffer, parses it, inserts or updates the key in the hash table, and records an absolute expiry timestamp for the key in a separate expiry dictionary. The command is appended to the AOF buffer and acknowledged to the client. For expiration, explain the dual strategy: on every key access, check the expiry dictionary and return a miss if the key has expired (lazy); every 100ms, sample 20 random keys from the expiry dictionary and delete those that have expired, repeating the cycle if more than 25% of the sample was expired (active). Discuss how this bounded, probabilistic cleanup keeps memory usage reasonable without causing latency spikes.
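The dual expiration strategy above can be sketched in a few lines (illustrative class names, not Redis source): lazy checks on access, plus a bounded active cycle that samples 20 keys and repeats while more than 25% of the sample was expired:

```python
import random
import time

class ExpiringStore:
    SAMPLE_SIZE = 20
    REPEAT_THRESHOLD = 0.25

    def __init__(self, clock=time.monotonic):
        self.data = {}
        self.expires = {}   # key -> absolute expiry timestamp
        self.clock = clock  # injectable for testing

    def set(self, key, value, ttl=None):
        self.data[key] = value
        if ttl is not None:
            self.expires[key] = self.clock() + ttl
        else:
            self.expires.pop(key, None)

    def get(self, key):
        # Lazy expiration: evict on access once the deadline has passed.
        deadline = self.expires.get(key)
        if deadline is not None and self.clock() >= deadline:
            del self.data[key], self.expires[key]
            return None
        return self.data.get(key)

    def active_expire_cycle(self):
        # Active expiration: delete expired keys from a random sample,
        # repeating while the expired fraction exceeds 25%. (Redis also
        # caps the time spent per cycle; omitted here for brevity.)
        while self.expires:
            sample = random.sample(list(self.expires),
                                   min(self.SAMPLE_SIZE, len(self.expires)))
            now = self.clock()
            expired = [k for k in sample if self.expires[k] <= now]
            for k in expired:
                del self.data[k], self.expires[k]
            if len(expired) <= self.REPEAT_THRESHOLD * len(sample):
                break
```

Injecting the clock keeps the probabilistic cleanup testable with a fake time source, and the repeat-while-above-25% rule is what bounds per-cycle work while still converging when many keys expire at once.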
Cover replication: the primary streams write commands to its replicas (the replication stream uses the same command format as the AOF), and the replicas replay them to maintain an eventually consistent copy. Discuss replication lag and how clients can route reads to replicas for scale while accepting slightly stale data. Address memory management: set a maxmemory limit with eviction policies (LRU, LFU, random, volatile-ttl) to handle memory pressure gracefully. Explain monitoring: expose metrics for memory usage, command latency histograms, replication lag, and keyspace hit/miss ratios. Touch on security: require authentication, support TLS for in-transit encryption, and implement ACLs for multi-tenant access control. Finally, discuss client-side caching with server-assisted invalidation to reduce read load on the cluster.
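As a concrete anchor for the eviction discussion, here is a hypothetical `LRUStore` sketch of a maxmemory cap with LRU eviction. Note the simplifications: real Redis limits bytes rather than key count and uses approximate LRU based on random sampling, whereas an `OrderedDict` gives exact LRU in a few lines:

```python
from collections import OrderedDict

class LRUStore:
    def __init__(self, max_keys):
        self.max_keys = max_keys
        self.data = OrderedDict()

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)            # newest write is most recent
        while len(self.data) > self.max_keys:
            self.data.popitem(last=False)     # evict least recently used

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)            # an access refreshes recency
        return self.data[key]

# usage
cache = LRUStore(max_keys=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # touch "a" so "b" becomes least recently used
cache.set("c", 3)  # exceeds the cap: "b" is evicted
```

Exact LRU costs pointer maintenance on every access; Redis's sampled approximation trades a little eviction accuracy for less bookkeeping on the hot path, which is a good tradeoff to name explicitly in the interview.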