Design a Key-Value Store
Problem Statement
Design a distributed key-value store that supports persistent storage with append-only file operations, handles both read-heavy and write-heavy workloads with strong consistency, and can manage large values (up to 2 GB) that cannot fit entirely in memory. Think of systems like DynamoDB, Cassandra, or a custom-built storage engine.
The core engineering challenges are designing a log-structured storage engine with compaction, implementing replication and consensus for strong consistency, supporting streaming I/O for large values, and balancing read and write performance. You must reason about append-only writes, LSM-tree or B-tree tradeoffs, quorum-based replication, and how to handle values that exceed available memory.
Interviewers at Uber and LinkedIn ask this to test whether you can design a storage system from first principles, covering the write path (append-only log, WAL), read path (indexes, caches, Bloom filters), replication for durability and consistency, and special handling for large binary values.
Key Requirements
Functional
- Put/update -- users store or update values up to 2 GB per key, uploading and downloading via streaming without loading the entire value into memory
- Get -- users retrieve the latest value of a key with strong read-after-write consistency from any region
- Delete -- users delete keys with the deletion reflected immediately with strong consistency
- Conditional operations -- users perform compare-and-set using a version or ETag to coordinate concurrent writers
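The compare-and-set requirement above can be sketched with a toy in-memory store. This is a minimal illustration of versioned CAS semantics, not a real API: the class and method names (`KVStore`, `put_if_version`) are invented for this example, and a production store would persist the version alongside the value.

```python
import threading

class KVStore:
    """Toy in-memory store illustrating versioned compare-and-set (names are illustrative)."""

    def __init__(self):
        self._data = {}          # key -> (value, version)
        self._lock = threading.Lock()

    def put_if_version(self, key, value, expected_version):
        """Write only if the caller's version matches the stored one.

        expected_version=None means "create only if the key is absent".
        Returns (ok, current_version) so a losing writer can re-read and retry.
        """
        with self._lock:
            current = self._data.get(key)
            current_version = current[1] if current else None
            if current_version != expected_version:
                return False, current_version   # lost the race: caller must re-read
            new_version = (current_version or 0) + 1
            self._data[key] = (value, new_version)
            return True, new_version

store = KVStore()
ok, v1 = store.put_if_version("profile:42", b"alice", None)  # create succeeds
ok2, v2 = store.put_if_version("profile:42", b"bob", None)   # stale version: rejected
ok3, v3 = store.put_if_version("profile:42", b"bob", v1)     # correct version: accepted
```

The returned version plays the role of an ETag: two concurrent writers both read version 1, but only the first write with `expected_version=1` succeeds.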
Non-Functional
- Scalability -- support millions of keys across hundreds of nodes with linear horizontal scaling
- Reliability -- tolerate node and rack failures without data loss through replication
- Latency -- sub-10 ms reads for small values from cache; sub-100 ms for on-disk reads; streaming throughput for large values
- Consistency -- strong read-after-write consistency via quorum or leader-based reads
What Interviewers Focus On
Based on real interview experiences at Uber, LinkedIn, and Databricks, these are the areas interviewers probe most deeply:
1. Write Path and Storage Engine Design
The append-only constraint shapes the entire storage architecture. Interviewers want to see a log-structured approach with compaction, not in-place updates.
Hints to consider:
- Use a write-ahead log (WAL) for durability: every write is first appended to a sequential log before updating in-memory structures
- Implement an LSM-tree-inspired design: writes go to an in-memory memtable (sorted), which flushes to immutable on-disk SSTables when full
- Design a compaction strategy (size-tiered or leveled) to merge SSTables and remove tombstones for deleted keys
- Use sequential I/O throughout for high write throughput; avoid random writes
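The write path above can be sketched end to end: WAL append, memtable update, then a flush to an immutable sorted file once the memtable fills. This is a minimal sketch, assuming a JSON-lines file format purely for readability; real SSTables use binary block formats, and the `memtable_limit` of a few entries stands in for a size threshold in megabytes.

```python
import json
import os
import tempfile

class StorageEngine:
    """Toy LSM-style write path: WAL append -> memtable -> immutable SSTable flush."""

    def __init__(self, data_dir, memtable_limit=4):
        self.data_dir = data_dir
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.sstable_seq = 0
        # WAL: every write is appended here before touching in-memory state
        self.wal = open(os.path.join(data_dir, "wal.log"), "a")

    def put(self, key, value):
        record = json.dumps({"k": key, "v": value})
        self.wal.write(record + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())        # durable before we acknowledge the write
        self.memtable[key] = value         # a sorted structure in a real engine
        if len(self.memtable) >= self.memtable_limit:
            self._flush_memtable()

    def _flush_memtable(self):
        # Immutable, sorted, written with purely sequential I/O
        path = os.path.join(self.data_dir, f"sstable-{self.sstable_seq:06d}.jsonl")
        with open(path, "w") as f:
            for k in sorted(self.memtable):
                f.write(json.dumps({"k": k, "v": self.memtable[k]}) + "\n")
        self.sstable_seq += 1
        self.memtable.clear()              # the WAL can be truncated past this point

data_dir = tempfile.mkdtemp()
engine = StorageEngine(data_dir, memtable_limit=2)
engine.put("a", 1)
engine.put("b", 2)                         # hits the limit and triggers a flush
sstables = sorted(f for f in os.listdir(data_dir) if f.startswith("sstable-"))
```

Note that every disk operation here is an append or a sequential file write, which is exactly the property the append-only constraint demands.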
2. Read Path and Indexing
Reads must be fast despite data being spread across multiple SSTables. Interviewers evaluate your indexing and caching strategy.
Hints to consider:
- Check the in-memory memtable first, then SSTables from newest to oldest
- Use Bloom filters per SSTable to quickly determine if a key might exist, avoiding unnecessary disk reads
- Maintain a sparse index (one entry per data block) to minimize the number of disk seeks per read
- Cache hot keys and their metadata in memory; for small values, cache the value itself
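The read-path ordering and the Bloom-filter skip can be shown together. This is an illustrative sketch: the Bloom filter uses an int bitmap with assumed sizes, and plain dicts stand in for on-disk SSTables with sparse indexes.

```python
import hashlib

class BloomFilter:
    """Small Bloom filter over an int bitmap (sizes here are illustrative)."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means definitely absent; True means "maybe" (false positives allowed)
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

def get(key, memtable, sstables):
    """Read path: memtable first, then SSTables newest to oldest.

    `sstables` is a list of (bloom_filter, table_dict) pairs, newest first;
    the dicts stand in for on-disk files reached via a sparse index.
    """
    if key in memtable:
        return memtable[key]
    for bloom, table in sstables:
        if not bloom.might_contain(key):
            continue                       # skip the disk read entirely
        if key in table:
            return table[key]              # newest version wins
    return None

old_bloom = BloomFilter()
old_bloom.add("x")
old_bloom.add("y")
sstables = [(old_bloom, {"x": "old", "y": "v"})]
memtable = {"x": "new"}
```

Because the memtable and newer SSTables are consulted first, `get("x", ...)` returns `"new"` even though an older SSTable still holds a stale `"old"` value; compaction eventually reclaims that stale copy.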
3. Handling Large Values (up to 2 GB)
Values that exceed memory require special treatment throughout the system. This is where many candidates struggle.
Hints to consider:
- Separate large values from the LSM-tree: store them in a separate value log with only a pointer in the main index
- Implement chunked streaming I/O so neither client nor server needs to buffer the entire value in memory
- Use per-chunk checksums to detect corruption during transfer and storage
- Design compaction to handle value-log garbage collection independently from the key index
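The chunked-streaming and per-chunk-checksum hints can be sketched with generators, so only one chunk is ever resident in memory. The 4 MiB chunk size is an assumed tuning choice, and the function names are invented for this example.

```python
import hashlib
import io

CHUNK_SIZE = 4 * 1024 * 1024   # 4 MiB per chunk (an assumed tuning choice)

def stream_with_checksums(src, chunk_size=CHUNK_SIZE):
    """Yield (chunk, sha256_hex) pairs; only one chunk is in memory at a time."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return
        yield chunk, hashlib.sha256(chunk).hexdigest()

def receive_to(dst, chunks):
    """Verify each chunk's checksum before appending it to the destination."""
    total = 0
    for chunk, checksum in chunks:
        if hashlib.sha256(chunk).hexdigest() != checksum:
            raise IOError("chunk failed checksum: corrupted in transit or at rest")
        dst.write(chunk)
        total += len(chunk)
    return total

payload = b"x" * (10 * 1024)   # stand-in for a multi-GB value
sink = io.BytesIO()
n = receive_to(sink, stream_with_checksums(io.BytesIO(payload), chunk_size=1024))
```

In the real system `dst` would be an append-only value log, and the key index would store only `(file, offset, length)` pointers to it, so the LSM-tree itself never touches large payloads.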
4. Replication and Strong Consistency
Claiming strong consistency without describing the mechanism is a red flag. Interviewers expect a concrete replication strategy.
Hints to consider:
- Use leader-based replication where writes go to a partition leader that replicates to followers before acknowledging
- Implement quorum replication with W + R > N so every read quorum overlaps the latest write quorum (e.g., N = 3, W = 2, R = 2); also require W > N/2 so two concurrent writes cannot both reach a quorum
- Use a coordination service (ZooKeeper, etcd) for leader election and epoch/fencing to prevent split-brain
- Handle leader failover: new leader must have all committed writes; use in-sync replica (ISR) tracking
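The quorum-overlap argument can be demonstrated with in-memory replicas. This is an illustrative sketch, not a real replication protocol: the class names are invented, the client's counter stands in for a proper version scheme, and a real system would issue writes and reads in parallel with timeouts.

```python
class Replica:
    """One replica holding a single versioned value."""

    def __init__(self):
        self.version, self.value = 0, None

    def write(self, version, value):
        if version > self.version:         # last-writer-wins by version
            self.version, self.value = version, value

    def read(self):
        return self.version, self.value

class QuorumClient:
    def __init__(self, replicas, w, r):
        assert w + r > len(replicas), "quorums must overlap"
        self.replicas, self.w, self.r = replicas, w, r
        self.clock = 0                     # stand-in for a real version scheme

    def put(self, value):
        self.clock += 1
        acks = 0
        for rep in self.replicas:
            rep.write(self.clock, value)
            acks += 1
            if acks >= self.w:             # ack after W; the rest catch up async
                break

    def get(self):
        # Deliberately read a *different* subset than put() wrote to:
        # W + R > N guarantees the subsets still intersect.
        responses = [rep.read() for rep in self.replicas[-self.r:]]
        return max(responses)[1]           # highest version wins

replicas = [Replica() for _ in range(3)]
client = QuorumClient(replicas, w=2, r=2)
client.put("hello")
```

Here `put` stops after the first two replicas acknowledge, while `get` reads the last two; with N = 3, W = 2, R = 2 the middle replica is in both quorums, so the read still observes the latest write.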