Design a Key-Value Store
Problem Statement
Design a distributed key-value store that supports persistent storage with append-only file operations, handles both read-heavy and write-heavy workloads with strong consistency, and can manage large values (up to 2 GB) that cannot fit entirely in memory. Think of systems like DynamoDB, Cassandra, or a custom-built storage engine.
The core engineering challenges are designing a log-structured storage engine with compaction, implementing replication and consensus for strong consistency, supporting streaming I/O for large values, and balancing read and write performance. You must reason about append-only writes, LSM-tree or B-tree tradeoffs, quorum-based replication, and how to handle values that exceed available memory.
Interviewers at Uber and LinkedIn ask this to test whether you can design a storage system from first principles, covering the write path (append-only log, WAL), read path (indexes, caches, Bloom filters), replication for durability and consistency, and special handling for large binary values.
Key Requirements
Functional
- Put/update -- users store or update values up to 2 GB per key, uploading and downloading via streaming without loading the entire value into memory
- Get -- users retrieve the latest value of a key with strong read-after-write consistency from any region
- Delete -- users delete keys with the deletion reflected immediately with strong consistency
- Conditional operations -- users perform compare-and-set using a version or ETag to coordinate concurrent writers
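The compare-and-set requirement above can be sketched with a toy in-memory store. This is a minimal illustration of versioned CAS semantics, not a real API: the class and method names (`KVStore`, `put_if_version`) are invented for this example, and a production store would persist the version alongside the value.

```python
import threading

class KVStore:
    """Toy in-memory store illustrating versioned compare-and-set (names are illustrative)."""

    def __init__(self):
        self._data = {}          # key -> (value, version)
        self._lock = threading.Lock()

    def put_if_version(self, key, value, expected_version):
        """Write only if the caller's version matches the stored one.

        expected_version=None means "create only if the key is absent".
        Returns (ok, current_version) so a losing writer can re-read and retry.
        """
        with self._lock:
            current = self._data.get(key)
            current_version = current[1] if current else None
            if current_version != expected_version:
                return False, current_version   # lost the race: caller must re-read
            new_version = (current_version or 0) + 1
            self._data[key] = (value, new_version)
            return True, new_version

store = KVStore()
ok, v1 = store.put_if_version("profile:42", b"alice", None)  # create succeeds
ok2, v2 = store.put_if_version("profile:42", b"bob", None)   # stale version: rejected
ok3, v3 = store.put_if_version("profile:42", b"bob", v1)     # correct version: accepted
```

The returned version plays the role of an ETag: two concurrent writers both read version 1, but only the first write with `expected_version=1` succeeds.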
Non-Functional
- Scalability -- support millions of keys across hundreds of nodes with linear horizontal scaling
- Reliability -- tolerate node and rack failures without data loss through replication
- Latency -- sub-10 ms reads for small values from cache; sub-100 ms for on-disk reads; streaming throughput for large values
- Consistency -- strong read-after-write consistency via quorum or leader-based reads
What Interviewers Focus On
Based on real interview experiences at Uber, LinkedIn, and Databricks, these are the areas interviewers probe most deeply:
1. Write Path and Storage Engine Design
The append-only constraint shapes the entire storage architecture. Interviewers want to see a log-structured approach with compaction, not in-place updates.
Hints to consider:
- Use a write-ahead log (WAL) for durability: every write is first appended to a sequential log before updating in-memory structures
- Implement an LSM-tree-inspired design: writes go to an in-memory memtable (sorted), which flushes to immutable on-disk SSTables when full
- Design a compaction strategy (size-tiered or leveled) to merge SSTables and remove tombstones for deleted keys
- Use sequential I/O throughout for high write throughput; avoid random writes
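The write path above can be sketched end to end: WAL append, memtable update, then a flush to an immutable sorted file once the memtable fills. This is a minimal sketch, assuming a JSON-lines file format purely for readability; real SSTables use binary block formats, and the `memtable_limit` of a few entries stands in for a size threshold in megabytes.

```python
import json
import os
import tempfile

class StorageEngine:
    """Toy LSM-style write path: WAL append -> memtable -> immutable SSTable flush."""

    def __init__(self, data_dir, memtable_limit=4):
        self.data_dir = data_dir
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.sstable_seq = 0
        # WAL: every write is appended here before touching in-memory state
        self.wal = open(os.path.join(data_dir, "wal.log"), "a")

    def put(self, key, value):
        record = json.dumps({"k": key, "v": value})
        self.wal.write(record + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())        # durable before we acknowledge the write
        self.memtable[key] = value         # a sorted structure in a real engine
        if len(self.memtable) >= self.memtable_limit:
            self._flush_memtable()

    def _flush_memtable(self):
        # Immutable, sorted, written with purely sequential I/O
        path = os.path.join(self.data_dir, f"sstable-{self.sstable_seq:06d}.jsonl")
        with open(path, "w") as f:
            for k in sorted(self.memtable):
                f.write(json.dumps({"k": k, "v": self.memtable[k]}) + "\n")
        self.sstable_seq += 1
        self.memtable.clear()              # the WAL can be truncated past this point

data_dir = tempfile.mkdtemp()
engine = StorageEngine(data_dir, memtable_limit=2)
engine.put("a", 1)
engine.put("b", 2)                         # hits the limit and triggers a flush
sstables = sorted(f for f in os.listdir(data_dir) if f.startswith("sstable-"))
```

Note that every disk operation here is an append or a sequential file write, which is exactly the property the append-only constraint demands.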
2. Read Path and Indexing
Reads must be fast despite data being spread across multiple SSTables. Interviewers evaluate your indexing and caching strategy.
Hints to consider:
- Check the in-memory memtable first, then SSTables from newest to oldest
- Use Bloom filters per SSTable to quickly determine if a key might exist, avoiding unnecessary disk reads
- Maintain a sparse index (one entry per data block) to minimize the number of disk seeks per read
- Cache hot keys and their metadata in memory; for small values, cache the value itself
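The read-path ordering and the Bloom-filter skip can be shown together. This is an illustrative sketch: the Bloom filter uses an int bitmap with assumed sizes, and plain dicts stand in for on-disk SSTables with sparse indexes.

```python
import hashlib

class BloomFilter:
    """Small Bloom filter over an int bitmap (sizes here are illustrative)."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means definitely absent; True means "maybe" (false positives allowed)
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

def get(key, memtable, sstables):
    """Read path: memtable first, then SSTables newest to oldest.

    `sstables` is a list of (bloom_filter, table_dict) pairs, newest first;
    the dicts stand in for on-disk files reached via a sparse index.
    """
    if key in memtable:
        return memtable[key]
    for bloom, table in sstables:
        if not bloom.might_contain(key):
            continue                       # skip the disk read entirely
        if key in table:
            return table[key]              # newest version wins
    return None

old_bloom = BloomFilter()
old_bloom.add("x")
old_bloom.add("y")
sstables = [(old_bloom, {"x": "old", "y": "v"})]
memtable = {"x": "new"}
```

Because the memtable and newer SSTables are consulted first, `get("x", ...)` returns `"new"` even though an older SSTable still holds a stale `"old"` value; compaction eventually reclaims that stale copy.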
3. Handling Large Values (up to 2 GB)
Values that exceed memory require special treatment throughout the system. This is where many candidates struggle.
Hints to consider:
- Separate large values from the LSM-tree: store them in a separate value log with only a pointer in the main index
- Implement chunked streaming I/O so neither client nor server needs to buffer the entire value in memory
- Use per-chunk checksums to detect corruption during transfer and storage
- Design compaction to handle value-log garbage collection independently from the key index
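The chunked-streaming and per-chunk-checksum hints can be sketched with generators, so only one chunk is ever resident in memory. The 4 MiB chunk size is an assumed tuning choice, and the function names are invented for this example.

```python
import hashlib
import io

CHUNK_SIZE = 4 * 1024 * 1024   # 4 MiB per chunk (an assumed tuning choice)

def stream_with_checksums(src, chunk_size=CHUNK_SIZE):
    """Yield (chunk, sha256_hex) pairs; only one chunk is in memory at a time."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return
        yield chunk, hashlib.sha256(chunk).hexdigest()

def receive_to(dst, chunks):
    """Verify each chunk's checksum before appending it to the destination."""
    total = 0
    for chunk, checksum in chunks:
        if hashlib.sha256(chunk).hexdigest() != checksum:
            raise IOError("chunk failed checksum: corrupted in transit or at rest")
        dst.write(chunk)
        total += len(chunk)
    return total

payload = b"x" * (10 * 1024)   # stand-in for a multi-GB value
sink = io.BytesIO()
n = receive_to(sink, stream_with_checksums(io.BytesIO(payload), chunk_size=1024))
```

In the real system `dst` would be an append-only value log, and the key index would store only `(file, offset, length)` pointers to it, so the LSM-tree itself never touches large payloads.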
4. Replication and Strong Consistency
Claiming strong consistency without describing the mechanism is a red flag. Interviewers expect a concrete replication strategy.
Hints to consider:
- Use leader-based replication where writes go to a partition leader that replicates to followers before acknowledging
- Implement quorum replication with W + R > N so every read quorum overlaps the latest write quorum (e.g., N = 3, W = 2, R = 2); also require W > N/2 so two concurrent writes cannot both reach a quorum
- Use a coordination service (ZooKeeper, etcd) for leader election and epoch/fencing to prevent split-brain
- Handle leader failover: new leader must have all committed writes; use in-sync replica (ISR) tracking
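The quorum-overlap argument can be demonstrated with in-memory replicas. This is an illustrative sketch, not a real replication protocol: the class names are invented, the client's counter stands in for a proper version scheme, and a real system would issue writes and reads in parallel with timeouts.

```python
class Replica:
    """One replica holding a single versioned value."""

    def __init__(self):
        self.version, self.value = 0, None

    def write(self, version, value):
        if version > self.version:         # last-writer-wins by version
            self.version, self.value = version, value

    def read(self):
        return self.version, self.value

class QuorumClient:
    def __init__(self, replicas, w, r):
        assert w + r > len(replicas), "quorums must overlap"
        self.replicas, self.w, self.r = replicas, w, r
        self.clock = 0                     # stand-in for a real version scheme

    def put(self, value):
        self.clock += 1
        acks = 0
        for rep in self.replicas:
            rep.write(self.clock, value)
            acks += 1
            if acks >= self.w:             # ack after W; the rest catch up async
                break

    def get(self):
        # Deliberately read a *different* subset than put() wrote to:
        # W + R > N guarantees the subsets still intersect.
        responses = [rep.read() for rep in self.replicas[-self.r:]]
        return max(responses)[1]           # highest version wins

replicas = [Replica() for _ in range(3)]
client = QuorumClient(replicas, w=2, r=2)
client.put("hello")
```

Here `put` stops after the first two replicas acknowledge, while `get` reads the last two; with N = 3, W = 2, R = 2 the middle replica is in both quorums, so the read still observes the latest write.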