Design a Key-Value Store
System Design · Must
Problem Statement
Design a distributed key-value store that supports persistent storage with append-only file operations, handles both read-heavy and write-heavy workloads with strong consistency, and can manage large values (up to 2GB) that cannot fit entirely in memory.
The system must handle append-only writes, background compaction, replicated commit paths, sparse indexes, and Bloom filters. You need to reason about I/O patterns, write-ahead logging, crash recovery, compaction, and the trade-offs between latency, throughput, memory footprint, and data integrity.
Key Requirements
Functional
- Streaming put/get -- users put or update values up to 2GB per key, uploading and downloading via streaming without loading the entire value into memory
- Strong read-after-write -- users read the latest value of a key with strong consistency from any region
- Delete support -- users delete keys with the deletion reflected immediately and consistently
- Conditional operations -- users perform compare-and-set using versions or ETags to coordinate concurrent writers
Non-Functional
- Scalability -- handle millions of keys with values ranging from bytes to gigabytes; horizontal scaling via sharding
- Reliability -- acknowledged writes survive crashes; the store recovers quickly to a consistent state
- Latency -- sub-10ms reads for small values; large value reads limited by I/O bandwidth rather than lookup overhead
- Consistency -- strong read-after-write consistency; linearizable operations for conditional writes
Interview Reports from Hello Interview
19 reports from candidates. Most recently asked at LinkedIn in early February 2026.
Also commonly asked at: Databricks, Uber, Airbnb.
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Append-Only Write Path and Log-Structured Storage
Interviewers want to see how you achieve fast writes using sequential I/O rather than random writes.
Hints to consider:
- Append all writes to a segment file sequentially, including the key, value, timestamp, and CRC checksum
- Maintain an in-memory index (keydir) that maps each key to (file_id, offset, size) for O(1) lookups
- Rotate segments when they reach a configurable size threshold (e.g., 64MB) and start a new active segment
- Use group commit (batch multiple writes before fsync) to amortize disk sync overhead across requests
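The hints above can be sketched as a minimal Bitcask-style engine. This is an illustrative sketch, not a production design: the class name `AppendOnlyStore`, the record layout (`CRC | timestamp | key_len | value_len | key | value`), and the per-write `fsync` (where a real system would batch via group commit) are all assumptions made for the example.

```python
import os
import struct
import time
import zlib

SEGMENT_MAX = 64 * 1024 * 1024  # rotation threshold from the hints (64MB)

class AppendOnlyStore:
    """Minimal Bitcask-style store: sequential appends + in-memory keydir."""

    def __init__(self, path):
        self.path = path
        self.file_id = 0
        self.active = open(self._seg_path(0), "ab")
        self.keydir = {}  # key -> (file_id, offset, size) for O(1) lookups

    def _seg_path(self, fid):
        return os.path.join(self.path, f"seg-{fid:06d}.log")

    def put(self, key: bytes, value: bytes):
        record = self._encode(key, value)
        if self.active.tell() + len(record) > SEGMENT_MAX:
            self._rotate()
        offset = self.active.tell()
        self.active.write(record)
        self.active.flush()
        os.fsync(self.active.fileno())  # real systems batch fsyncs (group commit)
        self.keydir[key] = (self.file_id, offset, len(record))

    def get(self, key: bytes):
        if key not in self.keydir:
            return None
        fid, offset, size = self.keydir[key]
        with open(self._seg_path(fid), "rb") as f:  # one random read per get
            f.seek(offset)
            return self._decode(f.read(size))

    def _encode(self, key, value):
        body = struct.pack(">QII", int(time.time()), len(key), len(value)) + key + value
        return struct.pack(">I", zlib.crc32(body)) + body  # CRC guards the record

    def _decode(self, record):
        crc, = struct.unpack_from(">I", record, 0)
        body = record[4:]
        assert zlib.crc32(body) == crc, "corrupt record"
        _, klen, vlen = struct.unpack_from(">QII", body, 0)
        return body[16 + klen : 16 + klen + vlen]

    def _rotate(self):
        self.active.close()
        self.file_id += 1
        self.active = open(self._seg_path(self.file_id), "ab")
```

On crash, the keydir is rebuilt by scanning segments in order; records with bad CRCs at the tail are discarded, which is what makes the append-only layout recoverable.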
2. Compaction and Garbage Collection
Without compaction, disk usage grows without bound because old values are never reclaimed. Interviewers expect a concrete GC strategy.
Hints to consider:
- Run background compaction that reads old segments, keeps only the latest version of each key, and writes compacted segments
- Use tombstone records for deletes; remove tombstones during compaction after they've been propagated to all replicas
- Swap compacted segments atomically (rename) to ensure readers always see a consistent view
- Implement backpressure on writes if compaction falls behind to prevent disk exhaustion
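A compaction pass can be sketched as a merge that keeps only the newest record per key, then swaps the result in atomically. For brevity this sketch represents each closed segment as an in-memory dict rather than streaming records from files, and it assumes all replicas have already seen every delete, so tombstones can be dropped; both are simplifications.

```python
import os

TOMBSTONE = None  # sentinel meaning "key was deleted"

def compact(segments, out_path):
    """Merge closed segments (ordered oldest first), keeping the latest
    record per key, and write the result with an atomic rename."""
    latest = {}
    for seg in segments:  # later segments overwrite earlier ones
        latest.update(seg)

    # Drop tombstones only once the delete has propagated to all replicas;
    # this sketch assumes it has.
    survivors = {k: v for k, v in latest.items() if v is not TOMBSTONE}

    tmp = out_path + ".tmp"
    with open(tmp, "w") as f:
        for k, v in sorted(survivors.items()):
            f.write(f"{k}\t{v}\n")
        f.flush()
        os.fsync(f.fileno())
    os.rename(tmp, out_path)  # atomic swap: readers see old or new, never half
    return survivors
```

After the rename, the keydir entries for compacted keys are repointed to the new segment and the old segments are deleted, reclaiming the space.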
3. Large Value Handling and Streaming I/O
Values up to 2GB cannot fit in memory. Interviewers want to see chunking and streaming strategies.
Hints to consider:
- Split large values into fixed-size chunks (e.g., 4MB) stored sequentially in the segment file
- Stream chunks directly from disk to the network socket without buffering the entire value in memory
- Store per-chunk CRC checksums and verify integrity during reads
- Separate metadata (key, total size, chunk offsets) from value data to enable fast lookups without reading the full value
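The chunking scheme above might look like the following sketch, which frames each chunk as `[length][crc][bytes]` and keeps the chunk offsets in metadata so reads can seek directly. The framing format and function names are illustrative assumptions; the key property is that only one chunk is ever held in memory.

```python
import io
import struct
import zlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4MB chunks per the hints

def write_chunked(src, dst, chunk_size=CHUNK_SIZE):
    """Stream `src` into `dst` as [len][crc][bytes] frames; return chunk offsets."""
    offsets = []
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        offsets.append(dst.tell())
        dst.write(struct.pack(">II", len(chunk), zlib.crc32(chunk)))
        dst.write(chunk)
    return offsets  # stored in the metadata record, separate from the value bytes

def read_chunked(src, offsets, sink):
    """Stream chunks to `sink` (e.g. a socket), verifying each CRC on the way."""
    for off in offsets:
        src.seek(off)
        length, crc = struct.unpack(">II", src.read(8))
        chunk = src.read(length)
        if zlib.crc32(chunk) != crc:
            raise IOError(f"corrupt chunk at offset {off}")
        sink.write(chunk)  # only one chunk resident in memory at a time
```

Because the metadata record holds the total size and every chunk offset, a GET can return Content-Length immediately and support range reads without touching the value bytes it doesn't need.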
4. Replication and Consistency
Strong consistency requires a clear replication protocol with leader election and quorum operations.
Hints to consider:
- Use a single-leader replication model where all writes go through the leader, which replicates to followers via the append-only log
- Require write acknowledgment from a quorum (majority) of replicas before confirming to the client
- Implement leader election using a coordination service (ZooKeeper/etcd) with fencing tokens to prevent split-brain
- For reads, serve from the leader or use lease-based follower reads where the follower's lease guarantees it has the latest data
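The quorum commit path can be sketched as below. This is a toy model, not a consensus protocol: the `Leader`/`Follower` classes, synchronous fan-out, and rollback-on-failure are assumptions for illustration, and a real system would replicate asynchronously, use fencing tokens from the election service, and reconcile divergent follower logs rather than simply popping the leader's log.

```python
class Follower:
    """Replica that appends leader records to its own log."""

    def __init__(self, up=True):
        self.log = []
        self.up = up

    def append(self, record):
        if not self.up:
            raise ConnectionError("follower unreachable")
        self.log.append(record)
        return True  # ack

class Leader:
    """Toy quorum commit: the leader's own append counts as one ack."""

    def __init__(self, followers):
        self.log = []
        self.followers = followers
        self.quorum = (len(followers) + 1) // 2 + 1  # majority incl. leader

    def write(self, record):
        self.log.append(record)  # append to the leader's local log first
        acks = 1                 # the leader acks its own write
        for f in self.followers:
            try:
                if f.append(record):
                    acks += 1
            except ConnectionError:
                pass             # one dead follower must not block the quorum
        if acks >= self.quorum:
            return True          # safe to confirm to the client
        self.log.pop()           # simplification: real systems keep the entry
        return False             # and retry replication instead of rolling back
```

With 3 replicas the quorum is 2, so one follower can be down and writes still succeed; reads from the leader (or a lease-holding follower) then observe every acknowledged write.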