Design a Database like Bitcask
System Design (Optional)
Problem Statement
Design a log-structured key-value storage engine similar to Bitcask that supports fast writes, efficient reads, and crash recovery with minimal disk seeks. The system uses append-only writes, an in-memory index for O(1) lookups, and background compaction to reclaim space from obsolete entries.
The challenge is reasoning about storage engine fundamentals: I/O patterns (sequential vs random), write-ahead logging, crash recovery, on-disk formats, compaction/GC, and the trade-offs between latency, throughput, memory footprint, and data integrity.
Key Requirements
Functional
- Fast writes -- users write or update a key with a value and receive an acknowledgment with low latency
- Predictable reads -- users read the latest value for a key with consistent performance even as data grows
- Delete support -- users delete a key and subsequent reads reflect the deletion
- Crash recovery -- acknowledged writes survive crashes and the store recovers quickly to a consistent state
Non-Functional
- Scalability -- handle millions of keys with values of varying sizes; throughput scales with disk I/O bandwidth
- Reliability -- no data loss for acknowledged writes; recovery completes in seconds even with large datasets
- Latency -- sub-millisecond writes (sequential append); sub-millisecond reads for small values (single seek via index)
- Efficiency -- minimize disk space waste through background compaction; bound in-memory index size
Interview Reports from Hello Interview
1 report from a candidate. Most recently asked at LinkedIn in Early December 2025.
This question is primarily asked at LinkedIn.
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Append-Only Write Path
The core of Bitcask is sequential, append-only writes that maximize disk throughput. Interviewers expect a clear on-disk format.
Hints to consider:
- Each record in the data file contains: CRC checksum, timestamp, key size, value size, key bytes, value bytes
- Append all writes (including updates and deletes) to the active data file sequentially
- Use group commit (batch multiple writes before calling fsync) to amortize disk sync cost
- Rotate to a new active file when the current one exceeds a size threshold (e.g., 64MB)
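The record layout above can be sketched as a small encode/decode pair. This is a minimal illustration, assuming big-endian fixed-width headers and a millisecond timestamp; the exact field widths are a design choice, not part of the problem statement. The CRC covers everything after itself, so a torn append at the tail of a file is detectable.

```python
import struct
import time
import zlib

# Illustrative on-disk layout (field widths are assumptions):
#   crc(4) | timestamp(8) | key_size(4) | value_size(4) | key bytes | value bytes
BODY_HDR = struct.Struct(">QII")  # timestamp, key_size, value_size
CRC = struct.Struct(">I")

def encode_record(key, value, ts_ms=None):
    """Serialize one append-only record; the CRC covers everything after it."""
    ts = ts_ms if ts_ms is not None else int(time.time() * 1000)
    body = BODY_HDR.pack(ts, len(key), len(value)) + key + value
    return CRC.pack(zlib.crc32(body)) + body

def decode_record(buf, offset=0):
    """Return (key, value, timestamp, next_offset); raise on CRC mismatch."""
    (crc,) = CRC.unpack_from(buf, offset)
    body_start = offset + CRC.size
    ts, ksz, vsz = BODY_HDR.unpack_from(buf, body_start)
    end = body_start + BODY_HDR.size + ksz + vsz
    if zlib.crc32(buf[body_start:end]) != crc:
        raise ValueError("corrupt record")
    key = buf[body_start + BODY_HDR.size : body_start + BODY_HDR.size + ksz]
    value = buf[body_start + BODY_HDR.size + ksz : end]
    return key, value, ts, end
```

Because `decode_record` returns the offset of the next record, a writer can append `encode_record(...)` output sequentially and a reader can walk a file front to back, which is exactly what recovery and compaction need.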
2. In-Memory Index (Keydir)
Fast reads depend on an in-memory hash map that maps every key to its on-disk location. Interviewers want to understand the memory implications.
Hints to consider:
- The keydir maps each key to (file_id, value_offset, value_size, timestamp)
- Reads are O(1): look up the key in the keydir, seek to the offset in the correct file, read the value
- Memory usage is proportional to the number of unique keys (not total data size)
- For datasets where keys don't fit in memory, consider a hybrid approach with an on-disk index or range partitioning
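A minimal keydir sketch under the assumptions above: a plain hash map from key to `(file_id, value_offset, value_size, timestamp)`, where a later timestamp wins. The class and field names are illustrative, not a fixed API.

```python
from collections import namedtuple

# Keydir entry: which file holds the value, where, how big, and when written
Entry = namedtuple("Entry", "file_id value_offset value_size timestamp")

class Keydir:
    """In-memory hash index; memory grows with unique keys, not total data size."""

    def __init__(self):
        self._map = {}

    def put(self, key, entry):
        # Keep only the newest location; the older on-disk copy becomes dead data
        existing = self._map.get(key)
        if existing is None or entry.timestamp >= existing.timestamp:
            self._map[key] = entry

    def get(self, key):
        return self._map.get(key)  # None means "not found"

    def __len__(self):
        return len(self._map)
```

A read is then one dictionary lookup plus one `seek`/`read` of exactly `value_size` bytes in the indicated file, which is why read latency stays flat as the dataset grows.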
3. Compaction and Merge Process
Without compaction, disk usage grows without bound because old values are never reclaimed. Interviewers want a concrete algorithm.
Hints to consider:
- Select closed, immutable data files (never the active one) for compaction based on their ratio of dead to live entries
- Read selected files sequentially, check each entry against the keydir (is this the latest version?), and write live entries to a new compacted file
- Generate a hint file alongside each compacted file containing only key-to-offset mappings for fast recovery
- Swap compacted files atomically (rename) and delete old files; update the keydir to point to new locations
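The merge step above can be sketched as a pure function over in-memory stand-ins. Here `files` maps a file id to its `(key, value, offset)` records and `keydir` maps each key to its latest `(file_id, offset)`; an entry is live only if the keydir still points at it. The data shapes are simplifications for illustration, and the atomic rename/swap step is elided.

```python
def compact(files, keydir):
    """Copy only entries the keydir still points at; emit hint-file mappings.

    files:  dict of file_id -> list of (key, value, offset) records
    keydir: dict of key -> (file_id, offset) for the latest version
    Returns (live_records, hints) where hints pairs each key with its
    offset in the new compacted file -- the hint file contains no values.
    """
    live, hints = [], []
    new_offset = 0
    for file_id, records in files.items():
        for key, value, offset in records:
            if keydir.get(key) == (file_id, offset):  # latest version of this key?
                live.append((key, value))
                hints.append((key, new_offset))       # key -> offset only
                new_offset += len(value)
            # else: superseded or deleted entry -- dropped, space reclaimed
    return live, hints
```

In a real engine the live entries would be appended to a new data file, the hint list written beside it, and the old files atomically renamed away while the keydir is repointed at the new offsets.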
4. Crash Recovery
The system must restart quickly and recover to a consistent state without losing acknowledged writes.
Hints to consider:
- On startup, replay hint files (if available) to rebuild the keydir without reading full data files
- If hint files are missing, scan data files from oldest to newest, building the keydir by keeping the latest entry per key
- Validate CRC checksums during recovery to detect and skip corrupted entries
- Keep recovery time bounded by limiting data file sizes and maintaining up-to-date hint files after compaction
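The fallback scan (no hint files) can be sketched as a fold over files in age order. Each record here is a pre-parsed tuple standing in for a decoded on-disk entry, with a `crc_ok` flag standing in for checksum validation; because files are visited oldest to newest, a plain overwrite leaves the latest entry per key.

```python
def rebuild_keydir(data_files):
    """Recovery sketch: scan data files oldest -> newest, keeping the
    latest entry per key and skipping records that failed their CRC.

    data_files: ordered list of (file_id, records), oldest first, where
    each record is (key, offset, size, timestamp, crc_ok).
    Returns key -> (file_id, offset, size, timestamp).
    """
    keydir = {}
    for file_id, records in data_files:
        for key, offset, size, ts, crc_ok in records:
            if not crc_ok:
                continue  # corrupted entry (e.g. torn write at a file tail)
            keydir[key] = (file_id, offset, size, ts)  # newer files override
    return keydir
```

With hint files available, the same loop runs over the much smaller key-to-offset mappings instead of full data files, which is what keeps restart time in seconds rather than minutes.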
5. Tombstone Handling for Deletes
Deletes must be handled correctly in a log-structured system where old entries are not overwritten.
Hints to consider:
- Write a special tombstone record (key with empty value or a delete flag) on delete
- The keydir removes the key entry so reads return "not found" immediately
- Compaction filters out tombstones (and the values they obsolete) from older files
- Retain tombstones for a configurable period to ensure they propagate through compaction of all files
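The delete path can be modeled with a tiny in-memory store: the log stands in for the append-only data file, and a sentinel value stands in for the tombstone record. The sentinel and class names are illustrative; a real engine would use a flag bit or reserved value in the record header rather than a magic byte string.

```python
TOMBSTONE = b"__tombstone__"  # illustrative sentinel for a delete record

class Store:
    """Tiny model of the delete path: append-only log plus keydir."""

    def __init__(self):
        self.log = []     # stand-in for the on-disk data file
        self.keydir = {}  # key -> index of latest live entry in the log

    def put(self, key, value):
        self.log.append((key, value))
        self.keydir[key] = len(self.log) - 1

    def delete(self, key):
        self.log.append((key, TOMBSTONE))  # delete is still an append
        self.keydir.pop(key, None)         # reads miss immediately

    def get(self, key):
        idx = self.keydir.get(key)
        return None if idx is None else self.log[idx][1]
```

Note that `delete` never touches old log entries: the tombstone is only there so that compaction later knows to drop the key's earlier values, and so that recovery replaying the log does not resurrect a deleted key.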