Practice/Databricks/Durable Log Writer with High Concurrency
Coding · Must
You are building a high-performance logging library for a distributed data processing system. Thousands of threads write log entries concurrently, and every write must be durably persisted to disk before the method returns. This is critical for data integrity in crash scenarios.
The challenge: fsync() is expensive (1–10 ms per call), but you need both high throughput (tens of thousands of writes/sec) and strong durability guarantees. How do you achieve both?
Without optimization:
```
Thread 1: write → fsync (5ms)
Thread 2: write → fsync (5ms)
Thread 3: write → fsync (5ms)

Result: 3 writes in 15ms = 200 writes/sec
```
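The unoptimized path above can be sketched as follows. This is a minimal illustration, not the library's actual API; the class and method names (`NaiveDurableLog`, `append`) are made up for this example. Every call pays its own fsync, so all writers serialize behind a 1–10 ms disk flush:

```java
import java.io.FileOutputStream;
import java.io.IOException;

// Naive durable log: one fsync per write (names are illustrative).
public class NaiveDurableLog implements AutoCloseable {
    private final FileOutputStream out;

    public NaiveDurableLog(String path) throws IOException {
        out = new FileOutputStream(path, true); // append mode
    }

    // Durable, but throughput is capped at roughly 1 / fsync-latency,
    // because every writer holds the lock across its own fsync.
    public synchronized void append(byte[] data) throws IOException {
        out.write(data);
        out.flush();
        out.getFD().sync(); // fsync: blocks until the bytes are on disk
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

With a 5 ms fsync, this design tops out near 200 writes/sec regardless of how many threads call it, which is the number the example above arrives at.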
With group commit optimization:
```
Thread 1: write ─┐
Thread 2: write ─┼─→ batch → single fsync (5ms)
Thread 3: write ─┘

Result: 3 writes in 5ms = 600 writes/sec
With 100 threads: 20,000 writes/sec (100x improvement!)
```
Requirement: push() must block until data is persisted (fsync completed).

```
┌──────────┐
│ Thread 1 │──┐
├──────────┤  │
│ Thread 2 │──┼──→ [Queue] ──→ [Writer Thread] ──→ Batch ──→ fsync() ──→ Disk
├──────────┤  │                                                 ↓
│ Thread 3 │──┘                                          All threads wait
└──────────┘                                             (CountDownLatch)
```
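The architecture above can be sketched in Java as a single writer thread that drains a queue, writes the whole batch, issues one fsync, then releases every blocked caller via a per-entry CountDownLatch. This is a sketch under the stated assumptions, not a production implementation; the names (`GroupCommitLogger`, `push`, `Entry`) are illustrative, and real code would also propagate I/O failures back to waiters:

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Group-commit log writer: many producers, one writer thread, one fsync per batch.
public class GroupCommitLogger implements AutoCloseable {

    // One pending write: the payload plus the latch its caller blocks on.
    private static final class Entry {
        final byte[] data;
        final CountDownLatch done = new CountDownLatch(1);
        Entry(byte[] data) { this.data = data; }
    }

    private final LinkedBlockingQueue<Entry> queue = new LinkedBlockingQueue<>();
    private final FileOutputStream out;
    private final FileDescriptor fd;
    private final Thread writer;
    private volatile boolean closed = false;

    public GroupCommitLogger(String path) throws IOException {
        out = new FileOutputStream(path, true); // append mode
        fd = out.getFD();
        writer = new Thread(this::writeLoop, "log-writer");
        writer.start();
    }

    // Blocks until this entry is durably on disk (its batch's fsync completed).
    public void push(byte[] data) throws InterruptedException {
        Entry e = new Entry(data);
        queue.put(e);
        e.done.await();
    }

    private void writeLoop() {
        List<Entry> batch = new ArrayList<>();
        while (!closed || !queue.isEmpty()) {
            batch.clear();
            try {
                Entry first = queue.poll(100, TimeUnit.MILLISECONDS);
                if (first == null) continue;
                batch.add(first);
                queue.drainTo(batch);            // group commit: grab everything queued
                for (Entry e : batch) out.write(e.data);
                out.flush();
                fd.sync();                       // ONE fsync covers the whole batch
            } catch (IOException | InterruptedException ex) {
                // sketch only: a real logger would record the failure per entry
            } finally {
                for (Entry e : batch) e.done.countDown(); // release all blocked callers
            }
        }
    }

    @Override
    public void close() throws Exception {
        closed = true;   // writer drains remaining entries, then exits
        writer.join();
        out.close();
    }
}
```

The key design point is that callers never fsync themselves: a thread arriving while an fsync is in flight simply queues its entry, and the writer picks it up in the next batch. Batch size therefore grows automatically with contention, which is why throughput scales with thread count up to the disk's limit.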