Practice/Databricks/Durable Log Writer with High Concurrency
Coding · Must
You are building a high-performance logging library for a distributed data processing system. Thousands of threads write log entries concurrently, and every write must be durably persisted to disk before the method returns. This is critical for data integrity in crash scenarios.
The challenge: fsync() is expensive (1–10 ms per call), but you need both high throughput (tens of thousands of writes/sec) and strong durability guarantees. How do you achieve both?
Without optimization:
```
Thread 1: write → fsync (5ms)
Thread 2: write → fsync (5ms)
Thread 3: write → fsync (5ms)

Result: 3 writes in 15ms = 200 writes/sec
```
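The unoptimized path above can be sketched as follows. This is a minimal illustration, not the library's actual API; the class and method names (`NaiveDurableLog`, `append`) are made up for this example. Every call pays its own fsync, so all writers serialize behind a 1–10 ms disk flush:

```java
import java.io.FileOutputStream;
import java.io.IOException;

// Naive durable log: one fsync per write (names are illustrative).
public class NaiveDurableLog implements AutoCloseable {
    private final FileOutputStream out;

    public NaiveDurableLog(String path) throws IOException {
        out = new FileOutputStream(path, true); // append mode
    }

    // Durable, but throughput is capped at roughly 1 / fsync-latency,
    // because every writer holds the lock across its own fsync.
    public synchronized void append(byte[] data) throws IOException {
        out.write(data);
        out.flush();
        out.getFD().sync(); // fsync: blocks until the bytes are on disk
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

With a 5 ms fsync, this design tops out near 200 writes/sec regardless of how many threads call it, which is the number the example above arrives at.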
With group commit optimization:
```
Thread 1: write ─┐
Thread 2: write ─┼─→ batch → single fsync (5ms)
Thread 3: write ─┘

Result: 3 writes in 5ms = 600 writes/sec
With 100 threads: 20,000 writes/sec (100x improvement!)
```
Requirement: push() must block until data is persisted (fsync completed).

```
┌──────────┐
│ Thread 1 │──┐
├──────────┤  │
│ Thread 2 │──┼──→ [Queue] ──→ [Writer Thread] ──→ Batch ──→ fsync() ──→ Disk
├──────────┤  │                                                 ↓
│ Thread 3 │──┘                                          All threads wait
└──────────┘                                             (CountDownLatch)
```
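The architecture above can be sketched in Java as a single writer thread that drains a queue, writes the whole batch, issues one fsync, then releases every blocked caller via a per-entry CountDownLatch. This is a sketch under the stated assumptions, not a production implementation; the names (`GroupCommitLogger`, `push`, `Entry`) are illustrative, and real code would also propagate I/O failures back to waiters:

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Group-commit log writer: many producers, one writer thread, one fsync per batch.
public class GroupCommitLogger implements AutoCloseable {

    // One pending write: the payload plus the latch its caller blocks on.
    private static final class Entry {
        final byte[] data;
        final CountDownLatch done = new CountDownLatch(1);
        Entry(byte[] data) { this.data = data; }
    }

    private final LinkedBlockingQueue<Entry> queue = new LinkedBlockingQueue<>();
    private final FileOutputStream out;
    private final FileDescriptor fd;
    private final Thread writer;
    private volatile boolean closed = false;

    public GroupCommitLogger(String path) throws IOException {
        out = new FileOutputStream(path, true); // append mode
        fd = out.getFD();
        writer = new Thread(this::writeLoop, "log-writer");
        writer.start();
    }

    // Blocks until this entry is durably on disk (its batch's fsync completed).
    public void push(byte[] data) throws InterruptedException {
        Entry e = new Entry(data);
        queue.put(e);
        e.done.await();
    }

    private void writeLoop() {
        List<Entry> batch = new ArrayList<>();
        while (!closed || !queue.isEmpty()) {
            batch.clear();
            try {
                Entry first = queue.poll(100, TimeUnit.MILLISECONDS);
                if (first == null) continue;
                batch.add(first);
                queue.drainTo(batch);            // group commit: grab everything queued
                for (Entry e : batch) out.write(e.data);
                out.flush();
                fd.sync();                       // ONE fsync covers the whole batch
            } catch (IOException | InterruptedException ex) {
                // sketch only: a real logger would record the failure per entry
            } finally {
                for (Entry e : batch) e.done.countDown(); // release all blocked callers
            }
        }
    }

    @Override
    public void close() throws Exception {
        closed = true;   // writer drains remaining entries, then exits
        writer.join();
        out.close();
    }
}
```

The key design point is that callers never fsync themselves: a thread arriving while an fsync is in flight simply queues its entry, and the writer picks it up in the next batch. Batch size therefore grows automatically with contention, which is why throughput scales with thread count up to the disk's limit.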