Design a Logger System
Problem Statement
Design a centralized logging system that collects logs from thousands of application instances, deduplicates repeated messages within a time window, stores them durably, and provides fast search. A twist: a message should only be printed/stored if the same message has not been seen in the last 10 seconds.
This tests your ability to design a high-throughput write pipeline with real-time deduplication, multi-tenant isolation (different apps share the system), and a search layer that handles queries over massive volumes of time-series data. You need to balance ingestion speed, storage cost, and query latency.
Key Requirements
Functional
- Log ingestion with dedup -- accept log messages with timestamps; suppress duplicates that appeared within the last 10 seconds
- Multi-tenant scoping -- logs are isolated by application; each app has its own quotas, access controls, and search namespace
- Search and filter -- users search logs by time range, application, severity level, and free-text keywords with fast results
- Live tail -- users can stream recent logs in near real time for a given application or filter
Non-Functional
- Scalability -- ingest 100K+ log messages per second across all tenants
- Latency -- dedup decision in under 10ms; search results in under 500ms for recent data
- Durability -- no log loss once accepted; at-least-once delivery from producers
- Cost efficiency -- older logs should be stored cheaply; only recent logs need fast access
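A quick back-of-envelope calculation helps ground these targets. The message size and retention period below are illustrative assumptions, not figures from the problem statement:

```python
# Rough capacity estimate for the targets above.
# Assumptions (not given in the problem): 500 bytes per message
# including metadata, 30-day hot retention, sustained peak rate.
MSGS_PER_SEC = 100_000
AVG_MSG_BYTES = 500
SECONDS_PER_DAY = 86_400

bytes_per_day = MSGS_PER_SEC * AVG_MSG_BYTES * SECONDS_PER_DAY
tb_per_day = bytes_per_day / 10**12

print(f"{tb_per_day:.1f} TB/day raw")                      # ~4.3 TB/day
print(f"{tb_per_day * 30:.0f} TB at 30-day hot retention")  # ~130 TB
```

Numbers like these motivate the hot/warm/cold tiering discussed later: keeping 130 TB on fast SSD-backed search nodes is expensive, while most of it is rarely queried.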
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Time-Window Deduplication at Scale
The 10-second suppression window means you need to remember recently seen messages and check them on every incoming log — across multiple instances processing in parallel.
Hints to consider:
- Define the dedup key clearly: app_id + normalized message text (strip variable parts like timestamps or request IDs)
- Use Redis SET with the NX and EX flags (SET key 1 NX EX 10) for an atomic check-and-set: if the key already exists, suppress; if not, set it with the 10-second TTL and allow. A bare SETNX followed by a separate EXPIRE is not atomic -- a crash between the two calls leaves a key that never expires
- This must be a shared store (not in-process memory) so all ingestion instances see the same state
- Discuss what happens at scale: hot messages create hot keys; consider sharding by hash of the dedup key
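The hints above can be sketched in-process. This is a minimal illustration of the check-and-set semantics (mirroring Redis SET NX EX 10) plus key normalization; the regex patterns and function names are assumptions for the example, and production would use a shared Redis instance so all ingesters see the same state:

```python
import re
import time
from typing import Dict, Optional

# In-process sketch only -- a real deployment replaces _seen with a shared
# Redis store (SET key 1 NX EX 10) so every ingestion instance agrees.
WINDOW_SECONDS = 10
_seen: Dict[str, float] = {}  # dedup key -> expiry timestamp

def normalize(message: str) -> str:
    """Strip variable parts (hex ids, numbers) so repeats produce the same key."""
    message = re.sub(r"\b[0-9a-f]{8,}\b", "<id>", message)  # request ids, hashes
    return re.sub(r"\d+", "<n>", message)                    # counts, timestamps

def should_emit(app_id: str, message: str, now: Optional[float] = None) -> bool:
    """Return True only if this message has not been seen in the last 10s."""
    now = time.time() if now is None else now
    key = f"{app_id}:{normalize(message)}"
    expiry = _seen.get(key)
    if expiry is not None and expiry > now:
        return False                        # duplicate inside the window
    _seen[key] = now + WINDOW_SECONDS       # equivalent of SET ... NX EX 10
    return True

assert should_emit("app1", "timeout after 30 ms", now=0.0)
assert not should_emit("app1", "timeout after 31 ms", now=5.0)  # same normalized key
assert should_emit("app1", "timeout after 30 ms", now=11.0)     # window expired
```

Note how "timeout after 30 ms" and "timeout after 31 ms" normalize to the same key, which is exactly the behavior you want for error storms that differ only in a variable field.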
2. High-Throughput Ingestion Pipeline
Logs arrive in massive bursts (deployment, error storm). The ingestion path must absorb spikes without dropping data or blocking producers.
Hints to consider:
- Decouple producers from consumers using a durable buffer (Kafka) between the API and the processing pipeline
- Producers batch messages client-side before sending to reduce network overhead
- Use per-partition ordering in Kafka (keyed by app_id) to maintain per-tenant ordering where needed
- Implement backpressure: if the pipeline falls behind, apply rate limiting at the API gateway rather than dropping messages silently
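Client-side batching from the hints above can be sketched as a small buffer that flushes on either a size or an age threshold. The class name, thresholds, and callback are illustrative assumptions, not a specific Kafka client API:

```python
import time
from typing import Callable, List, Optional

class LogBatcher:
    """Buffer log messages client-side; flush when the batch is full or stale."""

    def __init__(self, flush: Callable[[List[str]], None],
                 max_batch: int = 100, max_age_s: float = 0.5):
        self.flush_fn = flush
        self.max_batch = max_batch   # flush at this many buffered messages...
        self.max_age_s = max_age_s   # ...or when the oldest message is this old
        self.buf: List[str] = []
        self.oldest: float = 0.0

    def add(self, message: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        if not self.buf:
            self.oldest = now
        self.buf.append(message)
        if len(self.buf) >= self.max_batch or now - self.oldest >= self.max_age_s:
            self.flush()

    def flush(self) -> None:
        if self.buf:
            self.flush_fn(self.buf)   # e.g. one Kafka produce call, keyed by app_id
            self.buf = []

batches: List[List[str]] = []
b = LogBatcher(batches.append, max_batch=3, max_age_s=1.0)
b.add("a", now=0.0); b.add("b", now=0.1); b.add("c", now=0.2)  # size-triggered flush
b.add("d", now=0.3); b.add("e", now=2.0)                        # age-triggered flush
assert batches == [["a", "b", "c"], ["d", "e"]]
```

The age threshold bounds the latency a batched message can accumulate, so batching improves throughput without compromising the live-tail experience.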
3. Search Architecture for Log Data
Users need to find specific log entries across billions of records. Full table scans are not viable.
Hints to consider:
- Use a search engine (Elasticsearch) with time-based indices: one index per day or per week, so old indices can be closed/archived
- Shard by app_id or tenant so queries are scoped to a subset of shards
- For recent logs (last hour), keep a hot cache in Redis or use a dedicated Elasticsearch index with more replicas
- Discuss index lifecycle management: hot → warm → cold → frozen → delete, based on age
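Time-based indices pay off at query time because a search only fans out to the indices its time range covers. A minimal sketch, assuming a `logs-<tenant>-YYYY.MM.DD` daily naming convention (the pattern is an assumption for illustration, not a fixed Elasticsearch requirement):

```python
from datetime import date, timedelta
from typing import List

def indices_for_range(tenant: str, start: date, end: date) -> List[str]:
    """Return the daily index names a query over [start, end] must touch."""
    names = []
    d = start
    while d <= end:
        names.append(f"logs-{tenant}-{d:%Y.%m.%d}")  # assumed naming convention
        d += timedelta(days=1)
    return names

# A 3-day query fans out to exactly 3 indices instead of the whole cluster.
print(indices_for_range("app1", date(2024, 1, 30), date(2024, 2, 1)))
# ['logs-app1-2024.01.30', 'logs-app1-2024.01.31', 'logs-app1-2024.02.01']
```

Embedding the tenant in the index name also gives multi-tenant isolation for free: access control and quota enforcement can be applied per index pattern, and closing or deleting a day's index is a cheap metadata operation rather than a bulk delete.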