Design a Top-K System
Problem Statement
Design a distributed system that computes and serves real-time top-K rankings of items (songs, videos, hashtags, products) based on engagement metrics within configurable time windows. For example, the system might answer queries like "top 50 trending hashtags in the last hour" or "top 10 most viewed videos this week," serving results with sub-100ms latency to millions of concurrent users.
The core difficulty lies in ingesting millions of scoring events per second, many of which arrive out of order or in bursts when content goes viral, and continuously maintaining accurate ranked lists across multiple time windows and audience segments. You must separate the high-throughput write path from the low-latency read path, handle extreme skew on popular items, and make principled tradeoffs between freshness, cost, and correctness.
Key Requirements
Functional
- Time-windowed rankings -- support multiple fixed windows (last hour, last 24 hours, last 7 days, last 30 days, all-time) with each window updating independently
- Segmented leaderboards -- partition rankings by dimensions such as geography, category, or user cohort so that different audiences see relevant results
- Configurable result size -- allow clients to request top-K where K ranges from 10 to 1000, with stable and deterministic ordering when scores tie
- Paginated access -- support paging through large result sets, with a consistent snapshot across successive paginated requests
Non-Functional
- Scalability -- handle 10 million or more events per second during peak hours, with independent scaling of write throughput and read capacity
- Reliability -- tolerate datacenter failures and node crashes without losing more than a few minutes of leaderboard history
- Latency -- p99 read latency under 100ms for top-K queries, with a write-to-visibility delay under 10 seconds for non-viral content
- Consistency -- eventual consistency is acceptable; rankings should converge within seconds, and duplicate or late events should be handled gracefully
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Write Path and Stream Aggregation
Interviewers want to see how you process millions of incoming events without creating database hotspots or losing data. They test whether you understand stream processing, event deduplication, and the distinction between raw events and aggregated counts.
Hints to consider:
- Use an event streaming platform (Kafka or Kinesis) as a durable buffer between producers and consumers so that ingestion is decoupled from processing
- Partition events by item ID or a composite key (window plus segment plus item ID) to enable parallel aggregation while preserving per-item ordering
- Apply tumbling or sliding windows in a stream processor (Flink or Spark Streaming) to maintain per-window counters, using watermarks to handle late-arriving events
- Batch writes to the storage layer (flush every few seconds or every N events) to reduce write amplification and smooth throughput
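The windowing and batching hints above can be sketched together. This is a minimal in-process illustration, not a Flink/Spark job: `WindowedAggregator`, `FLUSH_THRESHOLD`, and the `sink` callback are all hypothetical names standing in for the stream processor's state store and the batched write to the storage layer.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600      # tumbling 1-hour windows (illustrative choice)
FLUSH_THRESHOLD = 1000     # events buffered before a batched flush (illustrative)

class WindowedAggregator:
    """Maintains per-(window, item) counters and flushes them in batches,
    so the storage layer sees one aggregated write instead of one per event."""

    def __init__(self, sink, flush_threshold=FLUSH_THRESHOLD):
        self.sink = sink                 # stand-in for the batched storage write
        self.flush_threshold = flush_threshold
        self.counts = defaultdict(int)   # (window_start, item_id) -> count
        self.pending = 0

    def window_start(self, event_ts):
        # Align the event timestamp to its tumbling-window boundary.
        return int(event_ts) - int(event_ts) % WINDOW_SECONDS

    def ingest(self, item_id, event_ts):
        self.counts[(self.window_start(event_ts), item_id)] += 1
        self.pending += 1
        if self.pending >= self.flush_threshold:
            self.flush()

    def flush(self):
        # One batched write smooths throughput and cuts write amplification.
        self.sink(dict(self.counts))
        self.counts.clear()
        self.pending = 0
```

In a real deployment the flush would also fire on a timer (every few seconds) so low-traffic periods still become visible, and watermarks in the stream processor would decide when a window's counts are final.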
2. Hot Key Mitigation and Skew
Viral content creates extreme skew where a single item receives orders of magnitude more events than the average. Interviewers check whether you recognize this problem and can propose solutions beyond naive sharding.
Hints to consider:
- Implement split counters where hot keys are sharded into N sub-counters (for example item_123_shard_0 through item_123_shard_7) and merged on a periodic schedule
- Evaluate probabilistic data structures (HyperLogLog for unique counts, Count-Min Sketch for frequency estimation) when exact precision is not required
- Apply skew-aware routing where the ingestion layer detects hot items in real time and distributes their events across additional partitions dynamically
- Separate the rarely changing "top 10" from the constantly churning long tail, using tiered storage with different update cadences for each tier
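The split-counter idea can be shown in a few lines. This sketch uses a plain dict as a stand-in for the counter store (in practice, Redis or a sharded key-value store); `NUM_SHARDS` and the key format are the same illustrative convention as `item_123_shard_0` above.

```python
import random

NUM_SHARDS = 8  # sub-counters per hot key (illustrative)

def increment(store, item_id):
    # Each write lands on a random sub-counter key, so no single key
    # absorbs all the traffic for a viral item.
    key = f"{item_id}_shard_{random.randrange(NUM_SHARDS)}"
    store[key] = store.get(key, 0) + 1

def merged_count(store, item_id):
    # A periodic merge job (or the read path) sums the sub-counters
    # back into one logical count for ranking.
    return sum(store.get(f"{item_id}_shard_{i}", 0) for i in range(NUM_SHARDS))
```

The tradeoff is that reads now touch N keys instead of one, which is why merging is usually done on a schedule rather than per query.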
3. Serving Pre-Computed Top-K Results
Computing top-K by scanning all items on every query is an anti-pattern that interviewers penalize. They look for pre-computation strategies, appropriate data structures, and cache-friendly designs.
Hints to consider:
- Pre-compute top-K lists for each window and segment combination using the stream processor, emitting ranked results to the serving layer on every window close or periodic tick
- Store pre-computed rankings in Redis sorted sets (ZADD with scores) or a key-value store with TTL matching the window duration
- Use min-heaps of size K during the aggregation phase to maintain the top candidates in O(N log K) time rather than O(N log N) for a full sort
- Implement tiered caching: application-level in-memory cache (30-second TTL) in front of Redis (10-second freshness) in front of batch recomputation as a fallback
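The min-heap hint is standard enough to sketch directly with Python's `heapq`. A size-K min-heap keeps the weakest current candidate at the root, so each of N items costs at most O(log K) to consider.

```python
import heapq

def top_k(counts, k):
    """Return the k highest-scoring (item, score) pairs from a count map.

    Memory stays O(k) and runtime O(N log k), versus O(N log N) for
    sorting every item.
    """
    heap = []  # min-heap of (score, item); heap[0] is the weakest survivor
    for item, score in counts.items():
        if len(heap) < k:
            heapq.heappush(heap, (score, item))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, item))
    # Sort the k survivors descending by score for presentation.
    return sorted(((item, score) for score, item in heap),
                  key=lambda pair: -pair[1])
```

Note that ties at the heap boundary are broken arbitrarily here; to meet the "stable and deterministic ordering" requirement, a production version would break ties by a secondary key such as item ID.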
4. Time Window Management and Expiration
Different time windows have different update frequencies and expiration logic. Interviewers probe whether you understand sliding versus tumbling windows, retention policies, and how to balance freshness with infrastructure cost.
Hints to consider:
- Use tumbling windows for fixed-duration leaderboards (daily rankings that reset at midnight) and sliding windows for rolling leaderboards ("last 24 hours" continuously updating)
- Implement window expiration through TTLs on cache keys (Redis EXPIRE) or partition-level deletion in time-series storage
- Trade freshness for cost: refresh the "last hour" window every 30 seconds but the "last 30 days" window only every 5 minutes, since it changes more slowly
- Handle late-arriving events with allowed lateness in the stream processor (for example, accept events up to one hour late before finalizing window results)
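The interplay between allowed lateness and cache TTLs can be made concrete. This is a simplified model, assuming a tumbling window plus a fixed lateness budget; the function names and constants are illustrative, and the TTL value is what you would pass to something like Redis EXPIRE.

```python
WINDOW_SECONDS = 3600       # 1-hour tumbling window (illustrative)
ALLOWED_LATENESS = 3600     # accept events up to 1 hour late (illustrative)

def window_bounds(event_ts):
    # [start, end) of the tumbling window containing event_ts.
    start = int(event_ts) - int(event_ts) % WINDOW_SECONDS
    return start, start + WINDOW_SECONDS

def accepts(event_ts, now):
    """A window keeps accepting events for ALLOWED_LATENESS after it
    closes; later events are dropped (or routed to a correction path)."""
    _, end = window_bounds(event_ts)
    return now < end + ALLOWED_LATENESS

def cache_ttl(event_ts, now):
    """Seconds until the window can no longer change, so its cached
    ranking can safely expire at that point."""
    _, end = window_bounds(event_ts)
    return max(0, end + ALLOWED_LATENESS - now)
```

Sliding windows complicate this: an event contributes to several overlapping windows, so each window's bounds and lateness are tracked independently.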
5. Read Path and API Design
The read path must serve thousands of queries per second under tight latency constraints. Interviewers evaluate your understanding of caching, pagination, and read-write separation.
Hints to consider:
- Design the API with explicit freshness and pagination parameters so that clients get deterministic, cacheable results
- Return opaque pagination tokens that encode window boundaries and position to guarantee consistent page-through even as rankings shift
- Implement negative caching for nonexistent segments to avoid repeated expensive lookups
- Serve reads from materialized views or replicas that are fully separated from the write aggregation pipeline to isolate query load from ingestion
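The opaque-token hint can be sketched as follows. This is a minimal illustration: the token pins the window snapshot and offset so page 2 reads the same ranking as page 1 even after a newer window is published. The base64-of-JSON encoding and the `page` helper are assumptions for the sketch; a real service would look the ranking up by the decoded `window_start` (and likely sign the token).

```python
import base64
import json

def encode_token(window_start, offset):
    # Opaque to clients: they cannot construct or tamper with positions.
    payload = json.dumps({"w": window_start, "o": offset})
    return base64.urlsafe_b64encode(payload.encode()).decode()

def decode_token(token):
    payload = json.loads(base64.urlsafe_b64decode(token.encode()))
    return payload["w"], payload["o"]

def page(ranking, token=None, page_size=50, window_start=0):
    """Return one page of `ranking` and the token for the next page
    (None when the ranking is exhausted)."""
    if token is not None:
        window_start, offset = decode_token(token)
    else:
        offset = 0
    items = ranking[offset:offset + page_size]
    next_token = (encode_token(window_start, offset + page_size)
                  if offset + page_size < len(ranking) else None)
    return items, next_token
```

Because the snapshot identity travels inside the token, successive requests stay consistent without any server-side session state.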