Design a Top-K System
Problem Statement
Design a distributed system that computes and serves real-time top-K rankings of items (songs, videos, hashtags, products) based on engagement metrics within configurable time windows. For example, the system might answer queries like "top 50 trending hashtags in the last hour" or "top 10 most viewed videos this week," serving results with sub-100ms latency to millions of concurrent users.
The core difficulty lies in ingesting millions of scoring events per second, many of which arrive out of order or in bursts when content goes viral, and continuously maintaining accurate ranked lists across multiple time windows and audience segments. You must separate the high-throughput write path from the low-latency read path, handle extreme skew on popular items, and make principled tradeoffs between freshness, cost, and correctness.
Key Requirements
Functional
- Time-windowed rankings -- support multiple fixed windows (last hour, last 24 hours, last 7 days, last 30 days, all-time) with each window updating independently
- Segmented leaderboards -- partition rankings by dimensions such as geography, category, or user cohort so that different audiences see relevant results
- Configurable result size -- allow clients to request top-K where K ranges from 10 to 1000, with stable and deterministic ordering when scores tie
- Paginated access -- support paging through large result sets, with a consistent snapshot across successive paginated requests
Non-Functional
- Scalability -- handle 10 million or more events per second during peak hours, with independent scaling of write throughput and read capacity
- Reliability -- tolerate datacenter failures and node crashes without losing more than a few minutes of leaderboard history
- Latency -- p99 read latency under 100ms for top-K queries, with a write-to-visibility delay under 10 seconds for non-viral content
- Consistency -- eventual consistency is acceptable; rankings should converge within seconds, and duplicate or late events should be handled gracefully
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Write Path and Stream Aggregation
Interviewers want to see how you process millions of incoming events without creating database hotspots or losing data. They test whether you understand stream processing, event deduplication, and the distinction between raw events and aggregated counts.
Hints to consider:
- Use an event streaming platform (Kafka or Kinesis) as a durable buffer between producers and consumers so that ingestion is decoupled from processing
- Partition events by item ID or a composite key (window plus segment plus item ID) to enable parallel aggregation while preserving per-item ordering
- Apply tumbling or sliding windows in a stream processor (Flink or Spark Streaming) to maintain per-window counters, using watermarks to handle late-arriving events
- Batch writes to the storage layer (flush every few seconds or every N events) to reduce write amplification and smooth throughput
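The windowing and batching hints above can be sketched together. This is a minimal in-process illustration, not a Flink/Spark job: `WindowedAggregator`, `FLUSH_THRESHOLD`, and the `sink` callback are all hypothetical names standing in for the stream processor's state store and the batched write to the storage layer.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600      # tumbling 1-hour windows (illustrative choice)
FLUSH_THRESHOLD = 1000     # events buffered before a batched flush (illustrative)

class WindowedAggregator:
    """Maintains per-(window, item) counters and flushes them in batches,
    so the storage layer sees one aggregated write instead of one per event."""

    def __init__(self, sink, flush_threshold=FLUSH_THRESHOLD):
        self.sink = sink                 # stand-in for the batched storage write
        self.flush_threshold = flush_threshold
        self.counts = defaultdict(int)   # (window_start, item_id) -> count
        self.pending = 0

    def window_start(self, event_ts):
        # Align the event timestamp to its tumbling-window boundary.
        return int(event_ts) - int(event_ts) % WINDOW_SECONDS

    def ingest(self, item_id, event_ts):
        self.counts[(self.window_start(event_ts), item_id)] += 1
        self.pending += 1
        if self.pending >= self.flush_threshold:
            self.flush()

    def flush(self):
        # One batched write smooths throughput and cuts write amplification.
        self.sink(dict(self.counts))
        self.counts.clear()
        self.pending = 0
```

In a real deployment the flush would also fire on a timer (every few seconds) so low-traffic periods still become visible, and watermarks in the stream processor would decide when a window's counts are final.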
2. Hot Key Mitigation and Skew
Viral content creates extreme skew where a single item receives orders of magnitude more events than the average. Interviewers check whether you recognize this problem and can propose solutions beyond naive sharding.
Hints to consider:
- Implement split counters where hot keys are sharded into N sub-counters (for example item_123_shard_0 through item_123_shard_7) and merged on a periodic schedule
- Evaluate probabilistic data structures (HyperLogLog for unique counts, Count-Min Sketch for frequency estimation) when exact precision is not required
- Apply skew-aware routing where the ingestion layer detects hot items in real time and distributes their events across additional partitions dynamically
- Separate the rarely changing "top 10" from the constantly churning long tail, using tiered storage with different update cadences for each tier
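The split-counter idea can be shown in a few lines. This sketch uses a plain dict as a stand-in for the counter store (in practice, Redis or a sharded key-value store); `NUM_SHARDS` and the key format are the same illustrative convention as `item_123_shard_0` above.

```python
import random

NUM_SHARDS = 8  # sub-counters per hot key (illustrative)

def increment(store, item_id):
    # Each write lands on a random sub-counter key, so no single key
    # absorbs all the traffic for a viral item.
    key = f"{item_id}_shard_{random.randrange(NUM_SHARDS)}"
    store[key] = store.get(key, 0) + 1

def merged_count(store, item_id):
    # A periodic merge job (or the read path) sums the sub-counters
    # back into one logical count for ranking.
    return sum(store.get(f"{item_id}_shard_{i}", 0) for i in range(NUM_SHARDS))
```

The tradeoff is that reads now touch N keys instead of one, which is why merging is usually done on a schedule rather than per query.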
3. Serving Pre-Computed Top-K Results
Computing top-K by scanning all items on every query is an anti-pattern that interviewers penalize. They look for pre-computation strategies, appropriate data structures, and cache-friendly designs.
Hints to consider:
- Pre-compute top-K lists for each window and segment combination using the stream processor, emitting ranked results to the serving layer on every window close or periodic tick
- Store pre-computed rankings in Redis sorted sets (ZADD with scores) or a key-value store with TTL matching the window duration
- Use min-heaps of size K during the aggregation phase to maintain the top candidates in O(N log K) time rather than O(N log N) for a full sort
- Implement tiered caching: application-level in-memory cache (30-second TTL) in front of Redis (10-second freshness) in front of batch recomputation as a fallback
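The min-heap hint is standard enough to sketch directly with Python's `heapq`. A size-K min-heap keeps the weakest current candidate at the root, so each of N items costs at most O(log K) to consider.

```python
import heapq

def top_k(counts, k):
    """Return the k highest-scoring (item, score) pairs from a count map.

    Memory stays O(k) and runtime O(N log k), versus O(N log N) for
    sorting every item.
    """
    heap = []  # min-heap of (score, item); heap[0] is the weakest survivor
    for item, score in counts.items():
        if len(heap) < k:
            heapq.heappush(heap, (score, item))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, item))
    # Sort the k survivors descending by score for presentation.
    return sorted(((item, score) for score, item in heap),
                  key=lambda pair: -pair[1])
```

Note that ties at the heap boundary are broken arbitrarily here; to meet the "stable and deterministic ordering" requirement, a production version would break ties by a secondary key such as item ID.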
4. Time Window Management and Expiration
Different time windows have different update frequencies and expiration logic. Interviewers probe whether you understand sliding versus tumbling windows, retention policies, and how to balance freshness with infrastructure cost.
Hints to consider:
- Use tumbling windows for fixed-duration leaderboards (daily rankings that reset at midnight) and sliding windows for rolling leaderboards ("last 24 hours" continuously updating)
- Implement window expiration through TTLs on cache keys (Redis EXPIRE) or partition-level deletion in time-series storage
- Trade freshness for cost: refresh the "last hour" window every 30 seconds but the "last 30 days" window only every 5 minutes, since it changes more slowly
- Handle late-arriving events with allowed lateness in the stream processor (for example, accept events up to one hour late before finalizing window results)
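The interplay between allowed lateness and cache TTLs can be made concrete. This is a simplified model, assuming a tumbling window plus a fixed lateness budget; the function names and constants are illustrative, and the TTL value is what you would pass to something like Redis EXPIRE.

```python
WINDOW_SECONDS = 3600       # 1-hour tumbling window (illustrative)
ALLOWED_LATENESS = 3600     # accept events up to 1 hour late (illustrative)

def window_bounds(event_ts):
    # [start, end) of the tumbling window containing event_ts.
    start = int(event_ts) - int(event_ts) % WINDOW_SECONDS
    return start, start + WINDOW_SECONDS

def accepts(event_ts, now):
    """A window keeps accepting events for ALLOWED_LATENESS after it
    closes; later events are dropped (or routed to a correction path)."""
    _, end = window_bounds(event_ts)
    return now < end + ALLOWED_LATENESS

def cache_ttl(event_ts, now):
    """Seconds until the window can no longer change, so its cached
    ranking can safely expire at that point."""
    _, end = window_bounds(event_ts)
    return max(0, end + ALLOWED_LATENESS - now)
```

Sliding windows complicate this: an event contributes to several overlapping windows, so each window's bounds and lateness are tracked independently.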
5. Read Path and API Design
The read path must serve thousands of queries per second under tight latency constraints. Interviewers evaluate your understanding of caching, pagination, and read-write separation.
Hints to consider:
- Design the API with explicit freshness and pagination parameters so that clients get deterministic, cacheable results
- Return opaque pagination tokens that encode window boundaries and position to guarantee consistent page-through even as rankings shift
- Implement negative caching for nonexistent segments to avoid repeated expensive lookups
- Serve reads from materialized views or replicas that are fully separated from the write aggregation pipeline to isolate query load from ingestion
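The opaque-token hint can be sketched as follows. This is a minimal illustration: the token pins the window snapshot and offset so page 2 reads the same ranking as page 1 even after a newer window is published. The base64-of-JSON encoding and the `page` helper are assumptions for the sketch; a real service would look the ranking up by the decoded `window_start` (and likely sign the token).

```python
import base64
import json

def encode_token(window_start, offset):
    # Opaque to clients: they cannot construct or tamper with positions.
    payload = json.dumps({"w": window_start, "o": offset})
    return base64.urlsafe_b64encode(payload.encode()).decode()

def decode_token(token):
    payload = json.loads(base64.urlsafe_b64decode(token.encode()))
    return payload["w"], payload["o"]

def page(ranking, token=None, page_size=50, window_start=0):
    """Return one page of `ranking` and the token for the next page
    (None when the ranking is exhausted)."""
    if token is not None:
        window_start, offset = decode_token(token)
    else:
        offset = 0
    items = ranking[offset:offset + page_size]
    next_token = (encode_token(window_start, offset + page_size)
                  if offset + page_size < len(ranking) else None)
    return items, next_token
```

Because the snapshot identity travels inside the token, successive requests stay consistent without any server-side session state.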