Design a Top-K System
Problem Statement
Design a system that efficiently retrieves the top-K items (songs, videos, hashtags, products, etc.) based on user activity or engagement metrics within specified time windows. The system should handle real-time data aggregation and support queries like "top 10 songs in the last 7 days" or "trending hashtags in the past 24 hours," with results segmented by dimensions such as geography, category, or user cohort.
This is a frequently asked question at Amazon (91 candidate reports). Interviewers ask it to test whether you can stream-process high-volume events, maintain time-windowed counts, handle hot keys from viral content, precompute results, and serve low-latency reads at scale. Strong answers show a clear separation between the write-aggregation and read-serving paths, thoughtful handling of late and out-of-order data, and pragmatic tradeoffs between freshness, cost, and correctness.
Key Requirements
Functional
- Time-windowed rankings -- support multiple fixed windows (1 hour, 24 hours, 7 days, 30 days, all-time) with each window updating independently
- Segmented leaderboards -- partition rankings by dimensions like geography, category, or user attributes
- Configurable result size -- clients can request top-K where K ranges from 10 to 1000, with stable ordering when scores tie
- Paginated results -- enable page-through for large result sets with consistent snapshots across paginated requests (an example query shape follows this list)
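A hypothetical query and response shape that illustrates these requirements; the endpoint parameters, field names, and values below are illustrative assumptions, not a fixed API contract.

```python
# Hypothetical top-K query and response shapes (all field names are assumptions).
query = {
    "metric": "play_count",
    "window": "7d",                      # one of: 1h, 24h, 7d, 30d, all_time
    "segment": {"geo": "US", "category": "songs"},
    "k": 10,
    "cursor": None,                      # opaque cursor for consistent pagination
}

response = {
    "snapshot_id": "2024-05-01T00:00:00Z/7d",   # all pages of one query read this snapshot
    "items": [
        {"rank": 1, "item_id": "song_123", "score": 1204332},
        {"rank": 2, "item_id": "song_987", "score": 1198870},
    ],
    "next_cursor": "opaque-token",
}
```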
Non-Functional
- Scalability -- handle 10M+ events per second during peak hours with independent scaling of write throughput and read capacity
- Reliability -- tolerate datacenter failures without losing more than 5 minutes of leaderboard data
- Latency -- p99 read latency under 100ms for top-K queries; write-to-visibility delay under 10 seconds
- Consistency -- eventual consistency acceptable for leaderboards; rankings should converge within seconds
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Write Path and Event Aggregation
Interviewers want to see how you handle millions of incoming score events without creating database hotspots or losing data. They test whether you understand stream processing, event deduplication, and the difference between raw events and aggregated counts.
Hints to consider:
- Use Kafka as a durable buffer between producers and consumers, partitioned by item_id for parallel processing while maintaining per-item ordering
- Apply windowed aggregations in a stream processor (Flink) with watermarks to handle late-arriving events
- Batch writes to the storage layer (flush every 5 seconds or 1000 events) to reduce write amplification
- Use event-time processing rather than processing-time to ensure correct window assignment (event-time bucketing and batched flushing are sketched after this list)
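A minimal sketch of the aggregation step described above, assuming events arrive as (item_id, event_time) pairs and that a sink callable persists aggregated deltas; the class and constant names are illustrative, and a production system would run this logic inside a stream processor such as Flink rather than hand-rolled code.

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 3600            # 1-hour tumbling window, keyed by window start
FLUSH_EVERY_EVENTS = 1000        # flush thresholds from the batching hint above
FLUSH_EVERY_SECONDS = 5.0

class WindowedAggregator:
    def __init__(self, sink):
        self.sink = sink                 # persists (window_start, item_id, delta)
        self.counts = defaultdict(int)   # (window_start, item_id) -> partial count
        self.pending = 0
        self.last_flush = time.monotonic()

    def on_event(self, item_id, event_time):
        # Event-time bucketing: late events still land in the window they belong to.
        window_start = int(event_time) - int(event_time) % WINDOW_SECONDS
        self.counts[(window_start, item_id)] += 1
        self.pending += 1
        if (self.pending >= FLUSH_EVERY_EVENTS
                or time.monotonic() - self.last_flush >= FLUSH_EVERY_SECONDS):
            self.flush()

    def flush(self):
        # Emit one aggregated delta per (window, item) instead of one write per event.
        for (window_start, item_id), delta in self.counts.items():
            self.sink(window_start, item_id, delta)
        self.counts.clear()
        self.pending = 0
        self.last_flush = time.monotonic()

# Usage: agg = WindowedAggregator(sink=print); agg.on_event("song_123", 1714500000)
```

Flushing aggregated (window, item) deltas rather than raw events is what keeps write amplification bounded even at 10M+ events per second.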
2. Handling Hot Keys and Skew
Viral content creates enormous skew where a single item receives 1000x more events than average. Interviewers assess whether you recognize this problem and can propose solutions beyond naive sharding.
Hints to consider:
- Implement split counters where hot keys are sharded into N sub-counters and merged periodically (sketched after this list)
- Use probabilistic counting (HyperLogLog for unique counts, Count-Min Sketch for frequency) when exact precision is not required
- Apply skew-aware routing where the ingestion layer detects hot items and distributes their events across more partitions
- Separate the "top 10" (which changes rarely) from the "long tail" using tiered storage with different update frequencies
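A minimal sketch of the split-counter idea, assuming a plain in-memory dict stands in for the datastore; the shard count and key format are illustrative assumptions.

```python
import random
from collections import defaultdict

NUM_SHARDS = 16                  # illustrative shard count for hot keys
store = defaultdict(int)         # stands in for rows/keys in the real datastore

def increment(item_id, amount=1, hot=False):
    if hot:
        # Spread a hot item's writes across sub-counters so no single key is a hotspot.
        shard = random.randrange(NUM_SHARDS)
        store[f"{item_id}#shard{shard}"] += amount
    else:
        store[item_id] += amount

def read_count(item_id):
    # Merge the sub-counters on read (or periodically in the background).
    total = store.get(item_id, 0)
    total += sum(store.get(f"{item_id}#shard{s}", 0) for s in range(NUM_SHARDS))
    return total
```

In practice the hot/not-hot decision would come from skew detection in the ingestion layer, and the merge can run as a periodic background job instead of on every read.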
3. Maintaining and Serving Top-K Efficiently
Computing top-K by scanning all items on every query is a critical mistake. Interviewers look for pre-computation strategies and appropriate data structures.
Hints to consider:
- Pre-compute top-K lists for each window/segment combination using stream processing with windowed aggregations that emit ranked results
- Store pre-computed rankings in Redis sorted sets with scores, enabling O(log N) updates and O(log N + K) retrieval
- Use min-heaps of size K during aggregation to maintain top candidates efficiently (sketched after this list)
- Implement tiered caching: application cache (30s TTL), CDN (1 min TTL), Redis (authoritative, 10s freshness)
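A minimal sketch of the size-K min-heap pattern used during aggregation; the function name and tie-breaking rule are assumptions rather than a prescribed implementation.

```python
import heapq

def top_k(scored_items, k):
    """scored_items yields (item_id, score); returns the k highest-scoring items."""
    heap = []                                   # min-heap of (score, item_id), size <= k
    for item_id, score in scored_items:
        if len(heap) < k:
            heapq.heappush(heap, (score, item_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, item_id))   # evict the current minimum
    # Sort descending by score, then by item_id, for stable ordering on ties.
    return [(item, score) for score, item in sorted(heap, key=lambda e: (-e[0], e[1]))]

# Usage: top_k([("a", 5), ("b", 9), ("c", 7)], 2) -> [("b", 9), ("c", 7)]
```

The ranked output of this step is what gets written into the serving store (for example, a Redis sorted set) so that reads never scan the full item space.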
4. Time Window Management and Freshness
Different time windows have different update patterns and expiration logic. Interviewers probe whether you understand sliding vs tumbling windows and how to balance freshness with cost.
Hints to consider:
- Use tumbling windows for fixed-duration leaderboards (daily resets at midnight) and sliding windows for "last 24 hours" that update continuously (a bucket-based sliding window is sketched after this list)
- Trade off freshness for cost: update "last hour" every 30 seconds but "last 30 days" only every 5 minutes
- Implement window expiration through TTLs on Redis keys or partition-level deletion
- Handle late-arriving events with allowed lateness in stream processing, then emit final window results
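A minimal sketch of a sliding "last 24 hours" view built from hourly tumbling buckets with a simple allowed-lateness cutoff; the bucket size, lateness bound, and data structures are illustrative assumptions.

```python
from collections import defaultdict

BUCKET_SECONDS = 3600            # hourly buckets (illustrative)
WINDOW_BUCKETS = 24              # "last 24 hours" = the 24 most recent buckets
ALLOWED_LATENESS = 2 * 3600      # events more than 2 hours late go to a correction path

buckets = defaultdict(lambda: defaultdict(int))   # bucket_start -> {item_id: count}

def on_event(item_id, event_time, now):
    if now - event_time > ALLOWED_LATENESS:
        return False                              # too late for streaming; handle offline
    bucket_start = int(event_time) - int(event_time) % BUCKET_SECONDS
    buckets[bucket_start][item_id] += 1
    return True

def last_24h_counts(now):
    # Merge the buckets that cover the sliding window; older buckets can simply be
    # deleted or expired with a TTL, which is what makes window expiration cheap.
    newest = int(now) - int(now) % BUCKET_SECONDS
    merged = defaultdict(int)
    for i in range(WINDOW_BUCKETS):
        for item_id, count in buckets.get(newest - i * BUCKET_SECONDS, {}).items():
            merged[item_id] += count
    return merged
```

Expiring whole buckets rather than decrementing individual counters keeps window expiration cheap, and the same bucket layout supports both the sliding and tumbling variants.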