Design Top K Songs Widget for Spotify
Problem Statement
Design a personalized widget within the Spotify application that shows each user their top K most-played songs across configurable time ranges (last 4 weeks, last 6 months, all time). The widget should update in near real time as the user listens to more music, display play counts alongside track metadata, and let users interact with results (play a song, add it to a playlist, or share the list). It serves hundreds of millions of users, each with unique listening histories, and must handle millions of play events per second during peak hours.
The central challenge is building a per-user aggregation pipeline that ingests high-volume play events, maintains rolling time-windowed counters for every user-song pair, extracts top-K rankings efficiently, and serves those rankings with sub-200ms latency. You must reason about hot-key contention on popular tracks, idempotency for retried or late-arriving events, and the tradeoff between ranking freshness and infrastructure cost.
Key Requirements
Functional
- Per-user top-K display -- each user sees a ranked list of their K most-played songs for a selected time range (last 4 weeks, last 6 months, all time)
- Play counts and metadata -- each entry shows the song title, artist, album artwork, and the user's play count for that window
- Inline actions -- users can play a song, like it, or add it to a playlist directly from the widget without navigating away
- Shareable snapshots -- users can generate a link or image summarizing their top-K list that can be shared on social media without exposing their full listening history
Non-Functional
- Scalability -- support 500 million monthly active users, each generating an average of 10 play events per day, with peak throughput of 5 million events per second
- Reliability -- no play event is lost even during partial infrastructure failures; rankings eventually converge to accurate values
- Latency -- widget loads in under 200ms; play events are reflected in rankings within 30 seconds under normal conditions
- Consistency -- eventual consistency is acceptable; minor delays between a play event and its appearance in the ranking are tolerable
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Play Event Ingestion and Per-User Aggregation
Interviewers want to see how you handle high-throughput, per-user event streams without creating bottlenecks. They test whether you understand the difference between raw event storage and pre-aggregated counters, and whether your pipeline can absorb traffic spikes during popular album releases or holiday seasons.
Hints to consider:
- Ingest play events into Kafka partitioned by user ID so that all events for a given user land on the same partition, preserving ordering and enabling stateful processing
- Use a stream processor (Flink) with keyed state to maintain per-user, per-song, per-window counters that increment as events arrive
- Batch counter flushes to the storage layer (every 10 seconds or every N events) to reduce write amplification while keeping freshness within the target
- Attach idempotency keys to play events at the client so the pipeline can deduplicate retries from offline playback queues
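The dedup-and-batch behavior described above can be sketched in a few lines. This is a toy in-memory stand-in for a keyed stream-processor operator, not a Flink program; the names (`flush_every`, `sink`, the event field names) are illustrative assumptions:

```python
from collections import defaultdict

class PlayEventAggregator:
    """Toy per-user aggregator: idempotent counting plus batched flushes.
    A real deployment would hold this state per Kafka partition key."""

    def __init__(self, flush_every=100, sink=None):
        self.flush_every = flush_every
        self.sink = sink if sink is not None else []   # stand-in for the counter store
        self.counts = defaultdict(int)                 # (user_id, song_id) -> pending delta
        self.seen = set()                              # idempotency keys already processed
        self.pending = 0

    def ingest(self, event):
        # Deduplicate retries (e.g. replayed offline playback queues)
        # using the client-attached idempotency key.
        if event["event_id"] in self.seen:
            return
        self.seen.add(event["event_id"])
        self.counts[(event["user_id"], event["song_id"])] += 1
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self):
        # One write per (user, song) instead of one per event,
        # which is the write-amplification saving the hint describes.
        for (user, song), delta in self.counts.items():
            self.sink.append((user, song, delta))
        self.counts.clear()
        self.pending = 0
```

In practice the `seen` set must be bounded (e.g. a TTL keyed by event time), since an unbounded dedup set grows forever.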
2. Time Window Management for Rolling Rankings
Users select different time ranges, each requiring its own aggregation logic and expiration policy. Interviewers probe whether you understand sliding versus tumbling windows and how to age out old data efficiently.
Hints to consider:
- For "last 4 weeks" and "last 6 months," use sliding windows where expired events are subtracted from counters as they age out, keeping rankings continuously fresh
- For "all time," use a simple cumulative counter that only grows, avoiding the complexity of expiration
- Store windowed counters with timestamps so that a background compaction process can recalculate counts if drift accumulates from approximations
- Trade off freshness against cost: update short windows (4 weeks) every 15 seconds but long windows (6 months) every few minutes, since they change slowly
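One common way to implement the sliding windows above is bucketed counters: split time into fixed buckets (daily here, as an assumption), and compute a window by summing the buckets it covers, so aging out old plays is just dropping old buckets. A minimal per-user sketch:

```python
from collections import defaultdict

class BucketedWindowCounter:
    """Sliding-window play counter for one user, using fixed daily buckets.
    'Last 4 weeks' is approximated as the sum of the latest window_days buckets."""

    def __init__(self, window_days=28):
        self.window_days = window_days
        self.buckets = {}   # day index -> {song_id: count}

    def record_play(self, day, song_id):
        self.buckets.setdefault(day, defaultdict(int))[song_id] += 1

    def window_counts(self, today):
        # Sum only the buckets inside the window; older buckets are
        # implicitly aged out because they are never read.
        totals = defaultdict(int)
        for day in range(today - self.window_days + 1, today + 1):
            for song, n in self.buckets.get(day, {}).items():
                totals[song] += n
        return dict(totals)

    def compact(self, today):
        # Background cleanup: drop buckets that can never contribute again.
        cutoff = today - self.window_days + 1
        for day in [d for d in self.buckets if d < cutoff]:
            del self.buckets[day]
```

The bucket granularity sets the expiry precision: daily buckets mean a play "expires" at a day boundary rather than exactly 28×24 hours later, which is usually an acceptable approximation for this feature.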
3. Efficient Top-K Extraction and Serving
Computing a user's top-K by scanning all their song counters on every request is too expensive at scale. Interviewers look for pre-computation and materialization strategies.
Hints to consider:
- Maintain a min-heap of size K in the stream processor's state for each user and window, updating it as counters change and emitting the new top-K when it shifts
- Write pre-computed top-K results to a fast key-value store (Redis or DynamoDB) keyed by (user_id, window), so the serving layer performs a single read
- Cache widget responses at the application level with a short TTL (30 seconds) to absorb repeated loads without hitting the backing store
- Handle the cold-start case (new users or rarely active users) by computing top-K on demand from the counter store and caching the result
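One practical variant of the heap idea above is a lazy heap: because counts within a window only grow on increment, you can push every updated `(count, song)` pair and simply discard stale entries at read time, instead of updating entries in place. A sketch (in-memory, single user and window; names are illustrative):

```python
import heapq

class TopKTracker:
    """Lazy top-K over per-song counters. Stale heap entries are skipped
    at read time; the latest count for a song is always the largest."""

    def __init__(self, k):
        self.k = k
        self.counts = {}   # song_id -> current count
        self.heap = []     # (-count, song_id); may contain stale entries

    def increment(self, song_id):
        self.counts[song_id] = self.counts.get(song_id, 0) + 1
        heapq.heappush(self.heap, (-self.counts[song_id], song_id))

    def top_k(self):
        result, seen = [], set()
        heap_copy = list(self.heap)
        heapq.heapify(heap_copy)
        # Pop in descending count order, skipping stale or duplicate entries.
        while heap_copy and len(result) < self.k:
            neg, song = heapq.heappop(heap_copy)
            if song in seen or -neg != self.counts[song]:
                continue
            seen.add(song)
            result.append((song, -neg))
        return result
```

In the stream processor, `top_k()` would run only when a counter change could alter the ranking, and the emitted list would be written to the (user_id, window) key in the serving store.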
4. Hot Key and Skew Handling
When a massively popular album drops, millions of users play the same songs simultaneously, creating write hotspots on per-song metadata and aggregation state. Interviewers assess whether you account for this skew.
Hints to consider:
- Since aggregation is per-user, the hot-key problem is less severe on the counter side, but metadata enrichment (song title, artist, artwork) for popular tracks will see extreme read amplification
- Cache song metadata aggressively in a CDN and application-level cache with longer TTLs, since metadata changes infrequently
- If a global "most played" leaderboard supplements the per-user widget, use sharded counters for globally popular songs and merge them asynchronously
- Monitor per-partition lag in the stream processor and auto-scale consumer parallelism when skew causes individual partitions to fall behind
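The sharded-counter idea for a global leaderboard can be sketched as follows: each increment lands on one of N sub-keys chosen at random, so concurrent writers for a hot song contend on different keys, and a merge step sums the shards. This is an in-memory illustration; in a real store the shards would be separate rows or keys rolled up by an async job:

```python
import random
from collections import defaultdict

class ShardedCounter:
    """Global play counter split across N shards to spread write load
    for hot songs; the true total is the sum over all shards."""

    def __init__(self, num_shards=8):
        self.num_shards = num_shards
        self.shards = defaultdict(int)   # (song_id, shard_idx) -> count

    def increment(self, song_id):
        # Random shard choice spreads contention; a writer ID hash works too.
        shard = random.randrange(self.num_shards)
        self.shards[(song_id, shard)] += 1

    def total(self, song_id):
        # The asynchronous merge step: sum every shard for the song.
        return sum(v for (s, _), v in self.shards.items() if s == song_id)
```

The tradeoff is read cost: totals require N reads (or a periodic rollup), which is why sharding is reserved for the small set of globally hot keys rather than applied everywhere.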
5. Share and Snapshot Generation
The share feature requires generating a stable, visually appealing summary without exposing private listening data beyond the list itself. Interviewers check that you consider privacy, caching, and link stability.
Hints to consider:
- Generate a unique, short-lived token for each share request that maps to a frozen snapshot of the user's top-K at that moment, stored in a key-value store with a TTL
- Render the visual snapshot (image or card) server-side using a headless renderer or template service, cache it in object storage (S3), and serve it via CDN
- Ensure the shared link reveals only the ranked list (song names, play counts, album art) and not the user's full listening history or account details
- Allow users to regenerate or revoke shared links, cleaning up the corresponding cached assets
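The token-and-TTL snapshot flow above can be sketched with an in-memory stand-in for the key-value store. The class and parameter names are assumptions; an injectable `clock` makes expiry testable:

```python
import secrets
import time

class SnapshotStore:
    """Stand-in for a KV store with TTL. A share request freezes the
    user's current top-K under a random, unguessable token."""

    def __init__(self, ttl_seconds=7 * 24 * 3600, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self.snapshots = {}   # token -> (expires_at, frozen top-K list)

    def create(self, top_k):
        # token_urlsafe gives a short, high-entropy token suitable for a share URL.
        token = secrets.token_urlsafe(16)
        self.snapshots[token] = (self.clock() + self.ttl, list(top_k))
        return token

    def get(self, token):
        entry = self.snapshots.get(token)
        if entry is None or entry[0] < self.clock():
            return None   # unknown, expired, or revoked
        return entry[1]

    def revoke(self, token):
        # Revocation should also trigger cleanup of cached rendered assets
        # (the S3/CDN image mentioned above).
        self.snapshots.pop(token, None)
```

Freezing the list at create time is what makes the shared link stable: later plays change the live widget but not an already-shared snapshot, and only the frozen list (not the full history) is ever reachable through the token.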