Design a social music feature that allows users to see what songs their friends are currently listening to or have recently played. Think of the sidebar in Spotify's desktop app that shows a live feed of your friends' musical activity -- each entry shows a friend's name, the track they are playing, and a timestamp.
The core engineering challenge is delivering near-real-time presence updates to potentially hundreds of friends per user without creating excessive write amplification or hot partitions. Every time someone presses play, that event must propagate to all of their friends within a few seconds. With hundreds of millions of users and frequent track changes (every 3-4 minutes on average), the system generates an enormous volume of ephemeral events that expire quickly but must be served with low latency.
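To make the event volume concrete, here is a back-of-envelope estimate. All numbers (monthly users, concurrent-listening fraction, average friend count) are illustrative assumptions, not figures from any real deployment:

```python
# Back-of-envelope estimate of event and fanout volume.
# Every input below is an assumption chosen for illustration.
MAU = 500_000_000            # assumed monthly active users
CONCURRENT_FRACTION = 0.05   # assumed fraction listening at any moment
AVG_TRACK_SECONDS = 210      # ~3.5 minutes per track, per the text
AVG_FRIENDS = 150            # assumed average friend-list size

listeners = MAU * CONCURRENT_FRACTION
events_per_second = listeners / AVG_TRACK_SECONDS
fanout_messages_per_second = events_per_second * AVG_FRIENDS

print(f"{events_per_second:,.0f} track-change events/s")
print(f"{fanout_messages_per_second:,.0f} fanout messages/s")
```

Under these assumptions the system sees on the order of 100K track-change events per second, and naive per-friend fanout multiplies that by two orders of magnitude, which is why the fanout strategy below matters so much.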
You also need to respect privacy controls: users can enable a "private session" that hides their activity, block specific friends, or opt out entirely. These controls must be enforced consistently on both the write and read paths to prevent data leaks even under caching and eventual consistency.
Based on real interview experiences, these are the areas interviewers probe most deeply:
The central design decision is how updates flow from a user who changes tracks to all of their friends' clients. Interviewers want you to reason about the tradeoffs between WebSocket push, server-sent events, and client polling.
Hints to consider:
When a user with thousands of followers changes tracks, naively writing to each follower's feed causes write amplification. Interviewers probe whether you choose fanout-on-write, fanout-on-read, or a hybrid approach.
Hints to consider:
Use a Redis MGET to fetch the current track for all friends in a single round trip, keeping read latency under 10ms for the cache-hit case.

Privacy settings must be respected in real time, which complicates caching and fanout. Interviewers look for correct handling of private sessions, blocked users, and opt-outs.
Hints to consider:
Choosing the right storage for ephemeral presence data versus recent activity history affects both latency and operational complexity.
Hints to consider:
Start by confirming scope. Ask about the expected number of friends per user (typical: 50-300), whether the feature is limited to mutual friends or includes followers, and the retention window for recent activity. Clarify whether the feed should update while the app is in the background or only when actively viewed. Confirm privacy requirements: what happens to in-flight updates when a user toggles private session, and how quickly must that change take effect?
Sketch five core components: (1) Playback Event Ingester that receives track-change events from the music player service and publishes them to Kafka, (2) Activity Writer that consumes events and updates the user's now-playing record and recent history in Redis, (3) Friend Activity API that handles initial load requests by reading friend lists and batch-fetching activity from Redis, (4) WebSocket Gateway that maintains persistent connections to online clients and pushes live updates, and (5) Privacy Service that manages and caches visibility settings.
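The contract between components (1) and (2) is the playback event published to Kafka. A minimal sketch of what that payload might look like, with illustrative field names (nothing here is a documented schema):

```python
# Sketch of a track-change event as the Playback Event Ingester might
# publish it to Kafka. Field names are assumptions for illustration.
from dataclasses import dataclass, asdict
import json
import time

@dataclass
class PlaybackEvent:
    user_id: str
    track_id: str
    track_name: str
    artist_name: str
    started_at: float  # epoch seconds when playback began

    def to_kafka_value(self) -> bytes:
        # Serialize to JSON bytes; user_id would typically also serve
        # as the Kafka partition key so one user's events stay ordered.
        return json.dumps(asdict(self)).encode("utf-8")

event = PlaybackEvent("user_a", "trk_123", "Some Song", "Some Artist",
                      time.time())
payload = event.to_kafka_value()
```

Keying the topic by `user_id` keeps each user's events on one partition, so the Activity Writer always sees that user's track changes in order.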
Walk through the flow: User A starts a new song. The playback service publishes an event to Kafka. The Activity Writer updates A's now-playing key in Redis. Meanwhile, a Kafka consumer notifies the WebSocket Gateway, which looks up which of A's friends are currently connected, checks privacy settings, and pushes the update to their WebSocket connections.
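The Activity Writer's Redis updates can be sketched as follows. This uses in-memory dicts as stand-ins for Redis so the logic is runnable here; the key names (`nowplaying:{user}`, `history:{user}`), the 6-hour TTL, and the 10-entry history cap are all assumptions:

```python
import time

NOW_PLAYING_TTL = 6 * 3600   # assumed TTL for the now-playing key
HISTORY_LIMIT = 10           # assumed per-user recent-history cap

store = {}    # stands in for Redis strings: key -> (value, expires_at)
history = {}  # stands in for Redis sorted sets: key -> {track: score}

def handle_track_change(user_id: str, track_id: str, now: float = None):
    """Apply one consumed track-change event to the presence store."""
    if now is None:
        now = time.time()
    # Equivalent of: SET nowplaying:{user} track EX 21600
    store[f"nowplaying:{user_id}"] = (track_id, now + NOW_PLAYING_TTL)
    # Equivalent of: ZADD history:{user} now track, then trimming the
    # oldest entry (ZREMRANGEBYRANK) to cap the set's size.
    h = history.setdefault(f"history:{user_id}", {})
    h[track_id] = now
    if len(h) > HISTORY_LIMIT:
        oldest = min(h, key=h.get)
        del h[oldest]
```

The TTL matters: presence data is ephemeral, so letting the now-playing key expire on its own means users who stop listening simply age out of the feed without any explicit cleanup job.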
When a user opens the sidebar, the client calls the Friend Activity API. The API fetches the user's friend list (up to a few hundred IDs), then issues a Redis MGET to retrieve the now-playing record for all friends in a single round trip. For recent history, it fetches the last 5-10 entries per friend from sorted sets, again batched. The API filters out friends with private sessions or blocked relationships, ranks results by recency, and returns the response. With Redis cache hits, this completes in under 50ms server-side. On cache miss, the system falls back to Cassandra for durable history, adding latency but preserving correctness.
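The filter-and-rank step of that read path can be sketched as a pure function, assuming the Redis MGET has already produced a mapping from friend ID to a `(track_id, played_at)` tuple (or `None` for friends with no current activity). The function and argument names are illustrative:

```python
# Sketch of the read path's privacy filter and recency ranking, after
# the batch fetch has completed. Names are assumptions for illustration.
def build_feed(friend_activity: dict, private_friends: set,
               blocked_friends: set) -> list:
    visible = {
        fid: entry
        for fid, entry in friend_activity.items()
        if entry is not None              # friend has current activity
        and fid not in private_friends    # private session: hide
        and fid not in blocked_friends    # blocked relationship: hide
    }
    # Rank by played_at timestamp, newest first.
    return sorted(visible.items(), key=lambda kv: kv[1][1], reverse=True)

feed = build_feed(
    {"f1": ("trk_a", 100.0), "f2": ("trk_b", 200.0), "f3": None,
     "f4": ("trk_c", 150.0)},
    private_friends={"f4"},
    blocked_friends=set(),
)
# -> [('f2', ('trk_b', 200.0)), ('f1', ('trk_a', 100.0))]
```

Filtering on the read path, even though the write path also checks privacy, is what keeps a just-enabled private session from leaking through cached or in-flight data.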
Cover graceful degradation: if Redis is unavailable, serve stale data from a local in-memory cache on the API servers with a 30-second TTL, and disable live push updates until Redis recovers. Discuss monitoring: track p99 latency for activity loading, WebSocket connection counts, Kafka consumer lag, and privacy enforcement latency. Address abuse prevention: rate-limit how frequently a single user's now-playing record can update (e.g., max once per 5 seconds) to handle bots or rapid skipping. Finally, discuss cost: Redis memory is expensive at scale, so tune TTLs aggressively and consider tiered storage where only active users' data lives in Redis while dormant users' history is served from disk-backed storage.
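The per-user update rate limit mentioned above can be sketched as a simple last-write check. This version uses an in-memory map for illustration; in production the timestamp would live in Redis so the limit holds across writer instances:

```python
# Sketch of the per-user now-playing rate limit (max one update per
# 5 seconds). In-memory map is a stand-in for a shared Redis key.
MIN_INTERVAL = 5.0
_last_update = {}  # user_id -> timestamp of last accepted update

def allow_update(user_id: str, now: float) -> bool:
    """Return True if this user's now-playing write should proceed."""
    last = _last_update.get(user_id)
    if last is not None and now - last < MIN_INTERVAL:
        return False  # too soon: drop or coalesce the event
    _last_update[user_id] = now
    return True
```

Dropped events are acceptable here because presence is ephemeral: if a user is skipping rapidly, only their most recent track matters, and the next accepted event overwrites the now-playing record anyway.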
Deepen your understanding of the patterns used in this problem: