Design a real-time commenting system for live video streams on a platform like Facebook or Instagram, where millions of viewers can post comments and reactions that appear on every viewer's screen within milliseconds. The system must handle Facebook-scale traffic (roughly 2 billion daily active users) and support features like reactions, historical comment catch-up for late joiners, and seamless transitions when users switch between live streams.
The core engineering challenges are: low-latency fanout to millions of concurrent viewers on a single popular stream; hot-key contention, where a single live video ID becomes the focal point for all writes and reads; unreliable mobile clients that disconnect, background their apps, and rejoin frequently; and maintaining a coherent comment timeline without gaps or duplicates. Strong answers connect product needs (smooth scrolling, no missed messages, graceful degradation) to architectural patterns (pub/sub, cursors, replay buffers, backpressure) and client-side strategies.
Based on real interview experiences, these are the areas interviewers probe most deeply:
A single popular live stream with 50 million viewers means every comment must be delivered to tens of millions of connections. Naive broadcasting from a single server is impossible; interviewers test whether you can design a scalable fanout tree.
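One way to make the fanout tree concrete is a two-level sketch: the origin sends each comment once per gateway, and each gateway fans out locally to its own connections, so origin-side cost is O(gateways) rather than O(viewers). This is a minimal illustration in plain Python; the `Gateway` and `FanoutWorker` names are made up for the example, and real gateways would push over WebSockets rather than append to lists.

```python
class Gateway:
    """Terminates client connections and performs local fanout."""

    def __init__(self, name):
        self.name = name
        self.clients = {}          # client_id -> list of delivered comments

    def connect(self, client_id):
        self.clients[client_id] = []

    def local_fanout(self, comment):
        # One network hop in, len(self.clients) local pushes out.
        for inbox in self.clients.values():
            inbox.append(comment)


class FanoutWorker:
    """Pushes each comment once per gateway hosting viewers of the stream."""

    def __init__(self):
        self.gateways = []

    def register(self, gateway):
        self.gateways.append(gateway)

    def broadcast(self, comment):
        # Origin cost is O(#gateways), not O(#viewers).
        for gw in self.gateways:
            gw.local_fanout(comment)
        return len(self.gateways)  # messages sent from the origin


worker = FanoutWorker()
for g in range(3):
    gw = Gateway(f"gw-{g}")
    for c in range(4):
        gw.connect(f"viewer-{g}-{c}")
    worker.register(gw)

sent = worker.broadcast({"id": "c1", "text": "goal!"})
```

With 3 gateways of 4 viewers each, the origin sends 3 messages while 12 clients receive the comment; at real scale the same shape holds with thousands of gateways and millions of viewers.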
Mobile viewers constantly lose connections, background their apps, and rejoin. Without a proper catch-up mechanism, they will see gaps in the comment stream or duplicates that break the scrolling experience.
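A common fix is a cursor plus a replay buffer: the client tracks the highest sequence number it has applied, requests everything newer on reconnect, and deduplicates by comment ID in case the catch-up batch overlaps with live pushes. A minimal sketch, where the `CommentClient` class and its method names are illustrative:

```python
class CommentClient:
    """Tracks a sequence cursor and dedupes by comment ID."""

    def __init__(self):
        self.last_seq = 0
        self.seen_ids = set()
        self.timeline = []

    def receive(self, comment):
        # A catch-up batch may overlap with live pushes; drop duplicates.
        if comment["id"] in self.seen_ids:
            return False
        self.seen_ids.add(comment["id"])
        self.timeline.append(comment)
        self.last_seq = max(self.last_seq, comment["seq"])
        return True

    def catch_up(self, replay_buffer):
        # On reconnect, apply only comments newer than the cursor, in order.
        for comment in sorted(replay_buffer, key=lambda c: c["seq"]):
            if comment["seq"] > self.last_seq:
                self.receive(comment)


client = CommentClient()
client.receive({"id": "c1", "seq": 1})
client.receive({"id": "c2", "seq": 2})
# Client disconnects; comments 3-5 are posted while it is away.
replay_buffer = [{"id": f"c{i}", "seq": i} for i in range(1, 6)]
client.catch_up(replay_buffer)
duplicate_accepted = client.receive({"id": "c5", "seq": 5})  # live push dup
```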
A single live video ID becomes a write hotspot that can overwhelm storage, caching, and fanout infrastructure. Interviewers want to see how you isolate hot streams and apply backpressure to protect the broader system.
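One standard isolation tool is a per-stream token bucket at the ingestion tier, so a hot stream throttles itself without starving others. A small sketch with an injectable clock for testing; `StreamLimiter` is a made-up name, and a production system would keep these buckets in shared storage such as Redis rather than process memory:

```python
import time


class StreamLimiter:
    """Per-stream token bucket: `rate` tokens/sec refill, capped at `burst`."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.buckets = {}  # stream_id -> (tokens, last_refill_time)

    def allow(self, stream_id, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(stream_id, (self.burst, now))
        # Refill proportionally to elapsed time, up to the burst cap.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[stream_id] = (tokens - 1, now)
            return True
        self.buckets[stream_id] = (tokens, now)
        return False


limiter = StreamLimiter(rate=2, burst=5)
burst_results = [limiter.allow("live:hot", now=0.0) for _ in range(6)]
refilled = limiter.allow("live:hot", now=1.0)      # 2 tokens refilled after 1s
unaffected = limiter.allow("live:quiet", now=0.0)  # other streams untouched
```

The key property for this problem is the last line: exhausting one stream's bucket never affects another stream, which is exactly the isolation interviewers are probing for.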
The system has conflicting storage needs: real-time append-only writes for incoming comments, sequential reads for viewer catch-up, and historical access for post-stream replay. No single storage system excels at all three.
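One way to reconcile these needs is a tiered read path: serve catch-up from a bounded hot buffer when it covers the requested range, and fall back to the cold store for deep history. A toy sketch with plain lists standing in for Redis (hot) and Cassandra (cold); the class and method names are illustrative:

```python
class TieredCommentStore:
    """Bounded hot buffer over an append-only cold store."""

    def __init__(self, hot_capacity):
        self.hot = []              # most recent comments, bounded
        self.cold = []             # full history, append-only
        self.hot_capacity = hot_capacity

    def append(self, comment):
        self.cold.append(comment)
        self.hot.append(comment)
        if len(self.hot) > self.hot_capacity:
            self.hot.pop(0)        # evict oldest from the hot tier

    def read_after(self, seq):
        # Serve from the hot tier only if it has no gap after `seq`;
        # otherwise fall back to the cold store for a deep catch-up.
        if self.hot and self.hot[0]["seq"] <= seq + 1:
            return [c for c in self.hot if c["seq"] > seq], "hot"
        return [c for c in self.cold if c["seq"] > seq], "cold"


store = TieredCommentStore(hot_capacity=3)
for i in range(1, 6):
    store.append({"seq": i})

recent, recent_tier = store.read_after(3)  # covered by the hot buffer
deep, deep_tier = store.read_after(0)      # predates the hot buffer
```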
Sports and entertainment audiences are globally distributed, but latency targets are strict. Simply replicating to all regions creates coordination overhead.
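A common way to avoid that coordination overhead is single-writer ordering: the stream's home region assigns sequence numbers, then relays each comment once per remote region, which only fans out locally and never participates in ordering decisions. A toy sketch under that assumption; `HomeRegion` and `EdgeRegion` are illustrative names:

```python
class HomeRegion:
    """Owns sequence assignment; sends one copy per remote region."""

    def __init__(self):
        self.next_seq = 0
        self.remotes = []

    def attach(self, remote):
        self.remotes.append(remote)

    def ingest(self, text):
        self.next_seq += 1
        comment = {"seq": self.next_seq, "text": text}
        for remote in self.remotes:   # one cross-region send per region
            remote.deliver(comment)
        return comment


class EdgeRegion:
    """Receives an already-ordered feed; local fanout only."""

    def __init__(self):
        self.feed = []

    def deliver(self, comment):
        self.feed.append(comment)     # local gateway fanout would start here


home = HomeRegion()
edges = [EdgeRegion() for _ in range(2)]
for edge in edges:
    home.attach(edge)
home.ingest("kickoff")
home.ingest("goal!")
```

Cross-region traffic scales with the number of regions rather than the number of viewers, and every region observes the same order without any inter-region consensus.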
Confirm the scale parameters. Ask about the distribution of stream popularity (a few mega-events vs. many small streams) to understand provisioning needs. Clarify whether comments must be archived permanently or just retained for the stream duration plus a window. Determine whether reactions are simple emoji counts or more complex (threaded replies). Confirm latency targets for different regions and whether some regions can have relaxed SLAs.
Sketch the data flow: viewers submit comments through WebSocket connections to gateway servers. Gateways forward comments to a Comment Ingestion Service that validates, rate-limits, assigns sequence numbers, and publishes to a Kafka topic partitioned by stream ID. A fleet of Fanout Workers consumes from Kafka and pushes comments to all gateway servers hosting viewers for that stream, using Redis Pub/Sub or direct server-to-server channels. Gateways then push to individual connected clients. Show a Redis cluster holding recent comment buffers per stream for fast reconnect. Include an async consumer that persists comments to Cassandra for historical access. Add a Reaction Service that processes emoji reactions with sharded counters.
Walk through the lifecycle of a single comment. A viewer submits a comment via WebSocket. The gateway validates the user session and forwards it to the Ingestion Service. The service checks rate limits (per user and per stream), assigns the next sequence number for the stream, and publishes to Kafka. A Fanout Worker consumes the event, looks up which gateway servers have active viewers for the stream (via a registry maintained by gateway heartbeats), and pushes the comment to each. Each gateway performs local fanout to its connected clients for the stream. For reconnection: a client connects with last-seen sequence N; the gateway reads the Redis replay buffer for the stream, retrieves all comments with sequence greater than N, and sends them as a catch-up batch before switching to live push. The client deduplicates by comment ID.
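One subtle race in this reconnect flow is a live push arriving while the catch-up batch is still being applied. A common client-side pattern queues live pushes until the batch is done, then replays them with ID-based dedup; a sketch under those assumptions, with `ReconnectingSession` as an illustrative name:

```python
class ReconnectingSession:
    """Queues live pushes during catch-up, then replays them with dedup."""

    def __init__(self):
        self.catching_up = True
        self.pending = []      # live pushes queued while catch-up runs
        self.delivered = []
        self.seen = set()

    def on_live_push(self, comment):
        if self.catching_up:
            self.pending.append(comment)
        else:
            self._deliver(comment)

    def apply_catch_up(self, batch):
        for comment in batch:
            self._deliver(comment)
        self.catching_up = False
        for comment in self.pending:   # replay queued pushes, dropping dups
            self._deliver(comment)
        self.pending = []

    def _deliver(self, comment):
        if comment["id"] in self.seen:
            return                     # already arrived via the other path
        self.seen.add(comment["id"])
        self.delivered.append(comment)


session = ReconnectingSession()
session.on_live_push({"id": "c4", "seq": 4})   # arrives mid catch-up
session.on_live_push({"id": "c5", "seq": 5})
session.apply_catch_up([{"id": "c3", "seq": 3}, {"id": "c4", "seq": 4}])
```

Note that comment c4 arrives on both paths (live push and catch-up batch) but is delivered exactly once, and the final timeline is gap-free and in order.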
Cover hot-stream detection: a monitoring service tracks connection counts and comment rates per stream and dynamically allocates dedicated Kafka partitions and consumer instances. Discuss reaction handling: lightweight emoji events processed by a separate Reaction Service with sharded Redis counters, periodically flushed to the database. Address content moderation: a filter in the Ingestion Service checks comments against blocked-word lists and ML classifiers before publishing. Touch on monitoring: track comment delivery latency (ingestion to client display), fanout lag per stream, gateway connection counts, Kafka consumer lag, and Redis buffer sizes. Discuss cost optimization: tiered storage (Redis for hot recent data, Kafka for medium-term retention, Cassandra for cold historical data), and auto-scaling gateway clusters based on active connections.
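The sharded-counter idea for reactions can be sketched in a few lines: each increment lands on a random shard key so no single key is hot, and a read sums the shards. A plain dict stands in for Redis here; a real version would use `INCR` on keys shaped like `live:42:heart:3` and flush totals periodically:

```python
import random


class ShardedCounter:
    """Spreads increments across N shard keys; reads sum the shards."""

    def __init__(self, shards=8):
        self.shards = shards
        self.counts = {}   # key like "live:42:heart:3" -> int

    def incr(self, stream_id, emoji):
        shard = random.randrange(self.shards)
        key = f"{stream_id}:{emoji}:{shard}"
        self.counts[key] = self.counts.get(key, 0) + 1

    def total(self, stream_id, emoji):
        # Trailing colon in the prefix keeps e.g. "heart" from
        # matching a hypothetical "heart2" emoji.
        prefix = f"{stream_id}:{emoji}:"
        return sum(v for k, v in self.counts.items() if k.startswith(prefix))


counter = ShardedCounter(shards=8)
for _ in range(1000):
    counter.incr("live:42", "heart")
counter.incr("live:42", "wow")
```

Writes for a viral reaction spread across 8 keys instead of hammering one, at the cost of an 8-key read to display the total, which is a good trade because reaction counts are typically read once per UI refresh interval, not per write.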
Deepen your understanding of the patterns used in this problem: