You are designing the real-time commenting system that powers live video broadcasts on platforms like Facebook Live and Instagram Live. When a creator starts streaming, viewers across the globe can post text comments that appear on every viewer's screen in near real-time. The system must handle wildly varying traffic patterns, from a casual stream with a handful of viewers to a celebrity broadcast attracting tens of millions of simultaneous participants.
A critical challenge is the mobile-first nature of live video audiences. Viewers frequently lose connectivity, switch between Wi-Fi and cellular, or background the app entirely. When they return, they expect to catch up on comments they missed without seeing duplicates or gaps. The system must also support moderation capabilities, allowing creators and platform moderators to filter inappropriate content and ban disruptive users.
At peak load, the platform serves over 10 million concurrent live streams worldwide, with the most popular individual streams reaching 5 million or more simultaneous viewers. Each comment must propagate from submission to display on remote screens within a few hundred milliseconds.
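To see why naive per-viewer delivery breaks at this scale, a quick back-of-envelope calculation helps. The 5 million viewer figure comes from the requirements above; the 1,000 comments per second rate is an assumed number for a busy celebrity broadcast, not something the prompt specifies.

```python
# Back-of-envelope: cost of pushing every comment to every viewer
# of one hot stream. 5M viewers is from the requirements; the
# 1,000 comments/sec rate is an assumption for illustration.
viewers = 5_000_000
comments_per_sec = 1_000  # assumed rate

naive_pushes_per_sec = viewers * comments_per_sec
print(f"{naive_pushes_per_sec:,} pushes/sec")  # 5,000,000,000 pushes/sec
```

Five billion individual pushes per second for a single stream is the number that motivates the fanout discussion below.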
Based on real interview experiences, these are the areas interviewers probe most deeply:
A single stream with millions of viewers creates an enormous fanout problem. Pushing each comment individually to every connected client is not feasible at this scale.
Hints to consider:
- Can you publish each comment once per connection server rather than once per viewer, letting each server fan out locally to its own clients?
- For streams where no viewer can read every comment anyway, is delivering a sampled subset acceptable?
Maintaining millions of concurrent WebSocket connections is an infrastructure challenge distinct from stateless HTTP serving.
Hints to consider:
- How does the system track which server holds each viewer's connection, and how do those servers learn about new comments for the streams they serve?
- What happens to millions of open connections when a connection server fails or is drained for a deployment?
Viewers connected to different servers may receive the same comments in slightly different orders, which can produce a jarring user experience.
Hints to consider:
- Can a single authoritative sequence per stream (for example, one Kafka partition keyed by stream ID) give every server the same order?
- Do clients need strict global ordering, or is a per-stream sequence number enough for a consistent display?
Mobile networks are inherently unreliable, and viewers routinely disconnect and rejoin streams.
Hints to consider:
- If clients track the last sequence number they saw, how does a server replay exactly what they missed on reconnect?
- How long should recent comments be buffered, and where, to make catch-up cheap?
Confirm the expected scale: number of concurrent streams, peak viewers per stream, and average comment rate per second. Ask whether intelligent sampling of comments is acceptable for ultra-popular streams where viewers cannot read every comment anyway. Clarify how long comments need to be retained -- only for the lifetime of the stream plus a short buffer, or indefinitely for replay. Confirm whether rich media (emoji reactions, stickers) is in scope or if the system handles text only.
Viewers connect via WebSockets to a fleet of connection servers behind a load balancer. When a viewer submits a comment, the request hits an API gateway that validates the user and passes the comment through a moderation filter. Approved comments are written to a Kafka topic partitioned by stream ID. A fanout service consumes from Kafka and publishes each comment to a Redis Pub/Sub channel for the corresponding stream. Each connection server subscribes to channels for the streams its connected viewers are watching, then pushes incoming comments to those clients over their WebSocket connections. A Redis sorted set per stream holds the last several minutes of comments, enabling reconnection catch-up.
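The per-stream ordering in this pipeline comes from partitioning the Kafka topic by stream ID, so all of a stream's comments land on one ordered partition. A minimal sketch of that routing follows; the hash choice and partition count are assumptions for illustration.

```python
import hashlib

NUM_PARTITIONS = 64  # assumed topic partition count

def partition_for(stream_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Route every comment for one stream to the same partition,
    giving that stream's comments a single authoritative order."""
    # Use a deterministic digest rather than Python's built-in
    # hash(), which is salted per process.
    digest = hashlib.sha256(stream_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Kafka's default partitioner does the equivalent when messages are keyed by stream ID; the point is that ordering holds within a stream, not across streams.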
For hot streams, a single Kafka partition and Redis Pub/Sub channel can become bottlenecks. Introduce tiered fanout: the fanout service publishes to a per-stream channel, and each connection server holding viewers for that stream subscribes once. This means the fanout cost scales with the number of connection servers, not the number of viewers. For streams exceeding a configurable threshold (e.g., 100K concurrent viewers), activate comment sampling where only a representative subset of comments is delivered to each viewer, rotating which comments are shown to maintain diversity. Connection servers batch outgoing WebSocket messages into small windows (e.g., 50ms) to reduce per-message overhead and smooth delivery.
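The tiered-fanout idea can be modeled in a few lines: each connection server subscribes to a stream's channel once, so a publish touches each server once regardless of how many of its viewers are watching. This is an illustrative in-memory sketch, not the actual Redis Pub/Sub wiring; the class and method names are invented for the example.

```python
from collections import defaultdict

class TieredFanout:
    """Toy model of tiered fanout: publish cost is O(subscribed
    connection servers), not O(viewers)."""

    def __init__(self):
        # stream_id -> {server_id: set of viewer_ids on that server}
        self.viewers = defaultdict(lambda: defaultdict(set))

    def watch(self, stream_id, server_id, viewer_id):
        # Subscribing is idempotent: the first viewer on a server
        # creates that server's single subscription to the stream.
        self.viewers[stream_id][server_id].add(viewer_id)

    def publish(self, stream_id, comment):
        # One channel message per subscribed server; each server
        # then fans out locally over its own WebSocket connections.
        messages = 0
        for server_id in self.viewers[stream_id]:
            messages += 1  # the pub/sub delivery to that server
        return messages

fanout = TieredFanout()
for i in range(100_000):
    fanout.watch("hot-stream", f"server-{i % 3}", f"viewer-{i}")
print(fanout.publish("hot-stream", "hi"))  # 3 -- one per server, not 100K
```

With 100,000 viewers spread across three connection servers, a publish costs three channel deliveries; the expensive last hop is distributed across the servers that already hold the sockets.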
For moderation, run an inline filter that checks each comment against a banned-word trie and a per-stream ban list stored in Redis, rejecting matches before they enter the fanout path. For reconnection, clients include their last-seen sequence number when re-establishing a WebSocket; the connection server queries the Redis sorted set to fetch missed comments. For long-term durability, persist comments asynchronously to Cassandra for post-stream replay and analytics. Monitor per-stream fanout latency and automatically enable sampling when delivery times exceed thresholds. Use consistent hashing to co-locate viewers of the same stream on the same connection servers, reducing the number of servers that must subscribe to each channel.
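Reconnection catch-up hinges on the per-stream buffer of recent comments keyed by sequence number. Below is a sketch of that logic using an in-memory sorted list as a stand-in for the Redis sorted set; the class and method names are illustrative, not a real API.

```python
import bisect

class RecentComments:
    """Stand-in for the per-stream Redis sorted set: comments scored
    by sequence number, trimmed to a bounded retention window."""

    def __init__(self, max_entries: int = 10_000):
        self.seqs = []    # sorted sequence numbers
        self.bodies = []  # comment bodies, aligned with seqs
        self.max_entries = max_entries

    def add(self, seq: int, body: str) -> None:
        i = bisect.bisect_left(self.seqs, seq)
        self.seqs.insert(i, seq)
        self.bodies.insert(i, body)
        if len(self.seqs) > self.max_entries:
            # Drop the oldest entry, like trimming the sorted set.
            self.seqs.pop(0)
            self.bodies.pop(0)

    def catch_up(self, last_seen_seq: int):
        """Comments strictly after the client's last-seen sequence,
        in order -- what a reconnecting client requests."""
        i = bisect.bisect_right(self.seqs, last_seen_seq)
        return list(zip(self.seqs[i:], self.bodies[i:]))

buf = RecentComments()
for seq, body in [(1, "hi"), (2, "wow"), (3, "lol"), (4, "gg")]:
    buf.add(seq, body)
print(buf.catch_up(2))  # [(3, 'lol'), (4, 'gg')]
```

In Redis the equivalent query is ZRANGEBYSCORE with an exclusive lower bound at the last-seen sequence, which keeps catch-up both gap-free and duplicate-free.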
Deepen your understanding of the patterns used in this problem: