Design a real-time commenting system for live video streams on a platform like Facebook or Instagram, where millions of viewers can post comments and reactions that appear on every viewer's screen within milliseconds. The system must handle Facebook-scale traffic (roughly 2 billion daily active users) and support features like reactions, historical comment catch-up for late joiners, and seamless transitions when users switch between live streams.
The core engineering challenges are: low-latency fanout to millions of concurrent viewers on a single popular stream; hot-key contention, where a single live video ID becomes the focal point for all writes and reads; unreliable mobile clients that disconnect, background their apps, and rejoin frequently; and maintaining a coherent comment timeline without gaps or duplicates. Strong answers connect product needs (smooth scrolling, no missed messages, graceful degradation) to architectural patterns (pub/sub, cursors, replay buffers, backpressure) and client-side strategies.
Based on real interview experiences, these are the areas interviewers probe most deeply:
A single popular live stream with 50 million viewers means every comment must be delivered to tens of millions of connections. Naive broadcasting from a single server is impossible; interviewers test whether you can design a scalable fanout tree.
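One way to make the fanout tree concrete is a two-level sketch: the origin sends each comment once per gateway, and each gateway fans out locally to its own connections, so origin-side cost is O(gateways) rather than O(viewers). This is a minimal illustration in plain Python; the `Gateway` and `FanoutWorker` names are made up for the example, and real gateways would push over WebSockets rather than append to lists.

```python
class Gateway:
    """Terminates client connections and performs local fanout."""

    def __init__(self, name):
        self.name = name
        self.clients = {}          # client_id -> list of delivered comments

    def connect(self, client_id):
        self.clients[client_id] = []

    def local_fanout(self, comment):
        # One network hop in, len(self.clients) local pushes out.
        for inbox in self.clients.values():
            inbox.append(comment)


class FanoutWorker:
    """Pushes each comment once per gateway hosting viewers of the stream."""

    def __init__(self):
        self.gateways = []

    def register(self, gateway):
        self.gateways.append(gateway)

    def broadcast(self, comment):
        # Origin cost is O(#gateways), not O(#viewers).
        for gw in self.gateways:
            gw.local_fanout(comment)
        return len(self.gateways)  # messages sent from the origin


worker = FanoutWorker()
for g in range(3):
    gw = Gateway(f"gw-{g}")
    for c in range(4):
        gw.connect(f"viewer-{g}-{c}")
    worker.register(gw)

sent = worker.broadcast({"id": "c1", "text": "goal!"})
```

With 3 gateways of 4 viewers each, the origin sends 3 messages while 12 clients receive the comment; at real scale the same shape holds with thousands of gateways and millions of viewers.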
Mobile viewers constantly lose connections, background their apps, and rejoin. Without a proper catch-up mechanism, they will see gaps in the comment stream or duplicates that break the scrolling experience.
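A common fix is a cursor plus a replay buffer: the client tracks the highest sequence number it has applied, requests everything newer on reconnect, and deduplicates by comment ID in case the catch-up batch overlaps with live pushes. A minimal sketch, where the `CommentClient` class and its method names are illustrative:

```python
class CommentClient:
    """Tracks a sequence cursor and dedupes by comment ID."""

    def __init__(self):
        self.last_seq = 0
        self.seen_ids = set()
        self.timeline = []

    def receive(self, comment):
        # A catch-up batch may overlap with live pushes; drop duplicates.
        if comment["id"] in self.seen_ids:
            return False
        self.seen_ids.add(comment["id"])
        self.timeline.append(comment)
        self.last_seq = max(self.last_seq, comment["seq"])
        return True

    def catch_up(self, replay_buffer):
        # On reconnect, apply only comments newer than the cursor, in order.
        for comment in sorted(replay_buffer, key=lambda c: c["seq"]):
            if comment["seq"] > self.last_seq:
                self.receive(comment)


client = CommentClient()
client.receive({"id": "c1", "seq": 1})
client.receive({"id": "c2", "seq": 2})
# Client disconnects; comments 3-5 are posted while it is away.
replay_buffer = [{"id": f"c{i}", "seq": i} for i in range(1, 6)]
client.catch_up(replay_buffer)
duplicate_accepted = client.receive({"id": "c5", "seq": 5})  # live push dup
```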
A single live video ID becomes a write hotspot that can overwhelm storage, caching, and fanout infrastructure. Interviewers want to see how you isolate hot streams and apply backpressure to protect the broader system.
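One standard isolation tool is a per-stream token bucket at the ingestion tier, so a hot stream throttles itself without starving others. A small sketch with an injectable clock for testing; `StreamLimiter` is a made-up name, and a production system would keep these buckets in shared storage such as Redis rather than process memory:

```python
import time


class StreamLimiter:
    """Per-stream token bucket: `rate` tokens/sec refill, capped at `burst`."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.buckets = {}  # stream_id -> (tokens, last_refill_time)

    def allow(self, stream_id, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(stream_id, (self.burst, now))
        # Refill proportionally to elapsed time, up to the burst cap.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[stream_id] = (tokens - 1, now)
            return True
        self.buckets[stream_id] = (tokens, now)
        return False


limiter = StreamLimiter(rate=2, burst=5)
burst_results = [limiter.allow("live:hot", now=0.0) for _ in range(6)]
refilled = limiter.allow("live:hot", now=1.0)      # 2 tokens refilled after 1s
unaffected = limiter.allow("live:quiet", now=0.0)  # other streams untouched
```

The key property for this problem is the last line: exhausting one stream's bucket never affects another stream, which is exactly the isolation interviewers are probing for.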
The system has conflicting storage needs: real-time append-only writes for incoming comments, sequential reads for viewer catch-up, and historical access for post-stream replay. No single storage system excels at all three.
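One way to reconcile these needs is a tiered read path: serve catch-up from a bounded hot buffer when it covers the requested range, and fall back to the cold store for deep history. A toy sketch with plain lists standing in for Redis (hot) and Cassandra (cold); the class and method names are illustrative:

```python
class TieredCommentStore:
    """Bounded hot buffer over an append-only cold store."""

    def __init__(self, hot_capacity):
        self.hot = []              # most recent comments, bounded
        self.cold = []             # full history, append-only
        self.hot_capacity = hot_capacity

    def append(self, comment):
        self.cold.append(comment)
        self.hot.append(comment)
        if len(self.hot) > self.hot_capacity:
            self.hot.pop(0)        # evict oldest from the hot tier

    def read_after(self, seq):
        # Serve from the hot tier only if it has no gap after `seq`;
        # otherwise fall back to the cold store for a deep catch-up.
        if self.hot and self.hot[0]["seq"] <= seq + 1:
            return [c for c in self.hot if c["seq"] > seq], "hot"
        return [c for c in self.cold if c["seq"] > seq], "cold"


store = TieredCommentStore(hot_capacity=3)
for i in range(1, 6):
    store.append({"seq": i})

recent, recent_tier = store.read_after(3)  # covered by the hot buffer
deep, deep_tier = store.read_after(0)      # predates the hot buffer
```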
Sports and entertainment audiences are globally distributed, but latency targets are strict. Simply replicating to all regions creates coordination overhead.
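A common way to avoid that coordination overhead is single-writer ordering: the stream's home region assigns sequence numbers, then relays each comment once per remote region, which only fans out locally and never participates in ordering decisions. A toy sketch under that assumption; `HomeRegion` and `EdgeRegion` are illustrative names:

```python
class HomeRegion:
    """Owns sequence assignment; sends one copy per remote region."""

    def __init__(self):
        self.next_seq = 0
        self.remotes = []

    def attach(self, remote):
        self.remotes.append(remote)

    def ingest(self, text):
        self.next_seq += 1
        comment = {"seq": self.next_seq, "text": text}
        for remote in self.remotes:   # one cross-region send per region
            remote.deliver(comment)
        return comment


class EdgeRegion:
    """Receives an already-ordered feed; local fanout only."""

    def __init__(self):
        self.feed = []

    def deliver(self, comment):
        self.feed.append(comment)     # local gateway fanout would start here


home = HomeRegion()
edges = [EdgeRegion() for _ in range(2)]
for edge in edges:
    home.attach(edge)
home.ingest("kickoff")
home.ingest("goal!")
```

Cross-region traffic scales with the number of regions rather than the number of viewers, and every region observes the same order without any inter-region consensus.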
Confirm the scale parameters. Ask about the distribution of stream popularity (a few mega-events vs. many small streams) to understand provisioning needs. Clarify whether comments must be archived permanently or just retained for the stream duration plus a window. Determine whether reactions are simple emoji counts or more complex (threaded replies). Confirm latency targets for different regions and whether some regions can have relaxed SLAs.
Sketch the data flow: viewers submit comments through WebSocket connections to gateway servers. Gateways forward comments to a Comment Ingestion Service that validates, rate-limits, assigns sequence numbers, and publishes to a Kafka topic partitioned by stream ID. A fleet of Fanout Workers consumes from Kafka and pushes comments to all gateway servers hosting viewers for that stream, using Redis Pub/Sub or direct server-to-server channels. Gateways then push to individual connected clients. Show a Redis cluster holding recent comment buffers per stream for fast reconnect. Include an async consumer that persists comments to Cassandra for historical access. Add a Reaction Service that processes emoji reactions with sharded counters.
Walk through the lifecycle of a single comment. A viewer submits a comment via WebSocket. The gateway validates the user session and forwards it to the Ingestion Service. The service checks rate limits (per user and per stream), assigns the next sequence number for the stream, and publishes to Kafka. A Fanout Worker consumes the event, looks up which gateway servers have active viewers for the stream (via a registry maintained by gateway heartbeats), and pushes the comment to each. Each gateway performs local fanout to its connected clients for the stream. For reconnection: a client connects with last-seen sequence N; the gateway reads the Redis replay buffer for the stream, retrieves all comments with sequence greater than N, and sends them as a catch-up batch before switching to live push. The client deduplicates by comment ID.
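One subtle race in this reconnect flow is a live push arriving while the catch-up batch is still being applied. A common client-side pattern queues live pushes until the batch is done, then replays them with ID-based dedup; a sketch under those assumptions, with `ReconnectingSession` as an illustrative name:

```python
class ReconnectingSession:
    """Queues live pushes during catch-up, then replays them with dedup."""

    def __init__(self):
        self.catching_up = True
        self.pending = []      # live pushes queued while catch-up runs
        self.delivered = []
        self.seen = set()

    def on_live_push(self, comment):
        if self.catching_up:
            self.pending.append(comment)
        else:
            self._deliver(comment)

    def apply_catch_up(self, batch):
        for comment in batch:
            self._deliver(comment)
        self.catching_up = False
        for comment in self.pending:   # replay queued pushes, dropping dups
            self._deliver(comment)
        self.pending = []

    def _deliver(self, comment):
        if comment["id"] in self.seen:
            return                     # already arrived via the other path
        self.seen.add(comment["id"])
        self.delivered.append(comment)


session = ReconnectingSession()
session.on_live_push({"id": "c4", "seq": 4})   # arrives mid catch-up
session.on_live_push({"id": "c5", "seq": 5})
session.apply_catch_up([{"id": "c3", "seq": 3}, {"id": "c4", "seq": 4}])
```

Note that comment c4 arrives on both paths (live push and catch-up batch) but is delivered exactly once, and the final timeline is gap-free and in order.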
Cover hot-stream detection: a monitoring service tracks connection counts and comment rates per stream and dynamically allocates dedicated Kafka partitions and consumer instances. Discuss reaction handling: lightweight emoji events processed by a separate Reaction Service with sharded Redis counters, periodically flushed to the database. Address content moderation: a filter in the Ingestion Service checks comments against blocked-word lists and ML classifiers before publishing. Touch on monitoring: track comment delivery latency (ingestion to client display), fanout lag per stream, gateway connection counts, Kafka consumer lag, and Redis buffer sizes. Discuss cost optimization: tiered storage (Redis for hot recent data, Kafka for medium-term retention, Cassandra for cold historical data), and auto-scaling gateway clusters based on active connections.
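The sharded-counter idea for reactions can be sketched in a few lines: each increment lands on a random shard key so no single key is hot, and a read sums the shards. A plain dict stands in for Redis here; a real version would use `INCR` on keys shaped like `live:42:heart:3` and flush totals periodically:

```python
import random


class ShardedCounter:
    """Spreads increments across N shard keys; reads sum the shards."""

    def __init__(self, shards=8):
        self.shards = shards
        self.counts = {}   # key like "live:42:heart:3" -> int

    def incr(self, stream_id, emoji):
        shard = random.randrange(self.shards)
        key = f"{stream_id}:{emoji}:{shard}"
        self.counts[key] = self.counts.get(key, 0) + 1

    def total(self, stream_id, emoji):
        # Trailing colon in the prefix keeps e.g. "heart" from
        # matching a hypothetical "heart2" emoji.
        prefix = f"{stream_id}:{emoji}:"
        return sum(v for k, v in self.counts.items() if k.startswith(prefix))


counter = ShardedCounter(shards=8)
for _ in range(1000):
    counter.incr("live:42", "heart")
counter.incr("live:42", "wow")
```

Writes for a viral reaction spread across 8 keys instead of hammering one, at the cost of an 8-key read to display the total, which is a good trade because reaction counts are typically read once per UI refresh interval, not per write.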
Deepen your understanding of the patterns used in this problem: