Design Status Search
Problem Statement
Design a system that aggregates and displays live updates from millions of concurrent events happening around the world. Think of a platform where sports matches, concerts, conferences, and breaking news events all stream real-time commentary, scores, and updates. Users should be able to browse active events, subscribe to specific ones, and see updates flow in with sub-second latency. The system must handle peak loads during major global events (like World Cup finals or breaking news) where tens of millions of users simultaneously follow the same event, while also serving thousands of smaller niche events with just a handful of followers each.
Your design needs to support event creators posting updates at varying rates (from one update per minute to hundreds per second during intense moments), handle massive fan-out when popular events broadcast to millions, maintain ordering guarantees so users see updates in the correct sequence, and scale out as the number of events grows while absorbing sudden load spikes when a single event goes viral. The challenge is balancing write throughput, read fan-out efficiency, storage costs for billions of historical updates, and the real-time delivery expectations of modern users.
Key Requirements
Functional
- Event creation and management -- users can create events, post updates to them, and close/archive events when finished
- Real-time update delivery -- followers receive new updates within 1 second of posting with correct chronological ordering
- Event discovery and subscription -- users can browse trending/active events and subscribe to receive live updates
- Historical playback -- users can view past updates from completed events in chronological order with pagination
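The historical-playback requirement above is typically served with cursor-based pagination keyed on a per-event sequence number, so pages stay stable even as new updates arrive. A minimal sketch, assuming an in-memory store; all names (`UpdateStore`, `page`, `after_seq`) are illustrative:

```python
# Cursor-based pagination over per-event updates. The cursor is simply the
# last sequence number the client has seen, so pages are stable and cheap.
from dataclasses import dataclass

@dataclass
class Update:
    event_id: str
    seq: int        # monotonically increasing per event
    body: str

class UpdateStore:
    def __init__(self) -> None:
        self._by_event: dict[str, list[Update]] = {}

    def append(self, u: Update) -> None:
        self._by_event.setdefault(u.event_id, []).append(u)

    def page(self, event_id: str, after_seq: int = 0, limit: int = 50):
        """Return up to `limit` updates with seq > after_seq, plus the next cursor."""
        items = sorted(
            (u for u in self._by_event.get(event_id, []) if u.seq > after_seq),
            key=lambda u: u.seq,
        )
        page = items[:limit]
        next_cursor = page[-1].seq if page else after_seq
        return page, next_cursor

store = UpdateStore()
for i in range(1, 6):
    store.append(Update("match-1", i, f"update {i}"))

page1, cursor = store.page("match-1", after_seq=0, limit=2)      # seq 1, 2
page2, cursor = store.page("match-1", after_seq=cursor, limit=2)  # seq 3, 4
```

In a real deployment the same cursor shape works against a partitioned database index on `(event_id, seq)`, unlike offset pagination, which degrades as offsets grow.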
Non-Functional
- Scalability -- support 100K active events simultaneously with 1M updates per second globally during peak hours
- Reliability -- no update loss; system should gracefully degrade under extreme load rather than fail completely
- Latency -- median update delivery under 500ms; p99 under 2 seconds even for events with millions of followers
- Consistency -- updates must appear in the same order for all subscribers; no duplicate delivery
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Write Path and Fan-Out Architecture
Interviewers want to see how you handle the fan-out problem when a single update must reach millions of subscribers. The naive approach of writing to millions of user timelines synchronously will collapse under load.
Hints to consider:
- Discuss the tradeoff between fan-out on write (pre-compute delivery lists) versus fan-out on read (compute on demand)
- Consider hybrid approaches where celebrity events use different delivery mechanisms than small events
- Think about batching, buffering, and backpressure mechanisms when delivery queues get overwhelmed
- Address how you detect and handle "hot" events that suddenly go viral
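The hybrid approach in the hints above can be sketched as a write-path branch on follower count: small events fan out on write into per-follower inboxes, while large events write once to a shared log that readers pull (usually through a cache or CDN). Threshold, store names, and functions here are illustrative assumptions, not a definitive implementation:

```python
# Hybrid fan-out sketch: push for small events, pull for large ones.
FANOUT_THRESHOLD = 10_000  # illustrative cutoff for "celebrity" events

event_log: dict[str, list[str]] = {}   # event_id -> all updates (write once)
inboxes: dict[str, list[str]] = {}     # user_id -> updates pushed at write time
followers: dict[str, list[str]] = {}   # event_id -> subscribed user_ids

def post_update(event_id: str, update: str) -> str:
    event_log.setdefault(event_id, []).append(update)
    subs = followers.get(event_id, [])
    if len(subs) <= FANOUT_THRESHOLD:
        # Fan-out on write: cheap when follower counts are small.
        for user in subs:
            inboxes.setdefault(user, []).append(update)
        return "pushed"
    # Fan-out on read: millions of followers fetch from the shared log instead.
    return "pull"

def read_timeline(user_id: str, subscribed: list[str]) -> list[str]:
    # Merge pushed inbox entries with pulled entries from large events.
    pulled = [u for ev in subscribed
              if len(followers.get(ev, [])) > FANOUT_THRESHOLD
              for u in event_log.get(ev, [])]
    return inboxes.get(user_id, []) + pulled
```

The key design point is that the expensive loop runs only when it is bounded; for viral events the cost moves to the read path, where it can be amortized across caches.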
2. Real-Time Delivery Mechanism
The interviewer will press on how you actually deliver updates to connected clients with sub-second latency while maintaining ordering guarantees and handling network failures.
Hints to consider:
- Compare WebSocket connections versus Server-Sent Events versus long polling for different client types
- Discuss how you maintain persistent connections at scale and which layer handles connection state
- Think about how to partition connections across servers and route updates to the correct connection
- Consider sequence numbers, acknowledgments, and client-side buffering for reliability
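The sequence-number and client-side-buffering hints above can be made concrete with a small receiver that delivers updates in order, holds out-of-order arrivals, drops duplicates, and reports gaps for re-fetch over the persistent connection. A minimal sketch; class and method names are assumptions for illustration:

```python
# Client-side ordering via per-event sequence numbers: deliver contiguous
# runs, buffer out-of-order arrivals, suppress duplicates, surface gaps.
class OrderedReceiver:
    def __init__(self) -> None:
        self.next_seq = 1                  # next sequence number to display
        self.buffer: dict[int, str] = {}   # out-of-order holds
        self.delivered: list[str] = []     # what the user actually sees

    def on_update(self, seq: int, body: str) -> None:
        if seq < self.next_seq or seq in self.buffer:
            return  # duplicate: drop, so each update is displayed at most once
        self.buffer[seq] = body
        # Flush any run that is now contiguous with what was delivered.
        while self.next_seq in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1

    def missing(self) -> list[int]:
        """Sequence numbers to re-request from the server over the connection."""
        if not self.buffer:
            return []
        return [s for s in range(self.next_seq, max(self.buffer))
                if s not in self.buffer]
```

The same logic works regardless of transport (WebSocket, SSE, or long polling); the transport only changes how `on_update` gets invoked and how re-requests are sent.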
3. Storage Strategy for Hot and Cold Data
With billions of updates accumulating daily, interviewers expect you to articulate a tiered storage approach that balances access patterns, retention requirements, and cost.
Hints to consider:
- Recent/active event data needs low-latency access while historical data can tolerate higher latency
- Discuss time-based partitioning and automated archival policies
- Consider different storage technologies for the write-heavy hot path versus bulk cold storage
- Think about how query patterns differ between live events (sequential reads) and historical browsing (random access)
4. Handling Skewed Load Distribution
Real-world event popularity follows a power law distribution. Interviewers want to see if you recognize that 1% of events will consume 99% of resources and design accordingly.
Hints to consider:
- Discuss detecting trending events early through metrics and adaptive rate limiting
- Consider read replicas, caching layers, and CDN strategies for viral content
- Think about dedicated infrastructure pools or priority queues for high-profile events
- Address fairness and quality-of-service guarantees for smaller events during global traffic spikes
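Detecting a trending event early, as the hints above suggest, can be done with an exponentially weighted moving average of per-event request rate; crossing a threshold promotes the event to a dedicated delivery pool before it overwhelms shared infrastructure. A sketch with illustrative smoothing and threshold values:

```python
# Hot-event detection via an exponentially weighted moving average (EWMA)
# of request rate per event. Alpha and threshold are illustrative.
ALPHA = 0.3            # weight on the newest sample
HOT_THRESHOLD = 1000.0  # smoothed requests-per-tick that marks an event "hot"

class HotDetector:
    def __init__(self) -> None:
        self.ewma: dict[str, float] = {}

    def record(self, event_id: str, requests_this_tick: float) -> bool:
        """Fold in one tick's request count; True means route to the hot pool."""
        prev = self.ewma.get(event_id, 0.0)
        cur = ALPHA * requests_this_tick + (1 - ALPHA) * prev
        self.ewma[event_id] = cur
        return cur >= HOT_THRESHOLD
```

The EWMA smooths momentary blips while still reacting within a few ticks to a genuine spike, and the per-event state is one float, so tracking 100K events is trivial. The same signal can drive demotion back to the shared pool once the rate decays, preserving capacity for smaller events.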