Design Netflix/Video Streaming Platform
Problem Statement
Design a platform that streams live sports events to millions of concurrent viewers worldwide. The system must handle major sporting events like championship games or international tournaments where viewership can spike from baseline traffic to tens of millions of concurrent streams within minutes. Users expect near real-time viewing experiences with minimal delay behind the live event, high video quality that adapts to their network conditions, and the ability to rewind or pause the live stream and catch up at their own pace. The platform must support diverse devices including mobile apps, web browsers, smart TVs, and set-top boxes, while providing synchronized overlays like live scores, commentary, and interactive polls. Unlike on-demand streaming, live sports cannot be fully pre-processed, requiring real-time encoding, packaging, and distribution under strict latency constraints.
The core challenge lies in balancing three competing forces: delivering broadcast-quality video with minimal buffering, keeping the stream as close to real-time as possible (ideally under 10 seconds of delay), and scaling to handle unpredictable traffic surges during pivotal game moments. You'll need to design for geographic distribution, handle encoder failures during live events, manage per-user state for pause and rewind functionality, and ensure a consistent viewing experience across vastly different network conditions and device capabilities.
Key Requirements
Functional
- Live video ingestion and encoding -- Accept live video feeds from broadcast partners or stadium cameras, encode them into multiple quality levels in real-time, and package them for adaptive bitrate streaming
- Low-latency playback -- Deliver live streams to viewers with end-to-end latency under 10 seconds while supporting adaptive bitrate switching based on network conditions
- Time-shift capabilities -- Allow users to pause, rewind up to 2 hours, and resume live playback, with seamless transitions between recorded and live content
- Real-time metadata overlay -- Synchronize live scores, player statistics, and interactive elements with the video stream across all viewers
Non-Functional
- Scalability -- Support 50 million concurrent viewers during peak events with the ability to scale up within minutes as viewership ramps
- Reliability -- Maintain 99.99% uptime during live events with automatic failover for encoder and origin failures, ensuring no viewer sees a black screen
- Latency -- Achieve glass-to-glass latency of 6-10 seconds for live streams and sub-200ms response times for control operations like pause and quality switching
- Consistency -- Ensure all viewers see synchronized live content within a 2-second window, with eventually consistent metadata overlays acceptable within 5 seconds
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Real-Time Video Pipeline and Latency Management
Live streaming presents fundamentally different challenges than on-demand video. Interviewers want to see how you handle the tension between low latency and reliability when you cannot pre-process content, and how you architect the pipeline from camera to viewer.
Hints to consider:
- Discuss the tradeoffs between traditional HLS (6-30 second latency), Low-Latency HLS, and newer protocols like WebRTC or SRT for sub-second delivery
- Consider how segment duration affects both latency and resilience to network issues -- smaller chunks reduce latency but increase request overhead and make caching less effective
- Explain how to handle live encoder failures without viewers noticing, including hot standby encoders and redundant ingest paths
- Address the challenge of Adaptive Bitrate (ABR) switching in live contexts where you have limited future segments to buffer
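To make the ABR hint concrete, here is a minimal sketch of a live-aware rendition picker. The ladder, bitrates, and thresholds are illustrative assumptions, not from any real player; the point is that near the live edge the client buffer is short, so the selection margin must be more conservative because there are few future segments to absorb a bad throughput estimate.

```python
from dataclasses import dataclass

@dataclass
class Rendition:
    name: str
    bitrate_kbps: int  # encoded bitrate of this quality level

# Hypothetical ABR ladder for the example.
LADDER = [
    Rendition("240p", 400),
    Rendition("480p", 1200),
    Rendition("720p", 3000),
    Rendition("1080p", 6000),
]

def pick_rendition(throughput_kbps: float, buffer_s: float) -> Rendition:
    """Choose the highest rendition whose bitrate fits under a safety
    margin of measured throughput. With a short buffer (near the live
    edge) use a larger margin to avoid stalls."""
    margin = 0.5 if buffer_s < 4 else 0.8
    budget = throughput_kbps * margin
    best = LADDER[0]  # lowest quality is always the fallback
    for r in LADDER:
        if r.bitrate_kbps <= budget:
            best = r
    return best
```

Note how the same 10 Mbps connection yields 1080p with a healthy buffer but only 720p at the live edge, which is exactly the latency-versus-quality tension the hint describes.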
2. Massive Scale and Traffic Spike Handling
Major sporting events create unprecedented traffic patterns with millions of users joining simultaneously when games start or dramatic moments occur. Interviewers probe whether you understand CDN architecture, origin shielding, and how to prevent cascading failures.
Hints to consider:
- Design a multi-tier caching strategy with edge CDNs, regional mid-tier caches, and origin servers to minimize origin load and reduce latency
- Explain origin shielding to prevent cache stampedes where thousands of CDN nodes simultaneously request the same new live segment
- Discuss pre-warming CDN caches before major events and managing cache eviction policies for live versus time-shifted content
- Consider how to handle geographic distribution and peering agreements to serve viewers close to their location without round-tripping to centralized origins
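The origin-shielding hint is often explained as "single-flight" request coalescing: when thousands of edge nodes ask the shield for the same brand-new live segment, only one request should reach the origin. Below is a minimal in-process sketch under that assumption; the class and method names are illustrative, and a production shield would also handle fetch failures and timeouts.

```python
import threading

class SingleFlight:
    """Collapse concurrent fetches of the same key (e.g. a new live
    segment URL) into one origin request; later callers wait for and
    share the first caller's result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done event, shared result holder)

    def fetch(self, key, origin_fetch):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
                leader = True  # this caller performs the origin fetch
            else:
                leader = False  # this caller waits for the leader
        event, holder = entry
        if leader:
            try:
                holder["value"] = origin_fetch(key)
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()
        else:
            event.wait()
        return holder["value"]
```

With 6-second segments, a shield in front of the origin turns N concurrent edge misses per segment into one origin fetch per segment, which is what keeps the origin alive when 50 million viewers' CDNs all roll over to the newest segment at once.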
3. Time-Shift State Management and DVR Functionality
Allowing users to pause, rewind, and resume live streams creates complex state management challenges. Each viewer has a unique playback position, and the system must efficiently store recent segments while managing storage costs and enabling fast random access.
Hints to consider:
- Design a sliding window buffer system that retains the last 2 hours of segments in hot storage with efficient eviction as new segments arrive
- Handle per-user playback position tracking with consideration for write amplification -- you cannot write to a database on every video segment request
- Address the transition from time-shifted playback back to live, including UI/UX for "skip to live" and catch-up speed modes
- Consider using client-side position tracking with periodic checkpoints to backend, falling back to last known position on device switches
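The sliding-window buffer from the first hint can be sketched as a small index over hot storage. The 2-hour window comes from the requirements; the segment duration, class, and field names are assumptions for illustration. Eviction is driven purely by new segments arriving, so the index never grows beyond the window.

```python
from collections import OrderedDict

class DvrWindow:
    """Retain roughly the last `window_s` seconds of live segments in
    hot storage, evicting the oldest as new segments arrive."""

    def __init__(self, window_s=2 * 3600, segment_s=4):
        self.max_segments = window_s // segment_s
        self.segments = OrderedDict()  # sequence number -> storage key

    def append(self, seq, storage_key):
        """Called by the packager as each new live segment lands."""
        self.segments[seq] = storage_key
        while len(self.segments) > self.max_segments:
            self.segments.popitem(last=False)  # evict the oldest segment

    def lookup(self, seq):
        """Random access for rewind: return the storage key for a
        segment, or None if it has slid out of the window."""
        return self.segments.get(seq)
```

Because lookups are by sequence number, a paused or rewinding viewer's position maps directly to a segment fetch, and a viewer who pauses longer than the window simply gets snapped forward to the oldest retained segment.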
4. Synchronized Metadata and Interactive Features
Live sports require tight synchronization between video content and metadata like scores, statistics, and interactive polls. Interviewers want to see how you handle the timing challenges when video has different latencies per viewer.
Hints to consider:
- Use a consistent timeline reference (like presentation timestamps) to tag both video segments and metadata events, allowing clients to synchronize locally
- Design a pub-sub system for pushing metadata updates to millions of viewers with acceptable eventual consistency
- Address the challenge that different viewers on different CDN edges may be watching segments from slightly different times due to caching
- Consider allowing metadata to be "ahead" of video for some viewers rather than attempting perfect synchronization, with graceful degradation when timing is off
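The first hint above, using a shared timeline reference, can be sketched client-side: metadata events arrive tagged with the same presentation timestamp (PTS) as the video, are buffered in a min-heap, and are released only when this viewer's playback reaches that point. This is a simplified assumption-level sketch (names are illustrative); it shows why each client can synchronize locally even though viewers sit at different latencies behind live.

```python
import heapq

class MetadataSync:
    """Buffer PTS-tagged metadata (scores, polls) and release each
    event once the local playback position reaches its timestamp."""

    def __init__(self):
        self._pending = []  # min-heap of (pts, event)

    def on_event(self, pts, event):
        """Called when the pub-sub feed pushes an update; events may
        arrive well before this viewer's video reaches that moment."""
        heapq.heappush(self._pending, (pts, event))

    def poll(self, playback_pts):
        """Return all events whose PTS has been reached, in timeline
        order, so overlays never run ahead of the video."""
        ready = []
        while self._pending and self._pending[0][0] <= playback_pts:
            ready.append(heapq.heappop(self._pending)[1])
        return ready
```

A viewer 8 seconds behind live simply sees the "GOAL" overlay 8 seconds after it was published, aligned with their video, rather than spoiling the moment the way a naively pushed update would.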