Design YouTube
Problem Statement
Design an on-demand video streaming platform where creators upload videos and millions of viewers watch them across phones, tablets, laptops, and smart TVs. The system must handle the full lifecycle: a creator records a video, uploads a potentially multi-gigabyte file, the platform transcodes it into multiple resolutions and formats, and viewers can then stream it with smooth playback, instant seeking, and adaptive quality that adjusts to their network conditions.
The core engineering challenge splits into two distinct paths. The write path must reliably accept large uploads (including resumable uploads for spotty connections), orchestrate a multi-step transcoding pipeline that produces HLS or DASH segments at several bitrate levels, generate thumbnails, and update a metadata catalog -- all asynchronously. The read path must serve video segments to a global audience where a viral video can spike from zero to millions of concurrent viewers in minutes, requiring aggressive CDN caching and origin shielding to prevent backend collapse.
Beyond media delivery, the platform needs a discovery layer -- search, recommendations, and trending lists -- that helps viewers find content, plus a creator dashboard showing upload progress, processing status, and view analytics. Balancing storage costs, transcoding compute, CDN bandwidth, and metadata query performance at YouTube-like scale is what makes this problem rich for a system design interview.
Key Requirements
Functional
- Video upload -- creators can upload large files with resumable multipart transfer, track progress, and retry failed chunks without restarting from scratch
- Transcoding pipeline -- the platform asynchronously converts uploaded videos into multiple resolutions (1080p, 720p, 480p, 360p) and packaging formats (HLS, DASH) with thumbnail generation
- On-demand streaming -- viewers can play any published video with adaptive bitrate switching, instant seek to any timestamp, and smooth playback across devices
- Content discovery -- viewers can search by title, tags, and description, browse categories, and receive personalized recommendations based on watch history
- Creator management -- creators can set titles, descriptions, thumbnails, and visibility (public, unlisted, private) and monitor processing status and view counts
Non-Functional
- Scalability -- support hundreds of millions of daily active viewers with peak concurrent streams exceeding 10 million, and thousands of simultaneous uploads
- Latency -- video playback startup under 2 seconds, seek operations under 500 milliseconds, and search results under 200 milliseconds
- Reliability -- no uploaded video is lost even if a transcoding worker crashes mid-job; every upload eventually completes processing
- Availability -- the streaming path maintains 99.99 percent uptime; degraded mode serves cached content even during origin outages
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Upload and Transcoding Workflow
Large video files cannot be processed synchronously. Interviewers want to see how you design an asynchronous, fault-tolerant pipeline that takes a raw upload through validation, transcoding, packaging, and publishing without losing data or leaving videos stuck in a broken state.
Hints to consider:
- Use pre-signed URLs so clients upload directly to object storage (S3), bypassing application servers entirely and avoiding bandwidth bottlenecks
- Implement resumable uploads using multipart upload APIs where each chunk is independently verified, allowing retry of individual failed parts
- Model the transcoding workflow as a state machine with durable state (in a database or a workflow engine like Temporal) so that a crashed worker's job can be picked up by another
- Produce multiple output renditions in parallel across a pool of GPU-backed transcoding workers, with a final assembly step that generates the HLS/DASH manifest
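The state-machine hint above can be sketched as a small Python class. The state names, transition table, and `TranscodeJob` API are illustrative assumptions, not a prescribed design; in a real system the state write would go to a durable store so another worker can resume after a crash.

```python
from enum import Enum

class VideoState(Enum):
    UPLOADED = "uploaded"
    VALIDATING = "validating"
    TRANSCODING = "transcoding"
    PACKAGING = "packaging"
    PUBLISHED = "published"
    FAILED = "failed"

# Legal transitions; anything else is rejected so a video can never
# silently skip a processing step or get stuck in an undefined state.
TRANSITIONS = {
    VideoState.UPLOADED: {VideoState.VALIDATING},
    VideoState.VALIDATING: {VideoState.TRANSCODING, VideoState.FAILED},
    VideoState.TRANSCODING: {VideoState.PACKAGING, VideoState.FAILED},
    VideoState.PACKAGING: {VideoState.PUBLISHED, VideoState.FAILED},
}

class TranscodeJob:
    """One video's trip through the pipeline."""
    def __init__(self, video_id: str):
        self.video_id = video_id
        self.state = VideoState.UPLOADED

    def advance(self, new_state: VideoState) -> None:
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        # In production, persist this write (DB row or workflow engine)
        # before acknowledging, so a crashed worker's job is resumable.
```

Keeping the transition table explicit makes the "stuck video" failure mode visible: any state with no outgoing transition (like PUBLISHED) is terminal by construction.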
2. CDN Strategy and Hot-Key Mitigation
A single viral video can attract millions of simultaneous requests. Without careful CDN design, origin servers buckle under the load. Interviewers probe whether you understand caching layers, origin shielding, and thundering herd mitigation.
Hints to consider:
- Deploy an origin shield tier between your storage and edge CDN nodes so that even on a cache miss, only one request reaches the origin per segment
- Use consistent cache keys based on video ID, resolution, and segment number to maximize CDN hit rates across edge locations
- Implement request coalescing at the CDN layer so that thousands of simultaneous requests for the same new segment are collapsed into a single origin fetch
- Pre-warm CDN edges for videos trending on the recommendation or search surfaces before traffic spikes hit
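Request coalescing, the third hint above, can be illustrated with a minimal in-process sketch: concurrent misses for the same cache key block behind one "leader" that performs the single origin fetch. This is a toy model of what a CDN does internally, not production cache code (error handling for a failed origin fetch is omitted); `origin_fetch` and the key format are assumptions.

```python
import threading

class CoalescingCache:
    """Collapse concurrent misses for the same key into one origin fetch."""
    def __init__(self, origin_fetch):
        self.origin_fetch = origin_fetch   # e.g. a GET to the origin shield
        self.cache = {}
        self.inflight = {}                 # key -> Event set when the fetch lands
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key in self.cache:
                return self.cache[key]     # hit: no origin traffic at all
            event = self.inflight.get(key)
            if event is None:              # first miss: we become the leader
                event = threading.Event()
                self.inflight[key] = event
                leader = True
            else:                          # someone is already fetching this key
                leader = False
        if leader:
            value = self.origin_fetch(key) # exactly one origin hit per key
            with self.lock:
                self.cache[key] = value
                del self.inflight[key]
            event.set()                    # wake every waiting follower
            return value
        event.wait()                       # followers block until the leader fills the cache
        with self.lock:
            return self.cache[key]
```

With a cache key like `"{video_id}/{resolution}/{segment}"` (the consistent-key hint), a thousand viewers hitting a brand-new segment of a viral video cost the origin one fetch instead of a thousand.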
3. Data Model and Metadata Management
The metadata layer must support fast lookups by video ID for the player, full-text search for discovery, and creator-facing queries for dashboards. Interviewers want to see how you separate concerns across different storage engines.
Hints to consider:
- Store video metadata (title, description, status, URLs) in a horizontally scalable database like DynamoDB partitioned by video ID for single-key lookups
- Feed metadata changes into Elasticsearch via change data capture for full-text search, autocomplete, and faceted filtering
- Cache hot video metadata in Redis with TTLs to absorb read spikes and reduce database load during viral events
- Keep view counters separate from core metadata to avoid write contention; use sharded counters or a stream-processing pipeline for count aggregation
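The sharded-counter idea in the last hint can be shown in a few lines. The in-memory list stands in for N database rows or keys (one per shard); the shard count of 16 is an arbitrary illustrative choice.

```python
import random

class ShardedCounter:
    """Spread increments for a hot video across N shards so no single
    row absorbs every write; reads sum the shards."""
    def __init__(self, num_shards: int = 16):
        self.shards = [0] * num_shards  # in production: N rows/keys in the DB

    def increment(self, amount: int = 1) -> None:
        # Random shard choice distributes write contention evenly.
        self.shards[random.randrange(len(self.shards))] += amount

    def value(self) -> int:
        # Reads are N point-lookups plus a sum; cache the result if
        # exact real-time counts are not required.
        return sum(self.shards)
```

The trade-off is that reads now cost N lookups, which is why view counts are often served from a periodically aggregated value (the stream-processing variant of the same hint) rather than summed on every page load.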
4. Adaptive Bitrate Streaming and Playback
Smooth playback across varying network conditions and devices is central to user experience. Interviewers check whether you understand how ABR protocols work and how segment size affects latency and quality.
Hints to consider:
- Generate an HLS or DASH manifest that lists available renditions and segment URLs, letting the client player decide which quality level to request based on measured bandwidth
- Choose segment durations of 4-6 seconds as a balance between startup latency (shorter is better) and compression efficiency (longer is better)
- Support byte-range requests within segments for fast seeking without downloading entire segments
- Include I-frame-only playlists for trick play (fast forward, rewind) so the player can show keyframes while scrubbing
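The client-side half of the first hint (the player picking a rendition from the manifest) reduces to a small selection function. The rendition ladder and the 0.8 safety factor below are illustrative assumptions; real players also smooth bandwidth estimates and account for buffer occupancy.

```python
# Rendition ladder as (name, bitrate in bits/sec), sorted high to low.
# These bitrates are illustrative, not a recommended encoding ladder.
RENDITIONS = [
    ("1080p", 5_000_000),
    ("720p", 2_800_000),
    ("480p", 1_400_000),
    ("360p", 800_000),
]

def pick_rendition(measured_bandwidth_bps: float, safety_factor: float = 0.8) -> str:
    """Choose the highest rendition whose bitrate fits within a safety
    margin of measured bandwidth; fall back to the lowest rendition."""
    budget = measured_bandwidth_bps * safety_factor
    for name, bitrate in RENDITIONS:
        if bitrate <= budget:
            return name
    return RENDITIONS[-1][0]   # below every tier: serve lowest quality
```

Running this per segment download is what produces the characteristic ABR behavior: quality steps down quickly when throughput drops and steps back up as the connection recovers, with the safety factor guarding against rebuffering on a noisy estimate.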