Design Netflix/Video Streaming Platform
Problem Statement
Design a video streaming platform like Netflix that supports video playback across multiple devices (mobile, web, smart TVs) with seamless resume functionality, subscription management, and personalized recommendations. The system must handle large video file uploads from content creators, transcode them into multiple quality levels for adaptive bitrate streaming, and deliver content globally with low latency through CDNs.
At Amazon, interviewers ask this to see if you can connect offline pipelines (upload and transcode), online low-latency serving (manifests, adaptive bitrate, CDNs), cross-device state (resume points), and recommendations into a coherent, scalable design. They test your ability to prioritize requirements and use industry patterns for high read volume, large blob delivery, metrics collection, and ML-driven ranking.
Key Requirements
Functional
- Video upload and processing -- content creators upload video files with metadata; the system transcodes them into multiple resolutions and formats (HLS/DASH segments) asynchronously
- Adaptive bitrate streaming -- viewers stream videos with quality that adapts to network conditions, with start, pause, and seek support
- Cross-device resume -- users can pause on one device and resume from the exact timestamp on another
- Personalized recommendations -- a homepage displays movies and shows tailored to the user's viewing history and preferences
Non-Functional
- Scalability -- support hundreds of millions of subscribers streaming simultaneously, with traffic spikes during popular releases
- Reliability -- maintain 99.9% uptime for playback; tolerate regional outages without viewer disruption
- Latency -- video playback starts within 2 seconds; homepage loads within 300ms; seek operations respond within 500ms
- Consistency -- eventual consistency acceptable for recommendations and watch history; strong consistency for subscription and payment state
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Video Processing Pipeline
Uploading and transcoding large video files into streamable formats is a multi-step, resource-intensive workflow. Interviewers want to see how you design a resilient, observable pipeline.
Hints to consider:
- Accept uploads directly to object storage (S3) via pre-signed URLs, avoiding application server bottlenecks
- Use a workflow orchestrator or saga pattern for the multi-step pipeline: validation, transcoding into multiple renditions (4K, 1080p, 720p, 480p), DRM wrapping, thumbnail generation, and metadata finalization
- Leverage Kafka to trigger and coordinate pipeline stages with retries and dead-letter queues for failures
- Store HLS/DASH manifest files alongside video segments in object storage, with CDN distribution on publish
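The multi-stage pipeline above can be sketched as a simple orchestrator with bounded retries and a dead-letter queue. This is a minimal illustration, not a production workflow engine; the stage names, retry count, and job fields are assumptions for the example.

```python
RENDITIONS = ["4k", "1080p", "720p", "480p"]
MAX_RETRIES = 3  # illustrative retry budget per stage

def run_stage(name, fn, job, dead_letter):
    """Run one pipeline stage with bounded retries; route persistent failures to a DLQ."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            fn(job)
            return True
        except Exception as exc:
            job.setdefault("errors", []).append(f"{name} attempt {attempt}: {exc}")
    dead_letter.append({"video_id": job["video_id"], "failed_stage": name})
    return False

def validate(job):
    # In practice: check codecs, duration, and that the upload landed in object storage.
    if not job.get("source_key"):
        raise ValueError("missing source object key")

def transcode(job):
    # Stand-in for the real transcoder: record one HLS playlist path per rendition.
    job["renditions"] = [f'{job["video_id"]}/{r}/playlist.m3u8' for r in RENDITIONS]

def finalize(job):
    # Stand-in for DRM wrapping, thumbnails, and metadata publish.
    job["status"] = "published"

def process(job, dead_letter):
    """Run stages in order; stop at the first stage that exhausts its retries."""
    for name, fn in [("validate", validate), ("transcode", transcode), ("finalize", finalize)]:
        if not run_stage(name, fn, job, dead_letter):
            return job
    return job
```

In a real deployment each stage transition would be a Kafka event (or a step in a workflow orchestrator such as AWS Step Functions), so a crashed worker can resume from the last completed stage rather than restarting the whole job.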
2. Content Delivery and Adaptive Bitrate Streaming
Serving video segments to millions of concurrent viewers with low startup latency and smooth playback is the core serving challenge. Interviewers probe your CDN architecture and ABR strategy.
Hints to consider:
- Design a multi-tier CDN architecture with edge PoPs, regional mid-tier caches, and origin shielding to minimize origin load
- Explain how HLS/DASH works: the client fetches a manifest listing available quality levels, then requests segments at the appropriate bitrate based on measured bandwidth
- Discuss segment size tradeoffs -- smaller segments reduce startup latency but increase HTTP overhead and reduce cache efficiency
- Address pre-warming CDN caches before major content releases and handling cache stampedes with request coalescing
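The client-side half of the ABR loop described above is small: pick the highest rendition whose bitrate fits within a safety margin of the measured throughput. The bitrate ladder and safety factor below are illustrative values, not prescribed ones.

```python
LADDER = [  # (label, bitrate in kbps) -- illustrative rendition ladder
    ("480p", 1_000),
    ("720p", 3_000),
    ("1080p", 6_000),
    ("4k", 16_000),
]
SAFETY = 0.8  # leave headroom so small throughput dips don't stall the buffer

def select_rendition(measured_kbps):
    """Return the highest-quality rendition that fits the bandwidth budget."""
    budget = measured_kbps * SAFETY
    chosen = LADDER[0]  # never drop below the lowest rung
    for label, kbps in LADDER:
        if kbps <= budget:
            chosen = (label, kbps)
    return chosen[0]
```

Real players also factor in buffer occupancy (buffer-based ABR) so a full buffer can ride out brief bandwidth dips without downshifting, but throughput-based selection is the core idea interviewers expect you to explain.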
3. Watch Progress and Cross-Device State
Users expect to seamlessly switch devices and resume exactly where they left off. This creates a high-write state management challenge during popular viewing hours.
Hints to consider:
- Buffer watch position updates on the client and flush to the server every 10-15 seconds to avoid per-second write amplification
- Use an upsert-friendly store (DynamoDB or Redis) keyed by (user_id, content_id) for fast writes and reads
- On device switch, the client fetches the latest position before starting playback
- Implement idempotent upserts to handle retries safely and prevent stale position overwrites using timestamps
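The timestamp-guarded upsert in the last hint can be sketched as below, with a dict standing in for DynamoDB or Redis; in DynamoDB this guard would be a conditional write on the stored timestamp. The record fields are assumptions for the example.

```python
store = {}  # stands in for DynamoDB/Redis: (user_id, content_id) -> record

def upsert_position(user_id, content_id, position_s, updated_at_ms):
    """Apply an update only if it is newer than the stored one (last-write-wins).

    Retries of the same flush and out-of-order flushes from a second device
    are ignored, so the stored position can never move backwards in time.
    """
    key = (user_id, content_id)
    current = store.get(key)
    if current and current["updated_at_ms"] >= updated_at_ms:
        return current["position_s"]  # stale or duplicate write: keep existing
    store[key] = {"position_s": position_s, "updated_at_ms": updated_at_ms}
    return position_s

def resume_position(user_id, content_id):
    """Fetch the latest position before starting playback on a new device."""
    record = store.get((user_id, content_id))
    return record["position_s"] if record else 0
```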
4. Recommendation System Architecture
Personalized homepages drive engagement but depend on ML model outputs that are too expensive to compute per request. Interviewers want to see how you serve recommendations without adding latency to the critical path.
Hints to consider:
- Pre-compute recommendation candidate sets offline using batch jobs (Spark) and store per-user rows in a fast cache (Redis) or database
- At serve time, apply lightweight re-ranking (recency boost, watched-content filtering) to the pre-computed candidates
- Cache entire homepage row configurations with short TTLs to serve most requests without computation
- Implement graceful degradation: if the recommendation service is slow, fall back to trending or editorial curated content
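The serve-time path above can be sketched as a single function: take the precomputed candidates, filter watched titles, apply a recency boost, and fall back to trending when the personalized set is unavailable. The boost factor and trending list are illustrative assumptions.

```python
TRENDING = ["t1", "t2", "t3"]  # editorial/trending fallback, illustrative ids

def serve_homepage(candidates, watched, recent_titles, limit=10):
    """Lightweight re-ranking over precomputed candidates, with graceful fallback.

    candidates: list of (content_id, base_score) produced by the offline batch job.
    """
    if not candidates:  # recommendation service slow or empty: degrade gracefully
        return TRENDING[:limit]
    ranked = []
    for content_id, score in candidates:
        if content_id in watched:
            continue  # don't recommend what the user already finished
        if content_id in recent_titles:
            score *= 1.2  # recency boost for newly released titles (assumed factor)
        ranked.append((score, content_id))
    ranked.sort(reverse=True)
    return [cid for _, cid in ranked[:limit]] or TRENDING[:limit]
```

Keeping this function to cheap, deterministic operations is what lets the homepage meet a 300ms budget: all expensive model inference happened offline, and the only runtime work is filtering and a sort over a small candidate set.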