Design Netflix/Video Streaming Platform
Problem Statement
Design a video streaming platform like Netflix that supports video playback across multiple devices (mobile, web, smart TVs) with seamless resume functionality, subscription management, and personalized recommendations. The system must handle large video file uploads from content creators, transcode them into multiple quality levels for adaptive bitrate streaming, and deliver content globally with low latency through CDNs.
At Amazon, interviewers ask this to see if you can connect offline pipelines (upload and transcode), online low-latency serving (manifests, adaptive bitrate, CDNs), cross-device state (resume points), and recommendations into a coherent, scalable design. They test your ability to prioritize requirements and use industry patterns for high read volume, large blob delivery, metrics collection, and ML-driven ranking.
Key Requirements
Functional
- Video upload and processing -- content creators upload video files with metadata; the system transcodes them into multiple resolutions and formats (HLS/DASH segments) asynchronously
- Adaptive bitrate streaming -- viewers stream videos with quality that adapts to network conditions, with start, pause, and seek support
- Cross-device resume -- users can pause on one device and resume from the exact timestamp on another
- Personalized recommendations -- a homepage displays movies and shows tailored to the user's viewing history and preferences
Non-Functional
- Scalability -- support hundreds of millions of subscribers streaming simultaneously, with traffic spikes during popular releases
- Reliability -- maintain 99.9% uptime for playback; tolerate regional outages without viewer disruption
- Latency -- video playback starts within 2 seconds; homepage loads within 300ms; seek operations respond within 500ms
- Consistency -- eventual consistency acceptable for recommendations and watch history; strong consistency for subscription and payment state
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Video Processing Pipeline
Uploading and transcoding large video files into streamable formats is a multi-step, resource-intensive workflow. Interviewers want to see how you design a resilient, observable pipeline.
Hints to consider:
- Accept uploads directly to object storage (S3) via pre-signed URLs, avoiding application server bottlenecks
- Use a workflow orchestrator or saga pattern for the multi-step pipeline: validation, transcoding into multiple renditions (4K, 1080p, 720p, 480p), DRM wrapping, thumbnail generation, and metadata finalization
- Leverage Kafka to trigger and coordinate pipeline stages with retries and dead-letter queues for failures
- Store HLS/DASH manifest files alongside video segments in object storage, with CDN distribution on publish
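The multi-stage pipeline above can be sketched as a simple orchestrator with bounded retries and a dead-letter queue. This is a minimal illustration, not a production workflow engine; the stage names, retry count, and job fields are assumptions for the example.

```python
RENDITIONS = ["4k", "1080p", "720p", "480p"]
MAX_RETRIES = 3  # illustrative retry budget per stage

def run_stage(name, fn, job, dead_letter):
    """Run one pipeline stage with bounded retries; route persistent failures to a DLQ."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            fn(job)
            return True
        except Exception as exc:
            job.setdefault("errors", []).append(f"{name} attempt {attempt}: {exc}")
    dead_letter.append({"video_id": job["video_id"], "failed_stage": name})
    return False

def validate(job):
    # In practice: check codecs, duration, and that the upload landed in object storage.
    if not job.get("source_key"):
        raise ValueError("missing source object key")

def transcode(job):
    # Stand-in for the real transcoder: record one HLS playlist path per rendition.
    job["renditions"] = [f'{job["video_id"]}/{r}/playlist.m3u8' for r in RENDITIONS]

def finalize(job):
    # Stand-in for DRM wrapping, thumbnails, and metadata publish.
    job["status"] = "published"

def process(job, dead_letter):
    """Run stages in order; stop at the first stage that exhausts its retries."""
    for name, fn in [("validate", validate), ("transcode", transcode), ("finalize", finalize)]:
        if not run_stage(name, fn, job, dead_letter):
            return job
    return job
```

In a real deployment each stage transition would be a Kafka event (or a step in a workflow orchestrator such as AWS Step Functions), so a crashed worker can resume from the last completed stage rather than restarting the whole job.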
2. Content Delivery and Adaptive Bitrate Streaming
Serving video segments to millions of concurrent viewers with low startup latency and smooth playback is the core serving challenge. Interviewers probe your CDN architecture and ABR strategy.
Hints to consider:
- Design a multi-tier CDN architecture with edge PoPs, regional mid-tier caches, and origin shielding to minimize origin load
- Explain how HLS/DASH works: the client fetches a manifest listing available quality levels, then requests segments at the appropriate bitrate based on measured bandwidth
- Discuss segment size tradeoffs -- smaller segments reduce startup latency but increase HTTP overhead and reduce cache efficiency
- Address pre-warming CDN caches before major content releases and handling cache stampedes with request coalescing
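The client-side half of the ABR loop described above is small: pick the highest rendition whose bitrate fits within a safety margin of the measured throughput. The bitrate ladder and safety factor below are illustrative values, not prescribed ones.

```python
LADDER = [  # (label, bitrate in kbps) -- illustrative rendition ladder
    ("480p", 1_000),
    ("720p", 3_000),
    ("1080p", 6_000),
    ("4k", 16_000),
]
SAFETY = 0.8  # leave headroom so small throughput dips don't stall the buffer

def select_rendition(measured_kbps):
    """Return the highest-quality rendition that fits the bandwidth budget."""
    budget = measured_kbps * SAFETY
    chosen = LADDER[0]  # never drop below the lowest rung
    for label, kbps in LADDER:
        if kbps <= budget:
            chosen = (label, kbps)
    return chosen[0]
```

Real players also factor in buffer occupancy (buffer-based ABR) so a full buffer can ride out brief bandwidth dips without downshifting, but throughput-based selection is the core idea interviewers expect you to explain.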
3. Watch Progress and Cross-Device State
Users expect to seamlessly switch devices and resume exactly where they left off. This creates a high-write state management challenge during popular viewing hours.
Hints to consider:
- Buffer watch position updates on the client and flush to the server every 10-15 seconds to avoid per-second write amplification
- Use an upsert-friendly store (DynamoDB or Redis) keyed by (user_id, content_id) for fast writes and reads
- On device switch, the client fetches the latest position before starting playback
- Implement idempotent upserts to handle retries safely and prevent stale position overwrites using timestamps
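The timestamp-guarded upsert in the last hint can be sketched as below, with a dict standing in for DynamoDB or Redis; in DynamoDB this guard would be a conditional write on the stored timestamp. The record fields are assumptions for the example.

```python
store = {}  # stands in for DynamoDB/Redis: (user_id, content_id) -> record

def upsert_position(user_id, content_id, position_s, updated_at_ms):
    """Apply an update only if it is newer than the stored one (last-write-wins).

    Retries of the same flush and out-of-order flushes from a second device
    are ignored, so the stored position can never move backwards in time.
    """
    key = (user_id, content_id)
    current = store.get(key)
    if current and current["updated_at_ms"] >= updated_at_ms:
        return current["position_s"]  # stale or duplicate write: keep existing
    store[key] = {"position_s": position_s, "updated_at_ms": updated_at_ms}
    return position_s

def resume_position(user_id, content_id):
    """Fetch the latest position before starting playback on a new device."""
    record = store.get((user_id, content_id))
    return record["position_s"] if record else 0
```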
4. Recommendation System Architecture
Personalized homepages drive engagement but depend on ML model outputs that are too expensive to compute per request. Interviewers want to see how you serve recommendations without adding latency to the critical path.
Hints to consider:
- Pre-compute recommendation candidate sets offline using batch jobs (Spark) and store per-user rows in a fast cache (Redis) or database
- At serve time, apply lightweight re-ranking (recency boost, watched-content filtering) to the pre-computed candidates
- Cache entire homepage row configurations with short TTLs to serve most requests without computation
- Implement graceful degradation: if the recommendation service is slow, fall back to trending or editorial curated content
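The serve-time path above can be sketched as a single function: take the precomputed candidates, filter watched titles, apply a recency boost, and fall back to trending when the personalized set is unavailable. The boost factor and trending list are illustrative assumptions.

```python
TRENDING = ["t1", "t2", "t3"]  # editorial/trending fallback, illustrative ids

def serve_homepage(candidates, watched, recent_titles, limit=10):
    """Lightweight re-ranking over precomputed candidates, with graceful fallback.

    candidates: list of (content_id, base_score) produced by the offline batch job.
    """
    if not candidates:  # recommendation service slow or empty: degrade gracefully
        return TRENDING[:limit]
    ranked = []
    for content_id, score in candidates:
        if content_id in watched:
            continue  # don't recommend what the user already finished
        if content_id in recent_titles:
            score *= 1.2  # recency boost for newly released titles (assumed factor)
        ranked.append((score, content_id))
    ranked.sort(reverse=True)
    return [cid for _, cid in ranked[:limit]] or TRENDING[:limit]
```

Keeping this function to cheap, deterministic operations is what lets the homepage meet a 300ms budget: all expensive model inference happened offline, and the only runtime work is filtering and a sort over a small candidate set.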