Design Netflix/Video Streaming Platform
Problem Statement
Design a video streaming platform like Netflix that supports video playback across multiple devices (mobile, web, smart TVs) with seamless resume functionality, subscription management, personalized recommendations, and content upload capabilities. Users expect to press play and see video within two seconds, switch between devices without losing their place, and browse a homepage tailored to their viewing history.
The system ingests raw video files from content partners, transcodes them into dozens of resolution and bitrate variants for adaptive streaming, and distributes segments through a global CDN. Behind the browsing experience sits a recommendation engine that ranks thousands of titles per user, updated frequently based on watch behavior. In MongoDB interviews specifically, interviewers have reportedly focused heavily on how large movie files are stored and processed rather than on the streaming delivery path alone, so be prepared to go deep on the upload, transcoding, and storage architecture in addition to the playback pipeline.
Key Requirements
Functional
- Video upload and processing -- content partners upload raw video files that are transcoded into multiple resolutions (480p through 4K) and packaged as HLS or DASH segments for adaptive bitrate streaming
- Playback with resume -- users stream video with adaptive quality switching and can pause on one device and resume from the exact timestamp on another
- Personalized homepage -- users see a curated set of recommended titles ranked by their viewing history, preferences, and trending signals
- Subscription gating -- playback access is controlled by subscription tier, with entitlement checks enforced at stream initiation
Non-Functional
- Scalability -- support hundreds of millions of subscribers with tens of millions of concurrent streams during peak evening hours
- Latency -- achieve video start times under 2 seconds and rebuffering rates below 1 percent through effective CDN caching and segment sizing
- Reliability -- maintain 99.9 percent streaming availability with redundant encoding pipelines and multi-region CDN failover
- Cost efficiency -- optimize storage through compression and tiered archival, and minimize bandwidth costs with cache-friendly content distribution
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Video Storage and Processing Pipeline
MongoDB interviewers have specifically emphasized how large movie files are stored, more so than the streaming delivery itself. You need to demonstrate a clear understanding of the full lifecycle from raw upload to playable segments.
Hints to consider:
- Accept uploads via resumable multipart upload to object storage (S3), with each upload tracked by a metadata record in a relational or document database
- Trigger an asynchronous transcoding pipeline through a message queue (Kafka or SQS) that fans out to GPU-accelerated encoding workers producing multiple renditions in parallel
- Store transcoded segments in object storage organized by title ID, quality level, and segment number for cache-friendly, content-addressable URLs
- Implement checkpointing within transcoding jobs so that a worker failure resumes from the last completed segment rather than restarting the entire file
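The checkpointing idea in the last hint can be sketched in a few lines. This is a minimal in-memory illustration with hypothetical names (`TranscodeJob`, `run`); a real pipeline would persist the checkpoint in a database or the job queue rather than in process memory.

```python
class TranscodeJob:
    """Transcodes a title into fixed-length segments, checkpointing
    after each segment so a restarted worker resumes mid-file."""

    def __init__(self, title_id: str, total_segments: int):
        self.title_id = title_id
        self.total_segments = total_segments
        self.completed = 0  # checkpoint: count of fully transcoded segments

    def run(self, fail_at=None) -> bool:
        """Process segments from the checkpoint onward. Returns True when
        the whole file is done; False simulates a worker crash."""
        for seg in range(self.completed, self.total_segments):
            if fail_at is not None and seg == fail_at:
                return False          # worker died; checkpoint survives
            # ... invoke the encoder for segment `seg` here ...
            self.completed = seg + 1  # persist checkpoint after each segment

        return True


job = TranscodeJob("tt0111161", total_segments=100)
job.run(fail_at=60)         # first worker crashes partway through
assert job.completed == 60  # resume point is segment 60, not zero
job.run()                   # replacement worker picks up at segment 60
assert job.completed == 100
```

The key property is that the retry cost after a failure is proportional to one segment, not to the whole multi-hour file.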
2. Content Delivery and Adaptive Bitrate Streaming
Serving petabytes of video content globally with minimal buffering requires a multi-tier distribution strategy. Interviewers expect you to go beyond "use a CDN" and explain cache hierarchies, segment sizing, and origin shielding.
Hints to consider:
- Design a three-tier distribution: edge CDN nodes serve hot content, regional mid-tier caches absorb misses and shield the origin, and origin servers pull from object storage on cold requests
- Use HLS or DASH with 2-6 second segment durations, balancing startup latency (shorter segments) against HTTP overhead and cache efficiency (longer segments)
- Pre-warm CDN caches for new high-profile releases by pushing segments to edge locations before the title becomes available
- Implement origin shielding so that thousands of edge nodes requesting the same new segment funnel through a single regional cache rather than all hitting the origin simultaneously
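The cache-friendly URL layout mentioned above can be made concrete with a deterministic key scheme. This is a sketch under assumed conventions (the `titles/` prefix and `.ts` extension are illustrative, not a standard): because every edge node derives the identical key for a given segment, cache hit rates stay high across the whole tier.

```python
def segment_key(title_id: str, quality: str, segment: int) -> str:
    """Deterministic object-storage key per (title, rendition, segment).
    Zero-padding keeps keys lexicographically ordered by segment number."""
    return f"titles/{title_id}/{quality}/seg{segment:05d}.ts"


# Every cache tier resolves the same request to the same key:
assert segment_key("tt0111161", "1080p", 42) == "titles/tt0111161/1080p/seg00042.ts"
```

Segment-level URLs like this are also what make pre-warming practical: pushing a new release to edge locations is just iterating the key space for each rendition.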
3. Watch Progress and Cross-Device Resume
Tracking playback position for every user across every title they watch creates a high-frequency write workload. Interviewers want to see how you handle this without overloading your database or creating hot partitions.
Hints to consider:
- Buffer playback position updates on the client and flush to the server every 10-30 seconds rather than on every frame, dramatically reducing write volume
- Store progress in a key-value or wide-column store (DynamoDB, Cassandra) keyed by (user_id, title_id) for fast point lookups with predictable latency
- Use conditional writes or upserts to ensure that concurrent updates from multiple devices converge on the most recent timestamp
- Cache the active viewing session's progress in Redis with a short TTL so device switches see the latest position without querying the persistent store
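The conditional-write convergence in the third hint can be sketched as last-write-wins keyed on the client event timestamp. A plain dict stands in for the key-value store here; in DynamoDB or Cassandra the same guard would be a conditional update or a timestamp-based write.

```python
# In-memory stand-in for the progress store, keyed by (user_id, title_id).
progress: dict = {}


def save_progress(user_id: str, title_id: str, position_s: int, event_ts: int) -> None:
    """Upsert only if this update is newer than the stored one, so
    concurrent flushes from multiple devices converge on the latest
    position even when they arrive out of order."""
    key = (user_id, title_id)
    current = progress.get(key)
    if current is None or event_ts > current["event_ts"]:
        progress[key] = {"position_s": position_s, "event_ts": event_ts}


save_progress("u1", "t1", position_s=120, event_ts=1000)  # phone flush
save_progress("u1", "t1", position_s=90, event_ts=900)    # stale TV flush arrives late
assert progress[("u1", "t1")]["position_s"] == 120        # newer write wins
```

Using the client-side event timestamp (rather than server arrival time) is what keeps a delayed flush from an idle device from overwriting fresher progress.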
4. Recommendation Engine Architecture
Serving personalized recommendations for millions of users requires separating expensive ML computation from the low-latency read path. Interviewers assess whether you understand offline versus online ranking tradeoffs.
Hints to consider:
- Run batch candidate generation offline (hourly or daily) using collaborative filtering or embedding-based models, storing per-user candidate lists in a fast-read store
- Apply a lightweight online re-ranking layer at request time that incorporates real-time signals (time of day, recently watched, trending titles) without running heavy ML inference
- Cache precomputed homepage rows in Redis or a similar store with TTLs, falling back gracefully to a generic trending list if the personalized cache misses
- Separate the recommendation pipeline from the serving path so a model training failure never impacts homepage availability
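The graceful-fallback pattern from the hints above fits in a few lines. This is a sketch with a dict standing in for Redis; the point is the control flow, not the storage: the serving path never calls into the ML pipeline, so a cache miss or a training failure degrades to a generic trending list instead of failing the request.

```python
# Precomputed per-user homepage rows (written offline by the batch
# pipeline); a dict stands in for Redis here.
homepage_cache = {"u1": ["Title A", "Title B", "Title C"]}

# Generic fallback row, refreshed independently of any per-user model.
TRENDING = ["Trending 1", "Trending 2", "Trending 3"]


def get_homepage(user_id: str) -> list:
    """Serve personalized rows on a cache hit; otherwise fall back to
    trending so homepage availability never depends on the ML pipeline."""
    return homepage_cache.get(user_id, TRENDING)


assert get_homepage("u1") == ["Title A", "Title B", "Title C"]
assert get_homepage("brand_new_user") == TRENDING
```

A lightweight online re-ranker would slot in between the cache read and the response, reordering the cached candidates with real-time signals without running model inference on the request path.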