There is no exact matching guide in the learning catalog for this question, but related material covers overlapping patterns.
Review the Content Delivery Networks (CDN), Blob Storage, and Caching building blocks for background on multi-tier content distribution, large binary object management, and low-latency serving strategies central to a video streaming platform.
Design a video streaming platform like Netflix that supports video playback across multiple devices (mobile, web, smart TVs) with seamless resume functionality, subscription management, personalized recommendations, and content upload capabilities. Users expect to press play and see video within two seconds, switch between devices without losing their place, and browse a homepage tailored to their viewing history.
The system ingests raw video files from content partners, transcodes them into dozens of resolution and bitrate variants for adaptive streaming, and distributes segments through a global CDN. Behind the browsing experience sits a recommendation engine that ranks thousands of titles per user, updated frequently based on watch behavior. In HubSpot interviews, this has been asked as an open-ended problem where the full lifecycle from upload through delivery must be addressed within a 40-minute session, so be prepared to prioritize ruthlessly and go deep on the areas the interviewer cares about most.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers want to see how large movie files are stored and processed through the transcoding pipeline. You need to demonstrate a clear understanding of the full lifecycle from raw upload to playable segments, not just handwave at "use S3."
Serving petabytes of video content globally with minimal buffering requires a multi-tier distribution strategy. Interviewers expect you to go beyond "use a CDN" and explain cache hierarchies, segment sizing, and origin shielding.
Tracking playback position for every user across every title they watch creates a high-frequency write workload. Interviewers want to see how you handle this without overloading your database or creating hot partitions.
Serving personalized recommendations for millions of users requires separating expensive ML computation from the low-latency read path. Interviewers assess whether you understand offline versus online ranking tradeoffs.
Confirm the scope with the interviewer. Ask whether live streaming is included or only video-on-demand. Clarify the expected catalog size (hundreds of thousands of titles versus millions), the number of concurrent streams during peak hours, and geographic distribution. Determine whether DRM and content protection are in scope. Ask about subscription tier complexity and whether ad-supported tiers with different playback rules are relevant. Establish latency targets for video start time and acceptable rebuffering rate.
Sketch the main components:

- a content ingestion service accepting uploads via presigned URLs to S3
- an asynchronous transcoding pipeline orchestrated through Kafka with GPU worker pools
- a metadata service (PostgreSQL or MongoDB) tracking titles, renditions, and processing status
- object storage for segments
- a multi-tier CDN for global delivery
- a playback service that generates signed manifests and validates entitlements
- a watch progress service backed by DynamoDB or Cassandra
- a recommendation service with offline batch computation and online re-ranking
- client applications implementing adaptive bitrate players

Trace the upload-to-playback flow: partner uploads raw file, transcoding pipeline produces segments, CDN warms popular content, user requests manifest, player fetches segments from nearest edge.
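The adaptive bitrate logic on the client can be sketched in a few lines: the player measures throughput over recent segment downloads and picks the highest rendition that fits within a safety margin. The ladder below is illustrative, not any platform's actual encoding ladder.

```python
# Illustrative rendition ladder: (label, bitrate in kbps).
RENDITIONS = [
    ("240p", 400),
    ("480p", 1000),
    ("720p", 3000),
    ("1080p", 6000),
    ("4k", 16000),
]

def pick_rendition(measured_kbps: float, safety: float = 0.8) -> str:
    """Pick the highest rendition whose bitrate fits within a safety
    margin of the measured bandwidth; fall back to the lowest."""
    budget = measured_kbps * safety
    best = RENDITIONS[0][0]  # lowest rung is the floor
    for label, kbps in RENDITIONS:
        if kbps <= budget:
            best = label
    return best
```

The safety margin leaves headroom so a transient bandwidth dip drains the buffer instead of immediately causing a rebuffer; real players also smooth measurements and add buffer-occupancy rules.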
Walk through the video processing lifecycle in detail. A content partner initiates a multipart upload to S3 and registers the title metadata in the catalog service. On upload completion, an event triggers the transcoding orchestrator, which splits the source into chunks, dispatches each chunk to GPU-accelerated workers across multiple quality levels in parallel, and reassembles the output into HLS segment files and manifest playlists. Each worker reports progress to the orchestrator, which checkpoints completed segments to enable retry from the last successful point. Once all renditions are complete, the orchestrator updates the catalog to mark the title as playable and optionally triggers CDN pre-warming for high-profile releases. Discuss storage layout: segments organized by /titles/{id}/renditions/{quality}/segments/{number}.ts for cache-friendly URLs and efficient CDN purging.
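The orchestrator's chunk-level checkpointing described above can be sketched as follows. This is a minimal in-memory model with hypothetical names; a real orchestrator would persist checkpoints to the metadata store so a restart can resume from the last successful chunk.

```python
from dataclasses import dataclass, field

@dataclass
class TranscodeJob:
    """Tracks which chunks of one rendition have finished transcoding."""
    title_id: str
    total_chunks: int
    done: set = field(default_factory=set)  # checkpointed chunk numbers

    def mark_done(self, chunk: int) -> None:
        # In production this checkpoint would be written durably
        # (e.g. to the metadata service) before acknowledging the worker.
        self.done.add(chunk)

    def pending(self) -> list:
        """Chunks still to (re)dispatch after a worker crash or restart."""
        return [c for c in range(self.total_chunks) if c not in self.done]

    def complete(self) -> bool:
        return len(self.done) == self.total_chunks

# Workers finished chunks 0, 1, and 3 before a crash; only 2 and 4
# need to be re-dispatched rather than retranscoding the whole title.
job = TranscodeJob("tt123", total_chunks=5)
for c in (0, 1, 3):
    job.mark_done(c)
```

Checkpointing at chunk granularity bounds retry cost to a single chunk rather than the full source file, which matters when a two-hour 4K master takes hours of GPU time.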
Cover playback: the client requests a manifest from the playback service, which validates the subscription entitlement, generates a signed manifest URL with a short-lived token, and returns it. The player fetches segments from the CDN, adapting quality based on measured bandwidth. Discuss watch progress: the client buffers position updates and flushes every 15 seconds to the progress service, which upserts into DynamoDB. On device switch, the client fetches the latest position before starting playback. Address recommendations: offline jobs compute per-user candidate sets nightly, store them in Redis, and the homepage API re-ranks and serves cached rows. Cover monitoring: track CDN cache hit ratios, segment fetch latency percentiles, transcoding queue depth, and rebuffering rates per region. Discuss cost optimization: archive infrequently accessed titles to cheaper storage tiers and delete unused renditions for content that is removed from the catalog.
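The short-lived signed manifest URL can be sketched with HMAC signing. The token format, query parameter names, and in-code secret below are illustrative only; a production system would fetch keys from a KMS and typically layer DRM on top.

```python
import hashlib
import hmac
import time

SECRET = b"demo-secret"  # placeholder: would come from a key management service

def sign_manifest_url(path: str, user_id: str, ttl_s: int = 300, now=None) -> str:
    """Return a manifest URL carrying an expiry and an HMAC signature."""
    expires = int(now if now is not None else time.time()) + ttl_s
    msg = f"{path}|{user_id}|{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?uid={user_id}&exp={expires}&sig={sig}"

def verify(path: str, user_id: str, expires: int, sig: str, now=None) -> bool:
    """CDN edge or playback service checks expiry, then the signature."""
    if (now if now is not None else time.time()) > expires:
        return False  # token expired; client must request a fresh manifest
    msg = f"{path}|{user_id}|{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

Because the signature covers the path, user, and expiry, a leaked URL stops working after the TTL and cannot be rewritten to fetch a different title, while the CDN can validate it without a call back to the origin.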
Deepen your understanding of the patterns used in this problem: