Home Page Video Recommendation
Problem Statement
Design Netflix's home page video recommendation system. The system must personalize video recommendations for each user, balancing relevance, diversity, and freshness. It should support both real-time serving of personalized rankings and offline training of recommendation models.
This is an ML system design problem that combines recommendation algorithms, large-scale data processing, and low-latency serving infrastructure. You should be prepared to discuss both the ML components (candidate generation, ranking, feature engineering) and the system architecture (service decomposition, data pipelines, caching strategies).
Key Requirements
Functional
- Personalized recommendations -- generate a ranked list of videos tailored to each user's preferences and viewing history
- Candidate generation -- retrieve a diverse set of candidate videos from the catalog (millions of titles) that are likely relevant to the user
- Ranking model -- score and rank candidates based on predicted engagement (watch probability, completion rate)
- Diversity and exploration -- balance showing popular content with niche or new releases to avoid filter bubbles
- Real-time serving -- return personalized recommendations within tens of milliseconds when the user loads the home page
- A/B testing -- support experimentation with different models and ranking strategies
Non-Functional
- Low latency -- end-to-end recommendation latency must be under 100ms to avoid degrading the user experience
- High throughput -- handle millions of concurrent users requesting personalized recommendations
- Scalability -- all components (candidate retrieval, model serving, feature store) should scale horizontally
- High availability -- graceful degradation if recommendation service is slow or down (fallback to popular content)
- Freshness -- incorporate recently released content and user interactions (recent watches, ratings) into recommendations
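The 100ms end-to-end budget is easier to defend if you can break it down per stage. A hypothetical allocation (the stage names and numbers below are illustrative, not Netflix's actual figures) might look like:

```python
# Hypothetical per-stage latency budget for one home page request.
# The stages mirror the pipeline: gateway -> features -> candidates -> ranking.
budget_ms = {
    "gateway_overhead": 5,
    "feature_lookup": 15,        # online feature store reads
    "candidate_generation": 30,  # ANN retrieval across sources
    "ranking": 40,               # model inference over ~1000 candidates
    "response_assembly": 10,     # dedupe, diversity re-rank, serialize
}

total = sum(budget_ms.values())
assert total <= 100, "stage budgets must fit the end-to-end SLA"
```

Walking through a budget like this in the interview shows you understand that ranking inference usually dominates, which motivates caching and candidate-set size limits later.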
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Service Architecture and System Decomposition (Most Emphasized)
Interviewers expect you to break down the recommendation system into multiple services and explain how they interact. They care more about service-level design than the details of ML models.
Hints to consider:
- Candidate Generation Service -- retrieves a diverse set of candidates (hundreds to thousands) using collaborative filtering, content-based methods, or retrieval models
- Ranking Service -- applies a trained ML model to score and rank candidates based on predicted user engagement
- Feature Service (Feature Store) -- provides real-time feature lookups (user history, video metadata, context features)
- Content Service -- manages the video catalog and metadata (title, genre, cast, release date)
- API Gateway -- orchestrates calls to candidate generation, feature retrieval, and ranking, then returns the final ranked list
- A/B Testing Framework -- assigns users to experiment groups and logs impressions and outcomes
- Discuss how services communicate (REST, gRPC) and where caching is applied to reduce latency
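The service interactions above can be sketched end to end. In the sketch below every service is a toy in-memory stub standing in for a real RPC (gRPC/REST) call; the function names and data are illustrative assumptions, not actual Netflix APIs:

```python
# Minimal sketch of the API gateway orchestration path.
# Each helper stands in for a remote service call.

def assign_experiment(user_id: int) -> str:
    """A/B Testing Service stub: hash users into model variants."""
    return "model_b" if user_id % 2 else "model_a"

def retrieve_candidates(user_id: int) -> list[int]:
    """Candidate Generation Service stub (would call CF / two-tower retrieval)."""
    return [101, 102, 103, 104]

def lookup_features(user_id: int, video_ids: list[int]) -> dict[int, float]:
    """Feature Store stub: one toy relevance feature per video."""
    return {v: 1.0 / v for v in video_ids}

def score(video_ids: list[int], features: dict[int, float], model: str) -> dict[int, float]:
    """Ranking Service stub: the experiment variant changes the model weight."""
    weight = 2.0 if model == "model_b" else 1.0
    return {v: weight * features[v] for v in video_ids}

def recommend_home_page(user_id: int, k: int = 3) -> list[int]:
    """Gateway: orchestrate experiment assignment, retrieval, features, ranking."""
    model = assign_experiment(user_id)
    candidates = retrieve_candidates(user_id)
    features = lookup_features(user_id, candidates)
    scores = score(candidates, features, model)
    return sorted(candidates, key=lambda v: -scores[v])[:k]
```

In a real system each of these calls crosses a network boundary, so the gateway is also where you apply timeouts, parallel fan-out, and fallbacks.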
2. Candidate Generation Strategies
Interviewers want to see how you narrow down millions of videos to a manageable candidate set for ranking.
Hints to consider:
- Collaborative filtering: users who watched video A also watched videos B, C, D (item-to-item similarity)
- Content-based filtering: videos similar to those the user has watched (genre, cast, tags)
- Two-tower models (user embedding and item embedding) for efficient retrieval
- Approximate nearest neighbor search (FAISS, Annoy) for fast similarity lookups
- Combine multiple candidate sources (collaborative, content-based, trending, new releases) and merge them
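The item-to-item collaborative filtering idea ("users who watched A also watched B") can be shown with a simple co-occurrence count, a deliberately small stand-in for the matrix-factorization or two-tower retrieval a production system would use:

```python
from collections import Counter, defaultdict

def build_cooccurrence(watch_lists: list[list[str]]) -> dict:
    """Count how often each pair of videos appears in the same user's history."""
    co: dict[str, Counter] = defaultdict(Counter)
    for watched in watch_lists:
        for a in watched:
            for b in watched:
                if a != b:
                    co[a][b] += 1
    return co

def related(co: dict, video: str, k: int = 2) -> list[str]:
    """Top-k co-watched videos: the item-to-item candidate list."""
    return [v for v, _ in co[video].most_common(k)]

# Toy watch histories for three users.
histories = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]
co = build_cooccurrence(histories)
```

At Netflix scale the co-occurrence matrix is far too large to enumerate pairwise, which is exactly why the hints above point to learned embeddings plus approximate nearest neighbor search.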
3. Ranking Model Design
Interviewers probe on how you score and rank the candidate set to predict user engagement.
Hints to consider:
- Gradient-boosted trees (XGBoost, LightGBM) or deep learning models (wide and deep, two-tower)
- Features: user watch history, video metadata, time of day, device type, user demographics
- Training objective: predict watch probability, completion rate, or rating
- Offline evaluation metrics: precision at k, recall at k, NDCG
- Online evaluation: A/B test metrics (click-through rate, watch time, retention)
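NDCG is the offline metric interviewers most often ask you to define, so it is worth being able to write it down. A minimal implementation over graded relevance labels:

```python
import math

def ndcg_at_k(relevances: list[float], k: int) -> float:
    """NDCG@k: discounted gain of the model's ranking, normalized by the
    gain of the ideal (relevance-sorted) ranking.

    `relevances` are the graded relevance labels of the items, listed in
    the order the model ranked them."""
    def dcg(rels: list[float]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0
```

A perfect ranking scores 1.0; any inversion (a less relevant item ranked above a more relevant one) pulls the score below 1.0.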
4. Feature Engineering and Real-Time Serving
Interviewers want to see how you compute and serve features with low latency.
Hints to consider:
- Feature store with online (low-latency) and offline (batch) paths
- Precompute user features (historical watch count, favorite genres) and cache them
- Real-time context features (time of day, device type) computed on the fly
- Video metadata features (genre, cast, release date) stored in a database or cache
- Use Redis or DynamoDB for fast feature lookups
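The online/offline split can be illustrated with a toy feature store: precomputed user features land in a TTL'd cache (a plain dict below, where Redis or DynamoDB would sit in production), while context features are computed at request time. The class and method names are illustrative:

```python
import time

class OnlineFeatureStore:
    """Toy online feature store: batch-precomputed user features with a TTL,
    merged with per-request context features at serving time."""

    def __init__(self, ttl_seconds: float = 3600):
        self._cache: dict[int, tuple[float, dict]] = {}  # user_id -> (expiry, feats)
        self._ttl = ttl_seconds

    def put_user_features(self, user_id: int, features: dict) -> None:
        """Called by the offline batch pipeline after nightly feature jobs."""
        self._cache[user_id] = (time.time() + self._ttl, features)

    def get_features(self, user_id: int, device_type: str) -> dict:
        """Called on the serving path: cached user features + live context."""
        expiry, feats = self._cache.get(user_id, (0.0, {}))
        user_feats = feats if time.time() < expiry else {}
        context = {
            "hour_of_day": time.localtime().tm_hour,  # real-time context
            "device": device_type,
        }
        return {**user_feats, **context}
```

The key point to make in the interview: the serving path only ever does cheap key-value reads; anything expensive (aggregating watch history, computing genre affinities) happens offline and is written into the store ahead of time.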
5. Scalability and Performance Optimization
Interviewers push on how you handle peak load and minimize latency.
Hints to consider:
- Horizontal scaling of ranking service with load balancing
- Caching of candidate generation results for popular user segments
- Pre-warming cache with recommendations for active users before they visit the home page
- Asynchronous batch updates of precomputed recommendations (computed overnight, served during the day)
- Graceful degradation: fallback to popular or trending content if personalized service is slow
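Graceful degradation is easy to describe and easy to sketch: put a deadline on the personalized path and fall back to a precomputed popular list on timeout or error. A minimal sketch using Python's standard library (the fallback list and timeout value are illustrative):

```python
import concurrent.futures

# Precomputed trending titles, refreshed by a batch job; the ids are toy data.
POPULAR_FALLBACK = [7, 8, 9]

def recommend_with_fallback(personalize_fn, user_id: int,
                            timeout_s: float = 0.08) -> list[int]:
    """Run the personalized path with a deadline; serve popular content
    if it times out or raises."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(personalize_fn, user_id)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        return POPULAR_FALLBACK
    finally:
        pool.shutdown(wait=False)  # don't block the request on the slow call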
Suggested Approach
Step 1: Clarify Requirements
Ask about the scale (users, videos), latency budget, whether the interviewer wants to focus on the ML models or the system architecture, and whether you should emphasize candidate generation or ranking.
Step 2: High-Level Architecture
Sketch these core components:
- API Gateway -- receives home page requests and orchestrates the recommendation pipeline
- Candidate Generation Service -- retrieves a diverse set of candidates using collaborative filtering, content-based methods, or retrieval models
- Ranking Service -- scores and ranks candidates using an ML model
- Feature Store -- provides real-time user and video features for ranking
- Content Metadata Service -- stores video metadata (title, genre, cast)
- A/B Testing Service -- assigns users to experiments and logs outcomes
- Offline Training Pipeline -- trains candidate retrieval and ranking models using historical user interaction data
- Monitoring and Analytics -- tracks model performance, latency, and business metrics
Step 3: Deep Dive on Candidate Generation
Discuss how you retrieve candidates from a large catalog, the trade-off between recall and latency, and how you combine multiple retrieval strategies.
Step 4: Deep Dive on Ranking Model
Explain your model choice, feature set, training objective, and offline/online evaluation metrics. Walk through an example prediction.
Step 5: Scalability and Serving Infrastructure
Discuss how you scale each service, optimize for low latency, and handle peak load. Explain caching strategies and graceful degradation.
Real Interview Quotes
"Home page video recommendation. Spent more time discussing how many services are needed and how they interact. Didn't go deep into feature engineering, modeling, or evaluation — the interviewer cared more about the service-level architecture."