Home Page Video Recommendation
Problem Statement
Design Netflix's home page video recommendation system. The system must personalize video recommendations for each user, balancing relevance, diversity, and freshness. It should support both real-time serving of personalized rankings and offline training of recommendation models.
This is an ML system design problem that combines recommendation algorithms, large-scale data processing, and low-latency serving infrastructure. You should be prepared to discuss both the ML components (candidate generation, ranking, feature engineering) and the system architecture (service decomposition, data pipelines, caching strategies).
Key Requirements
Functional
- Personalized recommendations -- generate a ranked list of videos tailored to each user's preferences and viewing history
- Candidate generation -- retrieve a diverse set of candidate videos from the catalog (millions of titles) that are likely relevant to the user
- Ranking model -- score and rank candidates based on predicted engagement (watch probability, completion rate)
- Diversity and exploration -- balance showing popular content with niche or new releases to avoid filter bubbles
- Real-time serving -- return personalized recommendations within tens of milliseconds when the user loads the home page
- A/B testing -- support experimentation with different models and ranking strategies
Non-Functional
- Low latency -- end-to-end recommendation latency must be under 100ms to avoid degrading the user experience
- High throughput -- handle millions of concurrent users requesting personalized recommendations
- Scalability -- all components (candidate retrieval, model serving, feature store) should scale horizontally
- High availability -- graceful degradation if recommendation service is slow or down (fallback to popular content)
- Freshness -- incorporate recently released content and user interactions (recent watches, ratings) into recommendations
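The 100ms end-to-end budget is easier to defend if you can break it down per stage. A hypothetical allocation (the stage names and numbers below are illustrative, not Netflix's actual figures) might look like:

```python
# Hypothetical per-stage latency budget for one home page request.
# The stages mirror the pipeline: gateway -> features -> candidates -> ranking.
budget_ms = {
    "gateway_overhead": 5,
    "feature_lookup": 15,        # online feature store reads
    "candidate_generation": 30,  # ANN retrieval across sources
    "ranking": 40,               # model inference over ~1000 candidates
    "response_assembly": 10,     # dedupe, diversity re-rank, serialize
}

total = sum(budget_ms.values())
assert total <= 100, "stage budgets must fit the end-to-end SLA"
```

Walking through a budget like this in the interview shows you understand that ranking inference usually dominates, which motivates caching and candidate-set size limits later.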
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Service Architecture and System Decomposition (Most Emphasized)
Interviewers expect you to break down the recommendation system into multiple services and explain how they interact. They care more about service-level design than the details of ML models.
Hints to consider:
- Candidate Generation Service -- retrieves a diverse set of candidates (hundreds to thousands) using collaborative filtering, content-based methods, or retrieval models
- Ranking Service -- applies a trained ML model to score and rank candidates based on predicted user engagement
- Feature Service (Feature Store) -- provides real-time feature lookups (user history, video metadata, context features)
- Content Service -- manages the video catalog and metadata (title, genre, cast, release date)
- API Gateway -- orchestrates calls to candidate generation, feature retrieval, and ranking, then returns the final ranked list
- A/B Testing Framework -- assigns users to experiment groups and logs impressions and outcomes
- Discuss how services communicate (REST, gRPC) and where caching is applied to reduce latency
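The service interactions above can be sketched end to end. In the sketch below every service is a toy in-memory stub standing in for a real RPC (gRPC/REST) call; the function names and data are illustrative assumptions, not actual Netflix APIs:

```python
# Minimal sketch of the API gateway orchestration path.
# Each helper stands in for a remote service call.

def assign_experiment(user_id: int) -> str:
    """A/B Testing Service stub: hash users into model variants."""
    return "model_b" if user_id % 2 else "model_a"

def retrieve_candidates(user_id: int) -> list[int]:
    """Candidate Generation Service stub (would call CF / two-tower retrieval)."""
    return [101, 102, 103, 104]

def lookup_features(user_id: int, video_ids: list[int]) -> dict[int, float]:
    """Feature Store stub: one toy relevance feature per video."""
    return {v: 1.0 / v for v in video_ids}

def score(video_ids: list[int], features: dict[int, float], model: str) -> dict[int, float]:
    """Ranking Service stub: the experiment variant changes the model weight."""
    weight = 2.0 if model == "model_b" else 1.0
    return {v: weight * features[v] for v in video_ids}

def recommend_home_page(user_id: int, k: int = 3) -> list[int]:
    """Gateway: orchestrate experiment assignment, retrieval, features, ranking."""
    model = assign_experiment(user_id)
    candidates = retrieve_candidates(user_id)
    features = lookup_features(user_id, candidates)
    scores = score(candidates, features, model)
    return sorted(candidates, key=lambda v: -scores[v])[:k]
```

In a real system each of these calls crosses a network boundary, so the gateway is also where you apply timeouts, parallel fan-out, and fallbacks.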
2. Candidate Generation Strategies
Interviewers want to see how you narrow down millions of videos to a manageable candidate set for ranking.
Hints to consider:
- Collaborative filtering: users who watched video A also watched videos B, C, D (item-to-item similarity)
- Content-based filtering: videos similar to those the user has watched (genre, cast, tags)
- Two-tower models (user embedding and item embedding) for efficient retrieval
- Approximate nearest neighbor search (FAISS, Annoy) for fast similarity lookups
- Combine multiple candidate sources (collaborative, content-based, trending, new releases) and merge them
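The item-to-item collaborative filtering idea ("users who watched A also watched B") can be shown with a simple co-occurrence count, a deliberately small stand-in for the matrix-factorization or two-tower retrieval a production system would use:

```python
from collections import Counter, defaultdict

def build_cooccurrence(watch_lists: list[list[str]]) -> dict:
    """Count how often each pair of videos appears in the same user's history."""
    co: dict[str, Counter] = defaultdict(Counter)
    for watched in watch_lists:
        for a in watched:
            for b in watched:
                if a != b:
                    co[a][b] += 1
    return co

def related(co: dict, video: str, k: int = 2) -> list[str]:
    """Top-k co-watched videos: the item-to-item candidate list."""
    return [v for v, _ in co[video].most_common(k)]

# Toy watch histories for three users.
histories = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]
co = build_cooccurrence(histories)
```

At Netflix scale the co-occurrence matrix is far too large to enumerate pairwise, which is exactly why the hints above point to learned embeddings plus approximate nearest neighbor search.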
3. Ranking Model Design
Interviewers probe on how you score and rank the candidate set to predict user engagement.
Hints to consider:
- Gradient-boosted trees (XGBoost, LightGBM) or deep learning models (wide and deep, two-tower)
- Features: user watch history, video metadata, time of day, device type, user demographics
- Training objective: predict watch probability, completion rate, or rating
- Offline evaluation metrics: precision at k, recall at k, NDCG
- Online evaluation: A/B test metrics (click-through rate, watch time, retention)
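NDCG is the offline metric interviewers most often ask you to define, so it is worth being able to write it down. A minimal implementation over graded relevance labels:

```python
import math

def ndcg_at_k(relevances: list[float], k: int) -> float:
    """NDCG@k: discounted gain of the model's ranking, normalized by the
    gain of the ideal (relevance-sorted) ranking.

    `relevances` are the graded relevance labels of the items, listed in
    the order the model ranked them."""
    def dcg(rels: list[float]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0
```

A perfect ranking scores 1.0; any inversion (a less relevant item ranked above a more relevant one) pulls the score below 1.0.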
4. Feature Engineering and Real-Time Serving
Interviewers want to see how you compute and serve features with low latency.
Hints to consider:
- Feature store with online (low-latency) and offline (batch) paths
- Precompute user features (historical watch count, favorite genres) and cache them
- Real-time context features (time of day, device type) computed on the fly
- Video metadata features (genre, cast, release date) stored in a database or cache
- Use Redis or DynamoDB for fast feature lookups
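The online/offline split can be illustrated with a toy feature store: precomputed user features land in a TTL'd cache (a plain dict below, where Redis or DynamoDB would sit in production), while context features are computed at request time. The class and method names are illustrative:

```python
import time

class OnlineFeatureStore:
    """Toy online feature store: batch-precomputed user features with a TTL,
    merged with per-request context features at serving time."""

    def __init__(self, ttl_seconds: float = 3600):
        self._cache: dict[int, tuple[float, dict]] = {}  # user_id -> (expiry, feats)
        self._ttl = ttl_seconds

    def put_user_features(self, user_id: int, features: dict) -> None:
        """Called by the offline batch pipeline after nightly feature jobs."""
        self._cache[user_id] = (time.time() + self._ttl, features)

    def get_features(self, user_id: int, device_type: str) -> dict:
        """Called on the serving path: cached user features + live context."""
        expiry, feats = self._cache.get(user_id, (0.0, {}))
        user_feats = feats if time.time() < expiry else {}
        context = {
            "hour_of_day": time.localtime().tm_hour,  # real-time context
            "device": device_type,
        }
        return {**user_feats, **context}
```

The key point to make in the interview: the serving path only ever does cheap key-value reads; anything expensive (aggregating watch history, computing genre affinities) happens offline and is written into the store ahead of time.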
5. Scalability and Performance Optimization
Interviewers push on how you handle peak load and minimize latency.
Hints to consider:
- Horizontal scaling of ranking service with load balancing
- Caching of candidate generation results for popular user segments
- Pre-warming cache with recommendations for active users before they visit the home page
- Asynchronous batch updates of precomputed recommendations (computed overnight, served during the day)
- Graceful degradation: fallback to popular or trending content if personalized service is slow
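Graceful degradation is easy to describe and easy to sketch: put a deadline on the personalized path and fall back to a precomputed popular list on timeout or error. A minimal sketch using Python's standard library (the fallback list and timeout value are illustrative):

```python
import concurrent.futures

# Precomputed trending titles, refreshed by a batch job; the ids are toy data.
POPULAR_FALLBACK = [7, 8, 9]

def recommend_with_fallback(personalize_fn, user_id: int,
                            timeout_s: float = 0.08) -> list[int]:
    """Run the personalized path with a deadline; serve popular content
    if it times out or raises."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(personalize_fn, user_id)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        return POPULAR_FALLBACK
    finally:
        pool.shutdown(wait=False)  # don't block the request on the slow call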
Suggested Approach
Step 1: Clarify Requirements
Ask about the scale (users, videos), latency budget, whether the interviewer wants to focus on the ML models or the system architecture, and whether you should emphasize candidate generation or ranking.
Step 2: High-Level Architecture
Sketch these core components:
- API Gateway -- receives home page requests and orchestrates the recommendation pipeline
- Candidate Generation Service -- retrieves a diverse set of candidates using collaborative filtering, content-based methods, or retrieval models
- Ranking Service -- scores and ranks candidates using an ML model
- Feature Store -- provides real-time user and video features for ranking
- Content Metadata Service -- stores video metadata (title, genre, cast)
- A/B Testing Service -- assigns users to experiments and logs outcomes
- Offline Training Pipeline -- trains candidate retrieval and ranking models using historical user interaction data
- Monitoring and Analytics -- tracks model performance, latency, and business metrics
Step 3: Deep Dive on Candidate Generation
Discuss how you retrieve candidates from a large catalog, the trade-off between recall and latency, and how you combine multiple retrieval strategies.
Step 4: Deep Dive on Ranking Model
Explain your model choice, feature set, training objective, and offline/online evaluation metrics. Walk through an example prediction.
Step 5: Scalability and Serving Infrastructure
Discuss how you scale each service, optimize for low latency, and handle peak load. Explain caching strategies and graceful degradation.
Real Interview Quotes
"Home page video recommendation. Spent more time discussing how many services are needed and how they interact. Didn't go deep into feature engineering, modeling, or evaluation — the interviewer cared more about the service-level architecture."