Design Netflix's home page video recommendation system. The system must personalize the video recommendations for each user, balancing relevance, diversity, and freshness. It should support real-time serving of personalized rankings and offline training of recommendation models.
This is an ML system design problem that combines recommendation algorithms, large-scale data processing, and low-latency serving infrastructure. You should be prepared to discuss both the ML components (candidate generation, ranking, feature engineering) and the system architecture (service decomposition, data pipelines, caching strategies).
Personalized recommendations -- generate a ranked list of videos tailored to each user's preferences and viewing history
Candidate generation -- retrieve a diverse set of candidate videos from the catalog (millions of titles) that are likely relevant to the user
Ranking model -- score and rank candidates based on predicted engagement (watch probability, completion rate)
Diversity and exploration -- balance showing popular content with niche or new releases to avoid filter bubbles
Real-time serving -- return personalized recommendations within tens of milliseconds when the user loads the home page
A/B testing -- support experimentation with different models and ranking strategies
Low latency -- end-to-end recommendation latency must be under 100ms to avoid degrading the user experience
High throughput -- handle millions of concurrent users requesting personalized recommendations
Scalability -- all components (candidate retrieval, model serving, feature store) should scale horizontally
High availability -- graceful degradation if recommendation service is slow or down (fallback to popular content)
Freshness -- incorporate recently released content and user interactions (recent watches, ratings) into recommendations
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers expect you to break down the recommendation system into multiple services and explain how they interact. They care more about service-level design than the details of ML models.
Candidate Generation Service -- retrieves a diverse set of candidates (hundreds to thousands) using collaborative filtering, content-based methods, or retrieval models
Ranking Service -- applies a trained ML model to score and rank candidates based on predicted user engagement
Feature Service (Feature Store) -- provides real-time feature lookups (user history, video metadata, context features)
Content Service -- manages the video catalog and metadata (title, genre, cast, release date)
API Gateway -- orchestrates calls to candidate generation, feature retrieval, and ranking, then returns the final ranked list
A/B Testing Framework -- assigns users to experiment groups and logs impressions and outcomes
Discuss how services communicate (REST, gRPC) and where caching is applied to reduce latency
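The request path through these services can be sketched as a synchronous orchestration in the API gateway. This is a minimal illustration, not a production design: the service calls are stubbed as plain functions (in practice they would be gRPC/REST calls with timeouts, batching, and fallbacks), and all names and data are hypothetical.

```python
def generate_candidates(user_id: str, k: int = 500) -> list[str]:
    # Stub for the Candidate Generation Service: in practice this merges
    # collaborative-filtering, content-based, and trending sources.
    return [f"video_{i}" for i in range(k)]

def fetch_features(user_id: str, video_ids: list[str]) -> dict:
    # Stub for the Feature Store: one batched lookup per request, not
    # one call per candidate (illustrative feature values only).
    return {vid: {"popularity": (hash(vid) % 100) / 100} for vid in video_ids}

def rank(video_ids: list[str], features: dict) -> list[str]:
    # Stub for the Ranking Service: score each candidate and sort
    # descending by predicted engagement.
    return sorted(video_ids, key=lambda v: features[v]["popularity"], reverse=True)

def home_page_recommendations(user_id: str, page_size: int = 20) -> list[str]:
    # API Gateway orchestration: candidates -> features -> ranking -> top-k.
    candidates = generate_candidates(user_id)
    features = fetch_features(user_id, candidates)
    return rank(candidates, features)[:page_size]
```

Note the batched feature fetch: issuing one feature-store call per candidate would blow the latency budget, so real systems batch lookups and often parallelize candidate generation and feature retrieval.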
Interviewers want to see how you narrow down millions of videos to a manageable candidate set for ranking.
Collaborative filtering: users who watched video A also watched videos B, C, D (item-to-item similarity)
Content-based filtering: videos similar to those the user has watched (genre, cast, tags)
Two-tower models (user embedding and item embedding) for efficient retrieval
Approximate nearest neighbor search (FAISS, Annoy) for fast similarity lookups
Combine multiple candidate sources (collaborative, content-based, trending, new releases) and merge them
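The two-tower retrieval idea reduces to a nearest-neighbor search over embeddings. The sketch below uses hand-made toy vectors in place of trained tower outputs and brute-force cosine similarity in place of an ANN index like FAISS or Annoy; at catalog scale you would never scan all items, which is exactly why ANN indexes exist.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two dense embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(user_emb: list[float], item_embs: dict, k: int) -> list[str]:
    # Brute-force top-k by similarity; an ANN index replaces this scan
    # when the catalog has millions of items.
    scored = sorted(item_embs.items(),
                    key=lambda kv: cosine(user_emb, kv[1]),
                    reverse=True)
    return [vid for vid, _ in scored[:k]]

# Toy embeddings standing in for trained two-tower outputs.
items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
top2 = retrieve([1.0, 0.0], items, k=2)  # -> ["a", "c"]
```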
Interviewers probe on how you score and rank the candidate set to predict user engagement.
Gradient-boosted trees (XGBoost, LightGBM) or deep learning models (Wide & Deep, two-tower)
Features: user watch history, video metadata, time of day, device type, user demographics
Training objective: predict watch probability, completion rate, or rating
Offline evaluation metrics: precision at k, recall at k, NDCG
Online evaluation: A/B test metrics (click-through rate, watch time, retention)
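Of the offline metrics above, NDCG@k is the one candidates most often fumble in interviews, so it is worth being able to compute it by hand. A minimal implementation (standard DCG with a log2 position discount; graded relevance labels are assumed):

```python
import math

def dcg_at_k(relevances: list[float], k: int) -> float:
    # Discounted cumulative gain: position i contributes rel / log2(i + 2),
    # so items ranked lower are discounted more heavily.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances: list[float], k: int) -> float:
    # Normalize by the DCG of the ideal (descending-relevance) ordering,
    # so a perfect ranking scores 1.0.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# A perfectly ordered list scores 1.0; putting the relevant item
# second instead of first costs a log2 discount.
ndcg_at_k([3, 2, 1], 3)  # -> 1.0
ndcg_at_k([0, 1], 2)     # -> ~0.631
```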
Interviewers want to see how you compute and serve features with low latency.
Interviewers push on how you handle peak load and minimize latency.
Horizontal scaling of ranking service with load balancing
Caching of candidate generation results for popular user segments
Pre-warming cache with recommendations for active users before they visit the home page
Asynchronous batch updates of precomputed recommendations (computed overnight, served during the day)
Graceful degradation: fallback to popular or trending content if personalized service is slow
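The graceful-degradation point above reduces to a simple pattern: attempt the personalized path within the latency budget, and on failure serve a precomputed popular list instead of an error. A minimal sketch, with the failure injected via a flag and all names illustrative:

```python
# Precomputed fallback, refreshed periodically by a batch job
# (hypothetical titles).
POPULAR_FALLBACK = ["video_top_1", "video_top_2", "video_top_3"]

def personalized_recs(user_id: str, simulate_failure: bool = False) -> list[str]:
    # Stand-in for the full candidate-generation + ranking pipeline;
    # the flag simulates a timeout in a downstream service.
    if simulate_failure:
        raise TimeoutError("ranking service exceeded latency budget")
    return [f"{user_id}_rec_{i}" for i in range(3)]

def recommendations_with_fallback(user_id: str,
                                  simulate_failure: bool = False) -> list[str]:
    # Degrade to popular content rather than failing the home page load.
    try:
        return personalized_recs(user_id, simulate_failure)
    except (TimeoutError, ConnectionError):
        return POPULAR_FALLBACK
```

In a real system the try/except would be a deadline on the RPC plus a circuit breaker, so a struggling ranking service is not hammered with retries while it recovers.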
Ask about the scale (number of users and catalog size), the latency budget, whether the interviewer wants to focus on the ML models or the system architecture, and whether to emphasize candidate generation or ranking.
API Gateway -- receives home page requests and orchestrates the recommendation pipeline
Candidate Generation Service -- retrieves a diverse set of candidates using collaborative filtering, content-based methods, or retrieval models
Ranking Service -- scores and ranks candidates using an ML model
Feature Store -- provides real-time user and video features for ranking
Content Metadata Service -- stores video metadata (title, genre, cast)
A/B Testing Service -- assigns users to experiments and logs outcomes
Offline Training Pipeline -- trains candidate retrieval and ranking models using historical user interaction data
Monitoring and Analytics -- tracks model performance, latency, and business metrics
Discuss how you retrieve candidates from a large catalog, the trade-off between recall and latency, and how you combine multiple retrieval strategies.
Explain your model choice, feature set, training objective, and offline/online evaluation metrics. Walk through an example prediction.
Discuss how you scale each service, optimize for low latency, and handle peak load. Explain caching strategies and graceful degradation.
"Home page video recommendation. Spent more time discussing how many services are needed and how they interact. Didn't go deep into feature engineering, modeling, or evaluation — the interviewer cared more about the service-level architecture."