Design Recommender System
Problem Statement
Design a recommendation system that analyzes user preferences and behavior to suggest relevant content, products, or services with personalized rankings and filtering capabilities. Users browse personalized lists that update as they interact, and they can filter or refine results by categories, topics, or contexts.
The system must combine data pipelines, machine learning inference, and low-latency serving into a coherent architecture. It should handle event ingestion at scale, deliver fast personalized results, and evolve through A/B testing and feedback loops. Consider how to balance offline training with online inference, handle cold start problems for new users and items, and maintain diversity in recommendations to avoid filter bubbles.
Key Requirements
Functional
- Personalized rankings -- users see a ranked list of items relevant to their interests and context, updated as their behavior changes
- Filtering and refinement -- users apply filters (category, price range, genre) and see updated personalized results
- Feedback integration -- users provide explicit feedback (like, dislike, hide, save) that influences future recommendations
- Near-real-time refresh -- recommendations refresh as recent activity changes without requiring manual page reload
Non-Functional
- Scalability -- serve recommendations to 100M+ daily active users with sub-200ms p99 latency at the serving tier
- Reliability -- gracefully degrade to popularity-based fallbacks when ML models or feature stores are unavailable
- Latency -- candidate retrieval and ranking combined must complete within 200ms for a responsive user experience
- Freshness -- incorporate user actions from the last few minutes into recommendations via near-real-time feature updates
Interview Reports from Hello Interview
6 reports from candidates. Most recently asked at LinkedIn in early December 2025.
Also commonly asked at: Meta, Disney, SoFi.
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Multi-Stage Retrieval and Ranking Pipeline
Interviewers expect a clear separation between candidate generation (retrieving a broad set of relevant items) and ranking (scoring and ordering the final list). A single monolithic model approach will not scale.
Hints to consider:
- Use a two-stage pipeline: fast candidate retrieval (ANN search, collaborative filtering, content-based filters) followed by a precise ranking model
- Add a re-ranking stage for business rules, diversity constraints, and freshness boosting after ML ranking
- Consider multiple retrieval sources (user history, trending, similar users) merged before ranking
- Design for graceful degradation: if ranking fails, fall back to pre-computed top-N lists
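The staged pipeline above can be sketched as plain functions: one merge step over multiple retrieval sources, a ranking step that falls back to a pre-computed popular list on failure, and a re-ranking step that enforces a diversity cap. This is a minimal illustration, not a production design; the in-memory stores and `POPULAR_FALLBACK` list stand in for real retrieval backends.

```python
from collections import defaultdict

# Hypothetical pre-computed popularity list used for graceful degradation.
POPULAR_FALLBACK = ["item_pop_1", "item_pop_2", "item_pop_3"]

def retrieve_candidates(user_id, sources):
    """Merge candidates from multiple retrieval sources (user history,
    trending, similar users), deduplicating while preserving order."""
    seen, merged = set(), []
    for source in sources:
        for item in source(user_id):
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged

def rank(user_id, candidates, score_fn):
    """Score each candidate with the ranking model; if scoring fails,
    fall back to the pre-computed top-N popularity list."""
    try:
        scored = [(score_fn(user_id, item), item) for item in candidates]
        scored.sort(reverse=True)
        return [item for _, item in scored]
    except Exception:
        return POPULAR_FALLBACK  # graceful degradation

def rerank(ranked, category_of, max_per_category=2):
    """Re-ranking stage: a simple diversity constraint that caps how many
    items from one category survive after ML ranking."""
    counts, out = defaultdict(int), []
    for item in ranked:
        cat = category_of(item)
        if counts[cat] < max_per_category:
            counts[cat] += 1
            out.append(item)
    return out
```

In a real system each stage would be a separate service with its own latency budget, but the contract between stages (a list of item IDs flowing through retrieve, rank, re-rank) is the same.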
2. Feature Store and Real-Time Feature Computation
Recommendation quality depends on fresh, accurate features. Interviewers probe how you manage both batch-computed features and real-time signals.
Hints to consider:
- Use a dual-layer feature store: batch features (user embeddings, item popularity) updated hourly and real-time features (recent clicks, session context) updated in seconds
- Stream user events through Kafka/Flink to compute real-time aggregations (rolling CTR, recency-weighted engagement)
- Cache frequently accessed features in Redis for sub-millisecond serving latency
- Handle feature skew and missing values gracefully to avoid model degradation
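A toy sketch of the dual-layer idea: a fast real-time layer consulted before a batch layer, with explicit defaults so the model never sees a missing value, plus a recency-weighted engagement aggregate of the kind a Flink job might maintain. Class and feature names here are illustrative assumptions, not a real feature-store API.

```python
class FeatureStore:
    """Toy dual-layer feature store: real-time features (updated in
    seconds) shadow batch features (updated hourly), with safe defaults
    for anything missing."""

    def __init__(self, batch, defaults):
        self.realtime = {}        # fresh, second-level features
        self.batch = batch        # hourly batch-computed features
        self.defaults = defaults  # fallbacks to avoid model degradation

    def get(self, key, feature):
        for layer in (self.realtime, self.batch):
            value = layer.get((key, feature))
            if value is not None:
                return value
        return self.defaults[feature]  # never hand the model a None

    def update_realtime(self, key, feature, value):
        self.realtime[(key, feature)] = value

def decayed_ctr(events, now, half_life_s=3600.0):
    """Recency-weighted CTR: each (timestamp, clicked) event is weighted
    by exponential decay with the given half-life, so recent behavior
    dominates the aggregate."""
    num = den = 0.0
    for ts, clicked in events:
        w = 0.5 ** ((now - ts) / half_life_s)
        num += w * clicked
        den += w
    return num / den if den else 0.0
```

In production the real-time layer would be Redis fed by a Kafka/Flink pipeline and the batch layer an offline store, but the read path (check fresh layer, then batch, then default) is the core pattern.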
3. Cold Start and Exploration
Over-optimizing for short-term engagement can entrench popular items and hurt long-term diversity. Interviewers expect strategies for new users and new items.
Hints to consider:
- For new users, leverage contextual signals (device, location, time) and popular/trending items as initial recommendations
- For new items, use content-based features and controlled exploration (epsilon-greedy or Thompson sampling) to gather engagement data
- Implement diversity constraints that ensure recommendations span multiple categories or content types
- Balance exploitation (showing high-confidence items) with exploration (testing uncertain items) using multi-armed bandit approaches
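The exploration/exploitation trade-off can be made concrete with a minimal epsilon-greedy bandit: with probability epsilon, show a random (possibly brand-new) item to gather engagement data; otherwise exploit the item with the best observed rate. This is a sketch of the technique named above, not tuned for production.

```python
import random

class EpsilonGreedy:
    """Minimal epsilon-greedy bandit over a fixed item set: explore with
    probability epsilon, otherwise exploit the best observed click rate."""

    def __init__(self, items, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.clicks = {i: 0 for i in items}
        self.views = {i: 0 for i in items}

    def rate(self, item):
        # Unseen items score 0.0 here; exploration is what surfaces them.
        return self.clicks[item] / self.views[item] if self.views[item] else 0.0

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.clicks))  # explore
        return max(self.clicks, key=self.rate)         # exploit

    def record(self, item, clicked):
        self.views[item] += 1
        self.clicks[item] += int(clicked)
```

Thompson sampling replaces the fixed epsilon with a draw from a per-item Beta posterior, which shifts exploration automatically toward items whose engagement is still uncertain; the `record`/`select` interface stays the same.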
4. Offline Training and Online Serving Integration
Interviewers want to see how you connect batch model training with real-time serving without creating operational fragility.
Hints to consider:
- Train models offline on historical interaction data using batch pipelines (Spark, distributed training)
- Deploy models to a serving infrastructure that loads updated weights without downtime (blue-green model deployment)
- Use A/B testing to compare model versions and measure impact on engagement metrics
- Monitor model performance in production and detect distribution drift that might degrade recommendations
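Two of the serving-side pieces above can be sketched together: a zero-downtime model swap as an atomic reference replacement (the object-level analogue of blue-green deployment) and deterministic hash-based A/B bucketing so a user stays in the same experiment arm across requests. Class and method names are illustrative assumptions.

```python
import hashlib

class ModelServer:
    """Sketch of blue-green-style model swapping plus hash-based A/B
    bucket assignment; models are plain callables here for simplicity."""

    def __init__(self, model):
        self.active = model  # currently serving model ("blue")

    def deploy(self, new_model):
        # Atomic reference swap: requests already scoring keep the old
        # model object, new requests see the new weights, no downtime.
        self.active = new_model

    @staticmethod
    def assign_bucket(experiment, user_id, treatment_pct):
        """Deterministically map a user into control/treatment by hashing
        the experiment name and user ID, so assignment is stable."""
        h = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        return "treatment" if int(h, 16) % 100 < treatment_pct else "control"

    def score(self, user_id, item):
        return self.active(user_id, item)
```

Hashing on `experiment:user_id` rather than `user_id` alone keeps bucket assignments independent across experiments, which matters when several model tests run concurrently.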