Design Recommender System
Problem Statement
Design a recommendation system that analyzes user preferences and behavior to suggest relevant content, products, or services with personalized rankings and filtering capabilities. Users browse personalized lists that update as they interact, and they can filter or refine results by categories, topics, or contexts.
The system must combine data pipelines, machine learning inference, and low-latency serving into a coherent architecture. It should handle event ingestion at scale, deliver fast personalized results, and evolve through A/B testing and feedback loops. Consider how to balance offline training with online inference, handle cold start problems for new users and items, and maintain diversity in recommendations to avoid filter bubbles.
Key Requirements
Functional
- Personalized rankings -- users see a ranked list of items relevant to their interests and context, updated as their behavior changes
- Filtering and refinement -- users apply filters (category, price range, genre) and see updated personalized results
- Feedback integration -- users provide explicit feedback (like, dislike, hide, save) that influences future recommendations
- Near-real-time refresh -- recommendations refresh as recent activity changes without requiring manual page reload
Non-Functional
- Scalability -- serve recommendations to 100M+ daily active users with sub-200ms p99 latency at the serving tier
- Reliability -- gracefully degrade to popularity-based fallbacks when ML models or feature stores are unavailable
- Latency -- candidate retrieval and ranking combined must complete within 200ms for a responsive user experience
- Freshness -- incorporate user actions from the last few minutes into recommendations via near-real-time feature updates
Interview Reports from Hello Interview
6 reports from candidates. Most recently asked at LinkedIn in early December 2025.
Also commonly asked at: Meta, Disney, SoFi.
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Multi-Stage Retrieval and Ranking Pipeline
Interviewers expect a clear separation between candidate generation (retrieving a broad set of relevant items) and ranking (scoring and ordering the final list). A single monolithic model approach will not scale.
Hints to consider:
- Use a two-stage pipeline: fast candidate retrieval (ANN search, collaborative filtering, content-based filters) followed by a precise ranking model
- Add a re-ranking stage for business rules, diversity constraints, and freshness boosting after ML ranking
- Consider multiple retrieval sources (user history, trending, similar users) merged before ranking
- Design for graceful degradation: if ranking fails, fall back to pre-computed top-N lists
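The staged pipeline above can be sketched as plain functions: one merge step over multiple retrieval sources, a ranking step that falls back to a pre-computed popular list on failure, and a re-ranking step that enforces a diversity cap. This is a minimal illustration, not a production design; the in-memory stores and `POPULAR_FALLBACK` list stand in for real retrieval backends.

```python
from collections import defaultdict

# Hypothetical pre-computed popularity list used for graceful degradation.
POPULAR_FALLBACK = ["item_pop_1", "item_pop_2", "item_pop_3"]

def retrieve_candidates(user_id, sources):
    """Merge candidates from multiple retrieval sources (user history,
    trending, similar users), deduplicating while preserving order."""
    seen, merged = set(), []
    for source in sources:
        for item in source(user_id):
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged

def rank(user_id, candidates, score_fn):
    """Score each candidate with the ranking model; if scoring fails,
    fall back to the pre-computed top-N popularity list."""
    try:
        scored = [(score_fn(user_id, item), item) for item in candidates]
        scored.sort(reverse=True)
        return [item for _, item in scored]
    except Exception:
        return POPULAR_FALLBACK  # graceful degradation

def rerank(ranked, category_of, max_per_category=2):
    """Re-ranking stage: a simple diversity constraint that caps how many
    items from one category survive after ML ranking."""
    counts, out = defaultdict(int), []
    for item in ranked:
        cat = category_of(item)
        if counts[cat] < max_per_category:
            counts[cat] += 1
            out.append(item)
    return out
```

In a real system each stage would be a separate service with its own latency budget, but the contract between stages (a list of item IDs flowing through retrieve, rank, re-rank) is the same.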
2. Feature Store and Real-Time Feature Computation
Recommendation quality depends on fresh, accurate features. Interviewers probe how you manage both batch-computed features and real-time signals.
Hints to consider:
- Use a dual-layer feature store: batch features (user embeddings, item popularity) updated hourly and real-time features (recent clicks, session context) updated in seconds
- Stream user events through Kafka/Flink to compute real-time aggregations (rolling CTR, recency-weighted engagement)
- Cache frequently accessed features in Redis for sub-millisecond serving latency
- Handle feature skew and missing values gracefully to avoid model degradation
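A toy sketch of the dual-layer idea: a fast real-time layer consulted before a batch layer, with explicit defaults so the model never sees a missing value, plus a recency-weighted engagement aggregate of the kind a Flink job might maintain. Class and feature names here are illustrative assumptions, not a real feature-store API.

```python
class FeatureStore:
    """Toy dual-layer feature store: real-time features (updated in
    seconds) shadow batch features (updated hourly), with safe defaults
    for anything missing."""

    def __init__(self, batch, defaults):
        self.realtime = {}        # fresh, second-level features
        self.batch = batch        # hourly batch-computed features
        self.defaults = defaults  # fallbacks to avoid model degradation

    def get(self, key, feature):
        for layer in (self.realtime, self.batch):
            value = layer.get((key, feature))
            if value is not None:
                return value
        return self.defaults[feature]  # never hand the model a None

    def update_realtime(self, key, feature, value):
        self.realtime[(key, feature)] = value

def decayed_ctr(events, now, half_life_s=3600.0):
    """Recency-weighted CTR: each (timestamp, clicked) event is weighted
    by exponential decay with the given half-life, so recent behavior
    dominates the aggregate."""
    num = den = 0.0
    for ts, clicked in events:
        w = 0.5 ** ((now - ts) / half_life_s)
        num += w * clicked
        den += w
    return num / den if den else 0.0
```

In production the real-time layer would be Redis fed by a Kafka/Flink pipeline and the batch layer an offline store, but the read path (check fresh layer, then batch, then default) is the core pattern.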
3. Cold Start and Exploration
Over-optimizing for short-term engagement can entrench popular items and hurt long-term diversity. Interviewers expect strategies for new users and new items.
Hints to consider:
- For new users, leverage contextual signals (device, location, time) and popular/trending items as initial recommendations
- For new items, use content-based features and controlled exploration (epsilon-greedy or Thompson sampling) to gather engagement data
- Implement diversity constraints that ensure recommendations span multiple categories or content types
- Balance exploitation (showing high-confidence items) with exploration (testing uncertain items) using multi-armed bandit approaches
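The exploration/exploitation trade-off can be made concrete with a minimal epsilon-greedy bandit: with probability epsilon, show a random (possibly brand-new) item to gather engagement data; otherwise exploit the item with the best observed rate. This is a sketch of the technique named above, not tuned for production.

```python
import random

class EpsilonGreedy:
    """Minimal epsilon-greedy bandit over a fixed item set: explore with
    probability epsilon, otherwise exploit the best observed click rate."""

    def __init__(self, items, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.clicks = {i: 0 for i in items}
        self.views = {i: 0 for i in items}

    def rate(self, item):
        # Unseen items score 0.0 here; exploration is what surfaces them.
        return self.clicks[item] / self.views[item] if self.views[item] else 0.0

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.clicks))  # explore
        return max(self.clicks, key=self.rate)         # exploit

    def record(self, item, clicked):
        self.views[item] += 1
        self.clicks[item] += int(clicked)
```

Thompson sampling replaces the fixed epsilon with a draw from a per-item Beta posterior, which shifts exploration automatically toward items whose engagement is still uncertain; the `record`/`select` interface stays the same.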
4. Offline Training and Online Serving Integration
Interviewers want to see how you connect batch model training with real-time serving without creating operational fragility.
Hints to consider:
- Train models offline on historical interaction data using batch pipelines (Spark, distributed training)
- Deploy models to a serving infrastructure that loads updated weights without downtime (blue-green model deployment)
- Use A/B testing to compare model versions and measure impact on engagement metrics
- Monitor model performance in production and detect distribution drift that might degrade recommendations
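Two of the serving-side pieces above can be sketched together: a zero-downtime model swap as an atomic reference replacement (the object-level analogue of blue-green deployment) and deterministic hash-based A/B bucketing so a user stays in the same experiment arm across requests. Class and method names are illustrative assumptions.

```python
import hashlib

class ModelServer:
    """Sketch of blue-green-style model swapping plus hash-based A/B
    bucket assignment; models are plain callables here for simplicity."""

    def __init__(self, model):
        self.active = model  # currently serving model ("blue")

    def deploy(self, new_model):
        # Atomic reference swap: requests already scoring keep the old
        # model object, new requests see the new weights, no downtime.
        self.active = new_model

    @staticmethod
    def assign_bucket(experiment, user_id, treatment_pct):
        """Deterministically map a user into control/treatment by hashing
        the experiment name and user ID, so assignment is stable."""
        h = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        return "treatment" if int(h, 16) % 100 < treatment_pct else "control"

    def score(self, user_id, item):
        return self.active(user_id, item)
```

Hashing on `experiment:user_id` rather than `user_id` alone keeps bucket assignments independent across experiments, which matters when several model tests run concurrently.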