Design a machine learning system that ranks news feed items based on relevance, user preferences, engagement history, and content freshness. The system must serve personalized rankings to millions of users in real-time, learn continuously from user interactions, and balance multiple objectives including engagement, content diversity, and business goals.
At Atlassian, this maps to personalizing the Confluence home feed or Jira dashboard -- surfacing the most relevant pages, tickets, or updates for each user based on their role, team activity, and past behavior. Interviewers use this question to test your understanding of the full ML system lifecycle: feature engineering, model architecture, training and serving infrastructure, feedback loops, and the operational challenges of deploying ML in production. Strong answers demonstrate a clear two-stage funnel (candidate generation followed by ranking), practical feature design, and awareness of cold start, position bias, and model freshness tradeoffs.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Scoring billions of items per request is infeasible. Interviewers want to see a retrieval stage that narrows candidates before an expensive ranking model scores them.
Hints to consider:
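One way to make the retrieval stage concrete: a toy first-stage candidate generator. Brute-force cosine similarity stands in for a real ANN index (such as FAISS or ScaNN); all data and names here are illustrative.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_candidates(user_embedding, item_index, k=1000):
    """Stage 1: narrow the corpus to the top-k nearest items.

    item_index maps item_id -> embedding. A production system would
    query an ANN index instead of this O(n) scan, trading a little
    recall for orders of magnitude less work before the ranking model.
    """
    scored = ((cosine(user_embedding, emb), item_id)
              for item_id, emb in item_index.items())
    top = sorted(scored, reverse=True)[:k]
    return [item_id for _, item_id in top]

index = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(retrieve_candidates([1.0, 0.0], index, k=2))  # -> ['a', 'b']
```

In an interview, naming the recall/latency tradeoff of the ANN stage (and that the heavy model only ever sees the retrieved candidates) is the key point this sketch illustrates.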
The quality of the ranking depends heavily on the features you extract (user, content, and user-content cross features) and the model architecture you choose, e.g. gradient-boosted trees versus a deep ranking network.
Hints to consider:
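A sketch of feature assembly for the ranking stage, combining the three feature families a strong answer usually names: batch user features, content features, user-content cross features, plus a real-time session signal. The feature names here are hypothetical, not a fixed schema.

```python
def build_features(user, item, context):
    """Assemble one feature row for a (user, item) pair at request time."""
    return {
        # Batch user feature, precomputed offline (e.g. hourly).
        "user_ctr_30d": user.get("ctr_30d", 0.0),
        # Content feature: freshness relative to request time.
        "item_age_hours": context["now_hours"] - item["created_hours"],
        # Cross feature: this user's historical affinity for this item's category.
        "user_category_ctr": user.get("ctr_by_category", {}).get(item["category"], 0.0),
        # Real-time session feature from the streaming pipeline.
        "session_clicks": len(context.get("session_clicked_ids", [])),
    }

feats = build_features(
    user={"ctr_30d": 0.12, "ctr_by_category": {"sports": 0.3}},
    item={"created_hours": 95, "category": "sports"},
    context={"now_hours": 100, "session_clicked_ids": ["x", "y"]},
)
print(feats)  # includes user_category_ctr=0.3, item_age_hours=5
```

Cross features like `user_category_ctr` are often what separates a personalized ranker from a popularity sort, which is worth saying explicitly in the interview.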
Interviewers probe how you train models, deploy them, and keep them fresh without introducing latency or instability.
Hints to consider:
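One standard way to ship a fresh model without destabilizing the feed is a canary rollout. A minimal sketch, assuming deterministic user-ID hashing so the same user always sees the same model variant (function and model names are hypothetical):

```python
import hashlib

def pick_model(user_id, canary_fraction=0.05):
    """Route a stable slice of users to the new model.

    Hashing the user ID (rather than sampling per request) keeps each
    user's experience consistent and makes A/B metrics attributable.
    """
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 10000
    return "model_v2" if bucket < canary_fraction * 10000 else "model_v1"

print(pick_model("user-123"))  # stable across calls for the same user
```

The same bucketing mechanism typically backs the online A/B tests used for promotion decisions; shadow scoring (running the new model without serving its output) is the usual step before even a canary.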
New users and new content lack the signals that drive personalization. Position bias means users engage more with top-ranked items regardless of actual relevance.
Hints to consider:
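As one concrete mitigation for position bias, a sketch of inverse propensity weighting of click labels. The propensity table below is made up for illustration; in practice examination probabilities are estimated, e.g. from small randomized-ranking experiments.

```python
# Estimated probability that a user even examines each position.
propensity = {1: 0.9, 2: 0.6, 3: 0.4, 4: 0.25, 5: 0.15}

def ipw_weight(position, clicked):
    """Weight a clicked impression by 1 / examination probability.

    Clicks at low positions are rarer simply because fewer users look
    there, so up-weighting them corrects the training signal toward
    true relevance rather than rank-induced exposure.
    """
    if not clicked:
        return 1.0
    return 1.0 / propensity.get(position, 0.1)

print(ipw_weight(4, True))  # -> 4.0: a click at position 4 counts 4x
```

Mentioning that the propensities themselves must be learned (and that naive training on logged clicks just reinforces the old ranker) shows awareness of the feedback-loop problem.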
Confirm the type of content being ranked (articles, tickets, comments), the primary optimization objective (engagement, relevance, or a blend), the expected user base size, and latency requirements. Ask about the data available for training -- are impression logs and click logs already collected? Clarify whether the ranking must also handle ads or promoted content.
Sketch the major components: a Content Ingestion Service that processes new items and computes content features; a Feature Store serving pre-computed user and content features with both batch and real-time update paths; a Candidate Generation Layer using multiple ANN-based retrievers; a Ranking Service that scores candidates with an ML model; a Feed Assembly Service that applies business rules, diversity constraints, and pagination; and a Feedback Loop that captures user interactions and feeds them back into training data and real-time features.
Walk through a feed request end to end. The user opens the app, the feed service calls candidate generation (ANN search across multiple indexes), retrieves 1,000-2,000 candidates, fetches features from the feature store, sends the feature matrix to the ranking model for scoring, applies post-processing (diversity injection, deduplication, business rule boosts), and returns the top N items. Discuss how features are computed: batch features like "user's 30-day click-through rate by category" are precomputed hourly, while real-time features like "items clicked in this session" are computed on the fly from a streaming pipeline.
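The request path above can be sketched as a small pipeline. Every stage here is a local stub standing in for an RPC to a separate service, and the items and scores are placeholders, but the shape of the flow is the point:

```python
def generate_candidates(user_id):
    # Stage 1: ANN retrieval across multiple indexes (stubbed).
    return ["item%d" % i for i in range(5)]

def fetch_features(user_id, candidates):
    # Feature-store lookup; here the only "feature" is a fake model input.
    return {c: {"score_input": int(c[4:])} for c in candidates}

def rank(features):
    # Stage 2: ranking-model scoring (stubbed as a sort on one feature).
    return sorted(features, key=lambda c: features[c]["score_input"], reverse=True)

def post_process(ranked, limit):
    # Deduplicate then truncate; diversity injection and business-rule
    # boosts would also live here.
    seen, out = set(), []
    for item in ranked:
        if item not in seen:
            seen.add(item)
            out.append(item)
        if len(out) == limit:
            break
    return out

def serve_feed(user_id, limit=3):
    candidates = generate_candidates(user_id)
    features = fetch_features(user_id, candidates)
    return post_process(rank(features), limit)

print(serve_feed("u1"))  # -> ['item4', 'item3', 'item2']
```

Walking an interviewer through the code-level shape of the pipeline, then attaching a latency budget to each stage (retrieval, feature fetch, scoring, assembly), is an effective way to structure the deep dive.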
Cover model evaluation: use offline metrics (AUC, NDCG) for development and online A/B testing for production decisions, tracking engagement rate, dwell time, and daily active user retention. Discuss fallback strategies: if the ML service is down, serve a recency-based feed from a cached ranking. Address monitoring: track model latency, prediction distribution drift, feature freshness, and click-through rate trends. Explain how to prevent filter bubbles by injecting exploration items and monitoring content diversity in served feeds.
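NDCG, the main offline ranking metric mentioned above, is simple enough to define from scratch in an interview. A minimal implementation over graded relevance labels (the example labels are invented):

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: relevance discounted by log2 of position.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(ranked_relevances, k):
    """NDCG@k: DCG of the served order divided by DCG of the ideal order."""
    ideal = sorted(ranked_relevances, reverse=True)
    denom = dcg(ideal[:k])
    return dcg(ranked_relevances[:k]) / denom if denom else 0.0

# Served order got the best item (rel=3) first but ranked a rel=2 item
# below a rel=0 item, so NDCG@4 is slightly below 1.
print(round(ndcg_at_k([3, 0, 2, 1], k=4), 3))  # -> 0.93
```

Because NDCG discounts by position, it rewards putting relevant items near the top, which matches how users actually consume a feed; AUC, by contrast, ignores position entirely.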
"Design a personalized ranking system for recommending content to users."