For a walkthrough of building an AI-powered answer system with retrieval pipelines, see our Design ChatGPT guide. While the ChatGPT guide focuses on conversational AI, many of the same patterns around content ingestion, ranking, and low-latency serving apply to a recommendation feed.
Also review the Databases, Caching, and Search building blocks.
Design a recommender system for Perplexity's Discover feed. The system collects articles from thousands of publishers and serves personalized feeds to millions of users. Logged-in users see content tailored to their interests, subscribed topics, and reading history, while guest users see a trending or global feed based on popularity and recency.
This problem combines content aggregation, personalization, and ranking at scale. You should be prepared to discuss recommendation approaches -- content-based filtering, collaborative filtering, and learned ranking models -- as well as the infrastructure required for real-time ingestion from diverse sources, efficient indexing, and low-latency feed generation. Interviewers care as much about how the pieces fit together architecturally as they do about the recommendation algorithms themselves.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers expect a complete system diagram showing how articles flow from publishers through ingestion, processing, indexing, and ranking to the user-facing feed. They want to see storage systems, caching layers, and how real-time and batch components interact.
Interviewers probe how you handle diverse content sources and prevent showing the same story from multiple publishers.
Interviewers want to see that you understand how to build a recommendation system that balances relevance, freshness, and diversity.
Without personalization data, the guest feed must still be compelling. Interviewers push on how you generate it.
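One common recipe for a guest feed -- not Perplexity's published formula, just an illustrative sketch -- scores each article by log-damped engagement multiplied by an exponential recency decay, so a burst of clicks on a fresh story outranks a slowly accumulating old one. The weights and half-life below are assumptions:

```python
import math

def trending_score(clicks: int, shares: int, age_hours: float,
                   half_life: float = 3.0) -> float:
    # Log-damp engagement so a single viral outlier doesn't dominate;
    # shares are weighted 2x clicks (an illustrative choice).
    engagement = math.log1p(clicks + 2 * shares)
    # Halve the score every `half_life` hours so fresh stories surface.
    decay = math.exp(-math.log(2) * age_hours / half_life)
    return engagement * decay
```

Because the score depends only on aggregate counters and age, it can be recomputed for the whole corpus every few minutes by a batch job and the top-N result cached globally.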
Interviewers probe how you handle growing content volume and user traffic without degrading latency or freshness.
Ask about the scale (number of publishers, articles per day, active users), latency budget for feed generation, acceptable staleness for personalization updates, and whether the interviewer wants a deep dive on recommendation algorithms or infrastructure.
Sketch these core components: a Content Ingestion Service that polls RSS feeds, APIs, and scraping services and writes raw articles to a queue; a Content Processor that deduplicates, extracts entities, classifies topics, and generates embeddings; an Article Index (Elasticsearch) storing processed articles with metadata and embeddings; a User Profile Service maintaining interest vectors, reading history, and subscribed topics; a Ranking Service that generates personalized feeds by querying the article index and scoring against user profiles; a Trending Service that computes and caches a global feed based on popularity and recency; a Feed API that routes requests to ranking (logged-in) or trending (guest); an Engagement Tracker that logs interactions and asynchronously updates user profiles; and an Offline Analytics Pipeline for training embeddings and evaluating ranking quality.
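The Feed API's routing decision is small enough to sketch. The interfaces below are hypothetical stand-ins for the Ranking and Trending services, injected as callables so the routing logic stays independent of their implementations:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class FeedRequest:
    user_id: Optional[str]  # None for guest sessions

class FeedAPI:
    """Routes feed requests: logged-in users get the personalized path,
    guests get the pre-computed global trending feed."""

    def __init__(self,
                 rank_for_user: Callable[[str], List[str]],
                 trending_feed: Callable[[], List[str]]):
        self._rank = rank_for_user
        self._trending = trending_feed

    def get_feed(self, req: FeedRequest) -> List[str]:
        if req.user_id is not None:
            return self._rank(req.user_id)  # Ranking Service
        return self._trending()             # cached trending feed
```

Keeping the split at the API layer means the expensive per-user ranking path is never invoked for guest traffic, which can be served almost entirely from cache or CDN.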
Discuss how you handle diverse publisher formats, deduplication strategies, and quality filtering. Walk through the pipeline from raw HTML or RSS to structured article records in the index. Explain how the content processor generates embeddings for each article using a pre-trained language model, extracts named entities for topic tagging, and writes the enriched record to both the article index and a Kafka topic for downstream consumers.
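Deduplication can be sketched with word-shingle Jaccard similarity; production pipelines typically use SimHash or MinHash over full article text plus embedding similarity, but the shape is the same. The shingle size and threshold here are illustrative assumptions:

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles for a lowercased text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def is_duplicate(new_text: str, recent_texts: list,
                 threshold: float = 0.5) -> bool:
    """Flag new_text as a duplicate if it overlaps heavily with any
    recently ingested article."""
    new_sh = shingles(new_text)
    return any(jaccard(new_sh, shingles(t)) >= threshold
               for t in recent_texts)
```

Comparing every new article against the whole corpus is O(n) per ingest, so at scale the candidate set is narrowed first, for example by bucketing on SimHash prefixes or querying the index for nearest-neighbor embeddings.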
This is where Perplexity interviews focus. Explain how user profiles are built incrementally from engagement events. Describe the ranking formula that combines content similarity (cosine similarity between article and user interest vectors), collaborative signal (what similar users engaged with), recency score (exponential decay by article age), and diversity penalty (reduce score for articles too similar to recently shown items). Discuss the two-phase retrieval approach: a fast candidate generation step that retrieves hundreds of articles from the index using topic filters and embedding similarity, followed by a scoring and re-ranking step that applies the full ranking model. Cover caching: store pre-computed personalized feeds with a short TTL and invalidate on new high-relevance articles.
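The scoring step above can be sketched as a weighted sum. The weights, half-life, and the choice to apply the diversity penalty as a subtracted term are illustrative assumptions, and the collaborative signal is omitted for brevity:

```python
import math

def cosine(u: list, v: list) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def score_article(user_vec: list, article_vec: list, age_hours: float,
                  shown_vecs: list,
                  w_sim: float = 0.6, w_recency: float = 0.3,
                  w_div: float = 0.1, half_life: float = 6.0) -> float:
    # Content similarity between the user's interest vector and the article.
    relevance = cosine(user_vec, article_vec)
    # Exponential decay by article age: halves every `half_life` hours.
    recency = math.exp(-math.log(2) * age_hours / half_life)
    # Diversity penalty: max similarity to recently shown items.
    penalty = max((cosine(article_vec, s) for s in shown_vecs), default=0.0)
    return w_sim * relevance + w_recency * recency - w_div * penalty
```

In the two-phase design, this function runs only over the few hundred candidates returned by the retrieval step, which keeps the re-ranking pass within the latency budget.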
Discuss caching strategies at multiple layers: the guest trending feed is pre-computed every few minutes and served from CDN; personalized feeds are cached per user with a TTL of one to five minutes; user interest vectors are materialized in a fast key-value store; and article embeddings are pre-computed during ingestion. Explain how you balance freshness (shorter TTL, more recomputation) with latency (longer TTL, stale but fast).
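The per-user feed cache behaves like a key-value store with per-entry TTL (Redis SETEX semantics). A minimal in-process sketch, with an explicit timestamp parameter so expiry is easy to reason about:

```python
import time

class FeedCache:
    """Per-user feed cache with TTL; an in-memory stand-in for a
    Redis-style store with per-key expiry."""

    def __init__(self, ttl_seconds: float = 120.0):
        self.ttl = ttl_seconds
        self._store = {}  # user_id -> (expires_at, feed)

    def get(self, user_id: str, now: float = None):
        now = time.time() if now is None else now
        entry = self._store.get(user_id)
        if entry and entry[0] > now:
            return entry[1]
        return None  # miss or expired: caller recomputes, then put()

    def put(self, user_id: str, feed: list, now: float = None) -> None:
        now = time.time() if now is None else now
        self._store[user_id] = (now + self.ttl, feed)

    def invalidate(self, user_id: str) -> None:
        """Drop a cached feed early, e.g. when a new high-relevance
        article arrives for this user's interests."""
        self._store.pop(user_id, None)
```

Tuning `ttl_seconds` is exactly the freshness-versus-latency trade-off described above: a shorter TTL means more recomputation but fresher feeds.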
Candidates report that Perplexity interviewers want to see a full architecture diagram with all components -- ingestion, processing, storage, ranking, and serving -- and care deeply about how the pieces fit together and where bottlenecks might appear. Deduplication was a major topic: how do you detect when multiple outlets cover the same story? Personalization was the core discussion: how do you build user profiles from sparse data, especially for new users? Scalability was probed hard: what happens when you grow from 1 million to 100 million users? The trending feed for guests also drew questions: how do you make it compelling without any personalization data?