For a walkthrough of building an AI-powered answer system with retrieval pipelines, see our Design ChatGPT guide. While the ChatGPT guide focuses on conversational AI, many of the same patterns around content ingestion, ranking, and low-latency serving apply to a recommendation feed.
Also review the Databases, Caching, and Search building blocks.
Design a recommender system for Perplexity's Discover feed. The system collects articles from thousands of publishers and serves personalized feeds to millions of users. Logged-in users see content tailored to their interests, subscribed topics, and reading history, while guest users see a trending or global feed based on popularity and recency.
This problem combines content aggregation, personalization, and ranking at scale. You should be prepared to discuss recommendation approaches -- content-based filtering, collaborative filtering, and learned ranking models -- as well as the infrastructure required for real-time ingestion from diverse sources, efficient indexing, and low-latency feed generation. Interviewers care as much about how the pieces fit together architecturally as they do about the recommendation algorithms themselves.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers expect a complete system diagram showing how articles flow from publishers through ingestion, processing, indexing, and ranking to the user-facing feed. They want to see storage systems, caching layers, and how real-time and batch components interact.
Interviewers probe how you handle diverse content sources and prevent showing the same story from multiple publishers.
Interviewers want to see that you understand how to build a recommendation system that balances relevance, freshness, and diversity.
Without personalization data, the guest feed must still be compelling. Interviewers push on how you generate it.
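One common recipe for a guest feed -- not Perplexity's published formula, just an illustrative sketch -- scores each article by log-damped engagement multiplied by an exponential recency decay, so a burst of clicks on a fresh story outranks a slowly accumulating old one. The weights and half-life below are assumptions:

```python
import math

def trending_score(clicks: int, shares: int, age_hours: float,
                   half_life: float = 3.0) -> float:
    # Log-damp engagement so a single viral outlier doesn't dominate;
    # shares are weighted 2x clicks (an illustrative choice).
    engagement = math.log1p(clicks + 2 * shares)
    # Halve the score every `half_life` hours so fresh stories surface.
    decay = math.exp(-math.log(2) * age_hours / half_life)
    return engagement * decay
```

Because the score depends only on aggregate counters and age, it can be recomputed for the whole corpus every few minutes by a batch job and the top-N result cached globally.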
Interviewers probe how you handle growing content volume and user traffic without degrading latency or freshness.
Ask about the scale (number of publishers, articles per day, active users), latency budget for feed generation, acceptable staleness for personalization updates, and whether the interviewer wants a deep dive on recommendation algorithms or infrastructure.
Sketch these core components: a Content Ingestion Service that polls RSS feeds, APIs, and scraping services and writes raw articles to a queue; a Content Processor that deduplicates, extracts entities, classifies topics, and generates embeddings; an Article Index (Elasticsearch) storing processed articles with metadata and embeddings; a User Profile Service maintaining interest vectors, reading history, and subscribed topics; a Ranking Service that generates personalized feeds by querying the article index and scoring against user profiles; a Trending Service that computes and caches a global feed based on popularity and recency; a Feed API that routes requests to ranking (logged-in) or trending (guest); an Engagement Tracker that logs interactions and asynchronously updates user profiles; and an Offline Analytics Pipeline for training embeddings and evaluating ranking quality.
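The Feed API's routing decision is small enough to sketch. The interfaces below are hypothetical stand-ins for the Ranking and Trending services, injected as callables so the routing logic stays independent of their implementations:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class FeedRequest:
    user_id: Optional[str]  # None for guest sessions

class FeedAPI:
    """Routes feed requests: logged-in users get the personalized path,
    guests get the pre-computed global trending feed."""

    def __init__(self,
                 rank_for_user: Callable[[str], List[str]],
                 trending_feed: Callable[[], List[str]]):
        self._rank = rank_for_user
        self._trending = trending_feed

    def get_feed(self, req: FeedRequest) -> List[str]:
        if req.user_id is not None:
            return self._rank(req.user_id)  # Ranking Service
        return self._trending()             # cached trending feed
```

Keeping the split at the API layer means the expensive per-user ranking path is never invoked for guest traffic, which can be served almost entirely from cache or CDN.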
Discuss how you handle diverse publisher formats, deduplication strategies, and quality filtering. Walk through the pipeline from raw HTML or RSS to structured article records in the index. Explain how the content processor generates embeddings for each article using a pre-trained language model, extracts named entities for topic tagging, and writes the enriched record to both the article index and a Kafka topic for downstream consumers.
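Deduplication can be sketched with word-shingle Jaccard similarity; production pipelines typically use SimHash or MinHash over full article text plus embedding similarity, but the shape is the same. The shingle size and threshold here are illustrative assumptions:

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles for a lowercased text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def is_duplicate(new_text: str, recent_texts: list,
                 threshold: float = 0.5) -> bool:
    """Flag new_text as a duplicate if it overlaps heavily with any
    recently ingested article."""
    new_sh = shingles(new_text)
    return any(jaccard(new_sh, shingles(t)) >= threshold
               for t in recent_texts)
```

Comparing every new article against the whole corpus is O(n) per ingest, so at scale the candidate set is narrowed first, for example by bucketing on SimHash prefixes or querying the index for nearest-neighbor embeddings.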
This is where Perplexity interviews focus. Explain how user profiles are built incrementally from engagement events. Describe the ranking formula that combines content similarity (cosine similarity between article and user interest vectors), collaborative signal (what similar users engaged with), recency score (exponential decay by article age), and diversity penalty (reduce score for articles too similar to recently shown items). Discuss the two-phase retrieval approach: a fast candidate generation step that retrieves hundreds of articles from the index using topic filters and embedding similarity, followed by a scoring and re-ranking step that applies the full ranking model. Cover caching: store pre-computed personalized feeds with a short TTL and invalidate on new high-relevance articles.
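The scoring step above can be sketched as a weighted sum. The weights, half-life, and the choice to apply the diversity penalty as a subtracted term are illustrative assumptions, and the collaborative signal is omitted for brevity:

```python
import math

def cosine(u: list, v: list) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def score_article(user_vec: list, article_vec: list, age_hours: float,
                  shown_vecs: list,
                  w_sim: float = 0.6, w_recency: float = 0.3,
                  w_div: float = 0.1, half_life: float = 6.0) -> float:
    # Content similarity between the user's interest vector and the article.
    relevance = cosine(user_vec, article_vec)
    # Exponential decay by article age: halves every `half_life` hours.
    recency = math.exp(-math.log(2) * age_hours / half_life)
    # Diversity penalty: max similarity to recently shown items.
    penalty = max((cosine(article_vec, s) for s in shown_vecs), default=0.0)
    return w_sim * relevance + w_recency * recency - w_div * penalty
```

In the two-phase design, this function runs only over the few hundred candidates returned by the retrieval step, which keeps the re-ranking pass within the latency budget.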
Discuss caching strategies at multiple layers: the guest trending feed is pre-computed every few minutes and served from CDN; personalized feeds are cached per user with a TTL of one to five minutes; user interest vectors are materialized in a fast key-value store; and article embeddings are pre-computed during ingestion. Explain how you balance freshness (shorter TTL, more recomputation) with latency (longer TTL, stale but fast).
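The per-user feed cache behaves like a key-value store with per-entry TTL (Redis SETEX semantics). A minimal in-process sketch, with an explicit timestamp parameter so expiry is easy to reason about:

```python
import time

class FeedCache:
    """Per-user feed cache with TTL; an in-memory stand-in for a
    Redis-style store with per-key expiry."""

    def __init__(self, ttl_seconds: float = 120.0):
        self.ttl = ttl_seconds
        self._store = {}  # user_id -> (expires_at, feed)

    def get(self, user_id: str, now: float = None):
        now = time.time() if now is None else now
        entry = self._store.get(user_id)
        if entry and entry[0] > now:
            return entry[1]
        return None  # miss or expired: caller recomputes, then put()

    def put(self, user_id: str, feed: list, now: float = None) -> None:
        now = time.time() if now is None else now
        self._store[user_id] = (now + self.ttl, feed)

    def invalidate(self, user_id: str) -> None:
        """Drop a cached feed early, e.g. when a new high-relevance
        article arrives for this user's interests."""
        self._store.pop(user_id, None)
```

Tuning `ttl_seconds` is exactly the freshness-versus-latency trade-off described above: a shorter TTL means more recomputation but fresher feeds.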
Candidates report that Perplexity interviewers want to see a full architecture diagram with all components -- ingestion, processing, storage, ranking, and serving -- and care deeply about how the pieces fit together and where bottlenecks might appear. Deduplication was a major topic: how do you detect when multiple outlets cover the same story? Personalization was the core discussion: how do you build user profiles from sparse data, especially for new users? Scalability was probed hard: what happens when you grow from 1 million to 100 million users? The trending feed for guests also drew questions: how do you make it compelling without any personalization data?