Problem Statement
Design a social media news feed system where users publish posts with text and images, follow other accounts, and consume a personalized, ranked home feed that blends content from followed users, sponsored advertisements, and news articles. The feed must feel alive -- new posts appear quickly, engagement counts update in near real time, and scrolling reveals an endless stream of relevant content.
The core engineering challenges are handling the read-heavy, high-fanout nature of feed assembly (each post may reach millions of followers), ranking and interleaving multiple content types under strict latency budgets, dealing with celebrity accounts that create write amplification during fanout, and ensuring the system scales to billions of daily feed requests. Strong answers reason about hybrid fanout strategies, caching at multiple tiers, real-time invalidation, and the separation of social content from ad and news blending logic.
Key Requirements
Functional
- Post creation -- users can publish posts with text and images that are distributed to their followers' feeds
- Personalized ranked feed -- users see a home feed that blends followed-user posts, sponsored ads, and news articles, ranked by relevance rather than pure chronology
- Engagement -- users can like, comment on, and share feed items with engagement counts updating promptly
- Continuous scrolling -- users can scroll through an endless feed with low latency, including near-real-time insertion of new content when available
Non-Functional
- Scalability -- support 2 billion daily active users with 500,000+ feed reads per second at peak and 100,000+ posts per second
- Reliability -- 99.99% availability for feed reads; no lost posts
- Latency -- home feed loads within 200ms at p95; new posts from followed users appear within 5 seconds
- Consistency -- eventual consistency for feed materialization and engagement counters; strong consistency for post creation and follow graph mutations
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Hybrid Fanout Strategy
The most critical design decision is how posts reach followers' feeds. Interviewers expect you to explain why a single approach fails at scale and propose a hybrid solution.
Hints to consider:
- Fanout-on-write pushes each new post into all followers' precomputed timeline caches at write time, enabling fast reads but causing massive write amplification for accounts with millions of followers
- Fanout-on-read assembles the timeline at request time by querying each followed account's recent posts, avoiding write amplification but making reads expensive for users who follow many accounts
- A hybrid approach uses fanout-on-write for normal users (fast reads, bounded writes) and fanout-on-read for celebrity accounts (avoids write explosion), merging results at read time
- Detect high-fanout accounts dynamically based on follower count thresholds and route their posts through the deferred path
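The routing decision above can be sketched in a few lines. This is a minimal illustration, assuming in-memory stand-ins for the follower graph, timeline caches, and celebrity outboxes; the threshold value and names are hypothetical, not part of the problem statement.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 100_000  # illustrative cutoff for the deferred path

followers = defaultdict(set)          # author_id -> follower ids (stand-in for graph service)
timelines = defaultdict(list)         # user_id -> post ids (stand-in for Redis timeline cache)
celebrity_outbox = defaultdict(list)  # author_id -> recent post ids

def publish_post(author_id: str, post_id: str) -> str:
    """Route a new post through fanout-on-write or the deferred celebrity path."""
    if len(followers[author_id]) >= CELEBRITY_THRESHOLD:
        # Deferred path: write once to the author's outbox; followers pull at read time.
        celebrity_outbox[author_id].append(post_id)
        return "fanout_on_read"
    # Normal path: push the post id into every follower's timeline cache.
    for follower_id in followers[author_id]:
        timelines[follower_id].append(post_id)
    return "fanout_on_write"
```

In practice the threshold would be tuned from fanout-lag metrics rather than hard-coded, and the fanout loop would run asynchronously off a Kafka event rather than inline with the write.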
2. Feed Ranking and Content Blending
A chronological feed misses the core product goal of surfacing the most relevant content. Interviewers want to see how you rank and interleave organic posts, ads, and news.
Hints to consider:
- Use a multi-stage pipeline: candidate retrieval (fetch recent posts from precomputed cache + celebrity outboxes), scoring (apply a ranking model using engagement signals, recency, author affinity), and re-ranking (insert ads at paced intervals, blend news articles, enforce diversity rules)
- Precompute ranking features (author engagement rate, user-content affinity, post freshness) and serve them from Redis for low-latency scoring
- Support A/B testing by assigning users to experiment groups at the API layer and routing to different ranking model versions
- Log impression data (which posts were shown, in which positions) back to Kafka for offline model training and evaluation
3. Hot Content and Celebrity Handling
Celebrity posts and viral content create thundering-herd scenarios that can overwhelm fanout pipelines and cache infrastructure. Interviewers probe your strategies for isolating these hotspots.
Hints to consider:
- Celebrity posts skip the fanout-on-write path entirely and are fetched on-demand during feed assembly, merging with the precomputed timeline
- Cache celebrity outboxes aggressively in Redis with short TTLs so the merge step is fast and does not hit the database
- For viral posts accumulating rapid engagement, use sharded counters to prevent hot-key contention on like and comment counts
- Implement circuit breakers on the fanout pipeline so a single hot post does not consume all worker capacity
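The sharded-counter idea can be sketched as below. A plain dict stands in for Redis, and the shard count and key scheme are assumptions; the point is that writes fan out across `NUM_SHARDS` keys so no single key becomes hot, while reads sum the shards.

```python
import random
from collections import defaultdict

NUM_SHARDS = 16
store = defaultdict(int)  # stand-in for Redis

def incr_like(post_id: str) -> None:
    """Spread increments across shards to avoid hot-key contention on viral posts."""
    shard = random.randrange(NUM_SHARDS)
    store[f"likes:{post_id}:{shard}"] += 1

def get_likes(post_id: str) -> int:
    """Reads sum all shards; the total can be cached briefly since exact counts are not critical."""
    return sum(store[f"likes:{post_id}:{shard}"] for shard in range(NUM_SHARDS))
```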
4. Real-Time Feed Updates
Users expect new posts to appear without manual refresh. Interviewers evaluate your approach to push-based updates that are efficient at scale.
Hints to consider:
- Use lightweight push notifications over WebSocket connections: send a signal that new content is available rather than the full post payload
- The client receives the hint and fetches the latest feed page, which now includes the new post, so fresh content appears without the user ever pulling to refresh
- Publish new-post events to Kafka; a notification service consumes them and pushes hints to connected followers via a Redis Pub/Sub layer
- Rate-limit push notifications to avoid overwhelming clients when a prolific followed account posts rapidly
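A minimal sketch of the hint-only push with per-user rate limiting, assuming an injected clock and a list standing in for WebSocket sends; the message shape and the one-hint-per-interval rule are illustrative.

```python
import json
import time

class HintPusher:
    """Push a lightweight 'new content available' signal, never the full post payload."""

    def __init__(self, min_interval: float = 2.0, now=time.monotonic):
        self.min_interval = min_interval
        self.now = now
        self.last_sent: dict[str, float] = {}  # user_id -> last hint timestamp
        self.sent: list = []                   # stand-in for WebSocket connections

    def notify(self, user_id: str) -> bool:
        t = self.now()
        if t - self.last_sent.get(user_id, float("-inf")) < self.min_interval:
            return False  # rate-limited: the client catches up on its next fetch anyway
        self.last_sent[user_id] = t
        self.sent.append((user_id, json.dumps({"type": "new_content"})))
        return True
```

Because the hint carries no payload, dropping a rate-limited hint loses nothing: the next feed fetch returns all new posts regardless.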
5. Engagement Processing Pipeline
Likes, comments, and shares generate high write volume and feed back into ranking signals. Interviewers look for an event-driven architecture that decouples engagement writes from the feed read path.
Hints to consider:
- Publish engagement events to Kafka partitioned by post ID; consumer workers update sharded counters in Redis and periodically flush to the database
- The feed read path reads counters from Redis, never hitting the database for engagement data
- Engagement events also feed the ranking feature pipeline, updating author engagement rates and post quality scores used in future feed assemblies
- Ensure all consumers are idempotent so duplicate Kafka deliveries do not inflate counts
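The idempotency requirement can be sketched with a dedupe set keyed by event id. The event shape is an assumption, and the in-memory set stands in for what would be a TTL'd dedupe store in production (duplicates from Kafka's at-least-once delivery arrive close together, so a bounded window suffices).

```python
from collections import defaultdict

counters = defaultdict(int)      # (post_id, kind) -> count, stand-in for sharded Redis counters
seen_events: set[str] = set()    # stand-in for a TTL'd dedupe store

def handle_event(event: dict) -> bool:
    """Apply an engagement event exactly once, ignoring Kafka redeliveries."""
    event_id = event["event_id"]
    if event_id in seen_events:
        return False  # duplicate delivery: already counted
    seen_events.add(event_id)
    counters[(event["post_id"], event["kind"])] += 1
    return True
```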
Suggested Approach
Step 1: Clarify Requirements
Confirm the scale: daily active users, posts per second, and feed reads per second. Ask about the content mix -- what fraction of the feed is ads and news versus organic posts. Clarify latency targets for feed reads versus post publishing. Determine whether the feed is single-column (mobile) or multi-column (desktop) as this affects the number of items per request. Ask about geographic distribution and whether multi-region deployment is needed.
Step 2: High-Level Architecture
Sketch the core services: Post Service (creates and stores posts in PostgreSQL), Fanout Service (pushes post IDs to follower timeline caches in Redis), Timeline Service (assembles and ranks feeds), Social Graph Service (manages follow relationships), Engagement Service (processes likes, comments, shares), Ad Service (selects and paces sponsored content), and News Service (provides editorial content). Show Kafka as the event backbone. Place Redis caches for timeline heads, post content, engagement counters, and ranking features. Include a CDN for post images and static assets, and an API Gateway for authentication, rate limiting, and A/B group assignment.
Step 3: Deep Dive on Feed Assembly
Walk through a feed request. The Timeline Service reads the user's precomputed timeline cache from Redis (a sorted set of post IDs). It fetches the user's follow list, identifies celebrity accounts, and queries their outboxes for recent posts not yet in the cache. A merge step combines precomputed and on-demand posts by score. The ranking stage loads precomputed features from Redis, applies the scoring model, and produces a ranked list. The re-ranking stage inserts ads (from Ad Service, selected based on user targeting data) at configured intervals and blends news articles. The final list of post IDs is hydrated by fetching full post content, author metadata, and engagement counters from Redis or database. The response includes a pagination cursor for the next page.
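The merge step in this walkthrough can be sketched with a streaming merge of two already-sorted sources. This assumes both inputs are `(timestamp, post_id)` lists sorted newest first; a real implementation would merge by ranking score and handle many celebrity outboxes, but the shape is the same.

```python
import heapq
from itertools import islice

def merge_feed(precomputed: list[tuple[float, str]],
               celebrity_posts: list[tuple[float, str]],
               limit: int = 20) -> list[str]:
    """Merge the precomputed timeline with on-demand celebrity posts, newest first."""
    merged = heapq.merge(precomputed, celebrity_posts,
                         key=lambda item: item[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```

`heapq.merge` streams lazily, so only the first `limit` items are materialized, which matters when celebrity outboxes are long.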
Step 4: Address Secondary Concerns
Cover post creation flow: user publishes a post, Post Service writes to PostgreSQL and publishes an event to Kafka, Fanout Service consumes and pushes the post ID to each follower's timeline cache (skipping celebrity-tier accounts). Discuss engagement: events flow through Kafka to sharded Redis counters. Address media handling: images uploaded to S3 and served via CDN with pre-signed URLs. Touch on monitoring: track feed latency percentiles, fanout lag, cache hit rates for timelines and post content, engagement processing throughput, and ad fill rates. Discuss scaling: shard PostgreSQL by user ID for posts, partition Kafka by user ID for fanout and by post ID for engagement, auto-scale fanout workers based on Kafka consumer lag, and horizontally scale Redis clusters.
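The sharding scheme described here relies on a stable mapping from key to shard. A minimal sketch, using a cryptographic hash so the mapping is deterministic across processes (Python's built-in `hash` is salted per process and unsuitable for this):

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Stable hash routing: the same key always lands on the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Posts would route with `shard_for(user_id, n)` and engagement events with `shard_for(post_id, n)`, matching the partitioning above; a production system would layer consistent hashing on top so that resharding moves only a fraction of keys.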
Related Learning
Deepen your understanding of the patterns used in this problem:
- Ad Click Aggregator -- event streaming and real-time aggregation patterns for engagement counting and ad impression tracking
- Distributed Counters -- sharded counter techniques for like and comment counts at massive scale
- Top-K Videos -- ranking and serving the most relevant content in personalized feeds
- Caching -- multi-tier caching for timeline heads, post content, and ranking features
- Message Queues -- Kafka for fanout pipelines, engagement processing, and real-time update delivery
- CDN -- serving post images and static assets efficiently to a global audience
- Databases -- sharding and storage strategies for posts, social graphs, and engagement data