Design Meta News Feed
Problem Statement
Design the backend system powering a social media news feed similar to Facebook's or Instagram's home feed. Users post content (text, images, links) and see a personalized, ranked feed of posts from friends, pages, and groups they follow. The feed must balance recency with relevance, surfacing the most engaging content while ensuring users do not miss important updates from close connections.
This question appears in Lyft interviews because it tests a wide range of system design skills: data modeling for social graphs, fan-out strategies for content distribution, ranking and personalization at scale, and real-time update propagation. The interviewer wants to see how you make principled trade-offs between push and pull architectures, handle the asymmetry between users who follow thousands of accounts and those with small networks, and design a ranking pipeline that can evolve without requiring full system rewrites.
The core challenge is that every user's feed is unique and must be assembled from potentially thousands of content sources, ranked by a personalization model, and served with sub-second latency. The system must handle both celebrity users whose posts reach millions of followers and ordinary users whose posts go to a handful of friends.
Key Requirements
Functional
- Post creation -- Users can publish posts containing text, images, and links that become visible to their followers
- Personalized feed generation -- Each user sees a ranked feed combining posts from friends, followed pages, and groups, ordered by a relevance model rather than pure chronology
- Real-time updates -- New posts from followed accounts appear in the feed within seconds, with optional push notifications for high-priority content
- Pagination and infinite scroll -- The feed supports efficient cursor-based pagination as users scroll through hundreds of posts without performance degradation
- Interaction signals -- Users can like, comment, and share posts, and these engagement signals feed back into the ranking model
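The cursor-based pagination requirement above can be sketched in a few lines. This is a minimal in-memory illustration, not a definitive implementation: it assumes post IDs are time-sortable (e.g., Snowflake-style IDs whose high bits encode the timestamp), and the `FEED` list and `fetch_page` helper are hypothetical names.

```python
from typing import List, Optional, Tuple

# Hypothetical feed of time-sortable post IDs, newest first.
FEED = list(range(100, 0, -1))

def fetch_page(cursor: Optional[int], limit: int = 10) -> Tuple[List[int], Optional[int]]:
    """Return posts strictly older than `cursor`, plus the cursor for the next page.

    An opaque "last seen post ID" cursor stays stable when new posts are
    prepended to the feed; a numeric offset would skip or repeat items.
    """
    if cursor is None:
        page = FEED[:limit]
    else:
        page = [pid for pid in FEED if pid < cursor][:limit]
    # No next cursor once we return a short (final) page.
    next_cursor = page[-1] if len(page) == limit else None
    return page, next_cursor

first, cur = fetch_page(None, limit=3)    # [100, 99, 98], cursor 98
second, cur = fetch_page(cur, limit=3)    # [97, 96, 95], cursor 95
```

The same shape maps directly onto a Redis `ZREVRANGEBYSCORE` or a SQL `WHERE post_id < :cursor ORDER BY post_id DESC LIMIT :n` query.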
Non-Functional
- Scalability -- Support 500 million daily active users, each following an average of 200 accounts, with 10 million new posts per hour
- Latency -- Serve a personalized feed page in under 300ms at p95, including ranking computation
- Availability -- 99.99% uptime for feed reads; feed generation should degrade gracefully rather than fail completely during partial outages
- Freshness -- New posts appear in followers' feeds within 30 seconds of publication for active users
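Before picking an architecture, it is worth running the numbers implied by these requirements. A rough back-of-envelope calculation, assuming the average follower count matches the average following count of 200 and posting is roughly uniform over the hour:

```python
# Fan-out-on-write cost under the stated load.
posts_per_hour = 10_000_000
avg_followers = 200  # assumption: mirrors the stated average following count

feed_writes_per_hour = posts_per_hour * avg_followers  # 2 billion cache writes/hour
feed_writes_per_sec = feed_writes_per_hour / 3600      # ~556k writes/sec sustained

# Per-user feed cache of the most recent 500 post IDs (8 bytes each)
# for 500M daily active users:
dau = 500_000_000
cache_bytes = dau * 500 * 8  # 2 TB of post IDs, before Redis overhead
```

Half a million cache writes per second and terabytes of feed cache are feasible, but they motivate both the storage-cost hint above and the hybrid fan-out discussion below.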
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Fan-Out Strategy: Push vs. Pull vs. Hybrid
The fundamental architectural decision is how content reaches followers. Interviewers want to see you reason through the trade-offs and arrive at a hybrid approach that handles both celebrity and regular users efficiently.
Hints to consider:
- Push (fan-out on write): when a user posts, immediately write the post ID to all followers' feed caches. This is fast for reads but expensive for users with millions of followers
- Pull (fan-out on read): when a user opens their feed, query all followed accounts for recent posts and merge them. This avoids write amplification but increases read latency
- Hybrid approach: use push for regular users (under 10,000 followers) and pull for celebrity accounts, merging both sources at read time
- Consider the storage cost of maintaining per-user feed caches for hundreds of millions of users
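The hybrid approach in the hints above can be sketched as follows. This is a simplified in-memory model, assuming the 10,000-follower threshold from the hint; the data structures (`feed_cache`, `recent_posts`) stand in for Redis and the post store, and all names are illustrative.

```python
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # tier cutoff from the hint; tune per deployment

followers = defaultdict(set)     # author -> set of follower ids
following = defaultdict(set)     # user -> set of followed authors
feed_cache = defaultdict(list)   # user -> [(ts, post_id)] pushed on write
recent_posts = defaultdict(list) # author -> [(ts, post_id)] queried on read

def publish(author, post_id, ts):
    recent_posts[author].append((ts, post_id))
    if len(followers[author]) < CELEBRITY_THRESHOLD:
        # Fan-out on write: cheap for small audiences.
        for f in followers[author]:
            feed_cache[f].append((ts, post_id))
    # Celebrity posts are NOT pushed; followers pull them at read time.

def read_feed(user, limit=20):
    pushed = feed_cache[user]
    pulled = []
    for author in following[user]:
        if len(followers[author]) >= CELEBRITY_THRESHOLD:
            # Fan-out on read: one query per followed celebrity.
            pulled.extend(recent_posts[author])
    # Merge both sources, newest first.
    merged = heapq.nlargest(limit, pushed + pulled)
    return [post_id for ts, post_id in merged]
```

The key property to call out in the interview: write amplification is bounded by the threshold (at most ~10,000 cache writes per post), while read-time merge cost is bounded by the number of celebrities a user follows, which is typically small.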
2. Feed Ranking and Personalization
A chronological feed is simple but leads to poor engagement. Interviewers expect you to design a ranking pipeline that scores posts based on multiple signals and can be iterated on by ML teams.
Hints to consider:
- Implement a two-stage ranking pipeline: a fast candidate retrieval phase that gathers hundreds of eligible posts, followed by a scoring phase that ranks them using a lightweight ML model
- Use features like post recency, author affinity (how often the user interacts with this author), content type preferences, and engagement velocity (likes per minute since posting)
- Cache ranking model outputs and pre-compute author affinity scores offline to minimize real-time computation
- Design the ranking service as a separate microservice so ML teams can deploy model updates independently of the feed infrastructure
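A minimal sketch of the scoring stage described above, using the features from the hints. The weights are hand-set for illustration; in production they would come from a trained model, and the feature scaling here is an assumption, not a tuned formula.

```python
import math
from dataclasses import dataclass

@dataclass
class Post:
    post_id: str
    author: str
    created_at: float  # unix seconds
    likes: int

# Illustrative weights; a real system learns these offline.
WEIGHTS = {"recency": 0.5, "affinity": 0.3, "velocity": 0.2}

def score(post: Post, affinity: dict, now: float) -> float:
    age_hours = max((now - post.created_at) / 3600, 1e-3)
    features = {
        "recency": math.exp(-age_hours / 6),         # exponential time decay
        "affinity": affinity.get(post.author, 0.0),  # precomputed offline per hint
        "velocity": post.likes / age_hours / 100,    # likes/hour, roughly scaled
    }
    return sum(WEIGHTS[k] * v for k, v in features.items())

def rank(candidates, affinity, now, limit=20):
    # Stage 1 (candidate retrieval) happened upstream; this is stage 2.
    return sorted(candidates, key=lambda p: score(p, affinity, now), reverse=True)[:limit]
```

Keeping `score` a pure function of a feature dict is what lets ML teams swap in a learned model behind the same interface, per the last hint.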
3. Storage and Data Model for Social Graph and Posts
The feed system sits on top of a social graph (who follows whom) and a content store (posts and media). Interviewers want to see efficient data modeling that supports both high-throughput writes and low-latency reads.
Hints to consider:
- Store the social graph in a graph database or adjacency list in a wide-column store like Cassandra, partitioned by user ID for fast follower lookups
- Store posts in a separate content store with the post ID as the primary key and denormalized author metadata to avoid joins at read time
- Use a pre-materialized feed cache (Redis sorted sets keyed by user ID, scored by timestamp) for push-based feeds
- Separate hot storage (recent 7 days of posts) from cold storage (older posts) to optimize cache hit rates
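The pre-materialized feed cache from the hints can be modeled as below. This is an in-process stand-in for the Redis sorted sets, with comments noting the Redis command each operation corresponds to; the `feed:{user_id}` key layout and the 500-entry cap are illustrative assumptions.

```python
import bisect
from typing import Dict, List, Tuple

MAX_FEED_LEN = 500  # assumption: cap per-user cache to bound memory

feeds: Dict[str, List[Tuple[float, str]]] = {}

def push_post(user_id: str, post_id: str, ts: float) -> None:
    feed = feeds.setdefault(user_id, [])
    # ZADD feed:{user_id} ts post_id  -- timestamp is the sort score
    bisect.insort(feed, (ts, post_id))
    if len(feed) > MAX_FEED_LEN:
        # ZREMRANGEBYRANK feed:{user_id} 0 overflow-1  -- drop oldest entries
        del feed[: len(feed) - MAX_FEED_LEN]

def read_feed(user_id: str, limit: int = 20) -> List[str]:
    feed = feeds.get(user_id, [])
    # ZREVRANGE feed:{user_id} 0 limit-1  -- newest first
    return [post_id for ts, post_id in reversed(feed[-limit:])]
```

Capping each sorted set keeps the hot tier small; scrolling past the cap falls through to the pull path against the content store, which lines up with the hot/cold split in the last hint.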
4. Handling Celebrity Users and Viral Content
Users with millions of followers create extreme fan-out that can overwhelm the write path. Viral posts that accumulate thousands of likes per second create hot keys in the engagement store.
Hints to consider:
- Classify users into tiers based on follower count and use different fan-out strategies per tier
- For viral posts, shard engagement counters across multiple keys and aggregate them asynchronously
- Implement rate limiting on the write path to prevent a single celebrity post from consuming all fan-out worker capacity
- Use dedicated infrastructure or priority queues for high-follower-count accounts to isolate their impact
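The sharded-counter pattern from the second hint looks like this in miniature. The shard count and key scheme are assumptions; in production each shard would be a separate Redis or database key rather than a dict entry.

```python
import random
from collections import defaultdict

NUM_SHARDS = 16  # assumption: size to the expected write hot-spot

# (post_id, shard_index) -> partial count; stands in for per-shard keys.
shards = defaultdict(int)

def increment_likes(post_id: str) -> None:
    # Spread writes across NUM_SHARDS keys so no single key is hot.
    shards[(post_id, random.randrange(NUM_SHARDS))] += 1

def total_likes(post_id: str) -> int:
    # In production this aggregation runs asynchronously and is cached,
    # so readers see a slightly stale but cheap-to-serve total.
    return sum(shards[(post_id, s)] for s in range(NUM_SHARDS))
```

The trade-off worth stating explicitly: writes scale linearly with shard count, at the cost of reads touching every shard, which is why the hint pairs sharding with asynchronous aggregation.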