Design a photo-sharing social media platform where users can upload photos, follow other users, and view a chronological feed of posts from people they follow. Think of Instagram, where hundreds of millions of users scroll through personalized feeds daily, double-tap to like, and share moments through images and short videos.
The core technical challenge is building a system that handles massive media uploads reliably while serving low-latency, personalized feeds at scale. Users expect their posts to appear almost instantly in followers' feeds, uploads of large media files to succeed even on flaky connections, and infinite scrolling to feel seamless without duplicates or gaps. You need to balance write amplification from fan-out strategies against read latency, design efficient pagination for feeds that change constantly, and ensure media delivery through a global CDN keeps load times minimal.
At Instagram scale, interviewers want to see how you handle the tension between precomputed feeds for fast reads and on-demand assembly for users with millions of followers. They also look for clear API contracts, practical caching strategies, and robust handling of edge cases like celebrity accounts and bursty traffic.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers want to see how you balance precomputed feeds against on-demand assembly, especially when some users have millions of followers and fan-out on write becomes prohibitively expensive.
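One way to make the hybrid approach concrete is a write path that checks the author's follower count before fanning out. This is a minimal in-memory sketch: the dict-based `feeds` and `followers` stores stand in for Redis sorted sets and the follow graph, and `CELEBRITY_THRESHOLD` is an illustrative cutoff (the real value would come from measuring the follower distribution).

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # illustrative; in production this might be ~500K

# In-memory stand-ins for Redis sorted sets and the follow graph service.
feeds = defaultdict(list)      # follower_id -> [(timestamp, post_id), ...]
followers = defaultdict(set)   # author_id -> {follower_id, ...}

def publish_post(author_id: str, post_id: str, ts: float) -> str:
    """Fan out on write for normal users; defer to read-time merge for celebrities."""
    if len(followers[author_id]) >= CELEBRITY_THRESHOLD:
        return "pull"  # post is merged into followers' feeds at read time instead
    for follower_id in followers[author_id]:
        feeds[follower_id].append((ts, post_id))
    return "push"
```

The key property: write cost is bounded by the threshold, so a post from an account with millions of followers never triggers millions of feed writes.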
Large file uploads over unreliable networks require careful design. Interviewers probe how you handle partial failures, retries, and asynchronous processing without blocking the user experience.
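The resumable-upload idea can be sketched as a chunked loop with per-chunk retries, so a dropped connection only repeats the failed chunk rather than the whole file. This is a simplified illustration: `send_chunk` is a hypothetical transport callback (in practice, a PUT of one part of a multipart upload), and the backoff delays are shortened for readability.

```python
import time

def upload_chunked(data: bytes, send_chunk, chunk_size: int = 4, max_retries: int = 3):
    """Upload data in fixed-size chunks, retrying each chunk independently
    with exponential backoff so flaky networks don't restart the whole upload."""
    offset = 0
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        for attempt in range(max_retries):
            try:
                send_chunk(offset, chunk)
                break
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise  # surface the failure after exhausting retries
                time.sleep(0.01 * 2 ** attempt)  # backoff shortened for the sketch
        offset += len(chunk)
```

Because each chunk is addressed by its offset, the server can also report which offsets it already has, letting a client resume after an app restart instead of starting over.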
Infinite scroll must work reliably even as the feed changes underneath the user. Interviewers look for cursor-based pagination that avoids duplicates and gaps.
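A cursor that encodes a stable position, such as a (timestamp, post_id) pair, is what prevents duplicates and gaps: new posts inserted at the head of the feed never shift the cursor's position. A minimal sketch, assuming posts are kept sorted newest-first and ties on timestamp are broken by post ID:

```python
import base64
import json

def encode_cursor(ts: int, post_id: str) -> str:
    """Opaque, URL-safe token so clients can't depend on its internals."""
    return base64.urlsafe_b64encode(json.dumps([ts, post_id]).encode()).decode()

def decode_cursor(cursor: str):
    return tuple(json.loads(base64.urlsafe_b64decode(cursor)))

def feed_page(posts, cursor=None, limit=2):
    """posts: list of (ts, post_id) sorted newest-first. Returns one page and
    a cursor for the next; items strictly older than the cursor are returned,
    so posts added at the head can't cause duplicates or gaps mid-scroll."""
    if cursor:
        after = decode_cursor(cursor)
        posts = [p for p in posts if (p[0], p[1]) < after]
    page = posts[:limit]
    next_cursor = encode_cursor(*page[-1]) if len(page) == limit else None
    return page, next_cursor
```

Contrast this with offset-based pagination (`?page=3`), where a single new post shifts every offset by one and the client sees the last item of the previous page again.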
Serving billions of images and videos globally with low latency requires a thoughtful content delivery strategy that goes beyond simply saying "use a CDN."
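One concrete delivery tactic worth mentioning is content-versioned, immutable CDN URLs per rendition. The sketch below is illustrative: the host, path layout, and rendition widths are assumptions, not a real CDN API. Because the version is baked into the path, the CDN can cache each object indefinitely, and an edit produces new URLs instead of requiring cache invalidation.

```python
RENDITIONS = {"thumb": 150, "feed": 1080, "full": 2048}  # widths in px; illustrative

def media_urls(media_id: str, version: int) -> dict:
    """Build immutable, content-versioned CDN URLs for each pre-generated
    rendition, so clients fetch the smallest image that fits their viewport."""
    base = "https://cdn.example.com"  # hypothetical CDN host
    return {name: f"{base}/media/{media_id}/v{version}/{width}.jpg"
            for name, width in RENDITIONS.items()}
```

Serving the right rendition per surface (thumbnail in grids, 1080px in the feed) typically matters more for load times than any single caching knob.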
Confirm the scope with the interviewer: What types of media are supported (photos only, or also videos and stories)? Is the feed strictly reverse-chronological or does it incorporate ranking signals? How many followers can a single user have, and what is the expected distribution? Are features like direct messaging, explore/discover, or stories in scope? What are the latency SLAs for feed loads and post visibility? Clarify whether you should design the follow graph or assume it exists as a service.
Sketch the main components: a client layer (mobile/web apps), an API gateway handling authentication and routing, a Post Service managing CRUD operations for posts and media metadata, a Feed Service responsible for generating and serving personalized feeds, a Media Service handling uploads and processing, and a Notification Service for push alerts. Use PostgreSQL or DynamoDB for post metadata and user profiles, Redis for precomputed feed caches, Kafka for event streaming (new posts, follow changes), S3 for media storage, and a multi-region CDN for delivery. Show the write path (upload media, create post, fan-out to feeds) and the read path (fetch cached feed, merge celebrity posts, paginate).
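To make the write path concrete, it helps to show the shape of the event the Post Service publishes to Kafka. This is a hypothetical schema, not a fixed contract; the field names are illustrative.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class PostCreated:
    """Event published to Kafka when the Post Service persists a new post.
    Field names are illustrative, not a fixed schema."""
    post_id: str
    author_id: str
    created_at: float
    media_keys: list  # S3 object keys for the uploaded renditions

def serialize(event: PostCreated) -> bytes:
    # In practice the Kafka message would be keyed by author_id so one
    # author's posts stay ordered within a partition.
    return json.dumps(asdict(event)).encode()
```

Keeping the event small (IDs and keys, not media bytes) lets the Fan-Out Service and Notification Service consume the same stream cheaply.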
Walk through the lifecycle of a new post. When a user creates a post, the Post Service persists metadata and publishes an event to Kafka. A Fan-Out Service consumes this event, looks up the poster's follower list, and for each follower pushes the post ID into their Redis sorted set (keyed by follower user ID, scored by timestamp). For celebrity accounts (those exceeding a follower threshold, say 500K), skip the fan-out entirely and mark the post for on-read merging. When a user requests their feed, the Feed Service reads their precomputed feed from Redis, then queries the Post Service for recent posts from any celebrity accounts they follow, merges and deduplicates the results by timestamp, and returns the top N with a cursor token. Discuss how you handle feed cache eviction (TTL plus lazy rebuild), follower count changes (reclassifying users), and edge cases like a user following 10,000 accounts.
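The read-path merge described above can be sketched as a k-way merge with deduplication. Here both inputs are plain lists of (timestamp, post_id) pairs sorted newest-first, standing in for the Redis sorted set and the celebrity-post query; deduplication matters because a post can appear in both sources, for example around the moment an account crosses the celebrity threshold.

```python
import heapq

def read_feed(precomputed, celebrity_posts, limit=3):
    """Merge the follower's precomputed feed with posts pulled on-read from
    celebrity accounts, newest first, deduplicating by post_id. Both inputs
    must already be sorted descending by timestamp."""
    merged = heapq.merge(precomputed, celebrity_posts,
                         key=lambda p: p[0], reverse=True)
    seen, out = set(), []
    for ts, post_id in merged:
        if post_id not in seen:
            seen.add(post_id)
            out.append((ts, post_id))
        if len(out) == limit:
            break
    return out
```

Because `heapq.merge` is lazy, the service only pulls as many entries as the page needs, which keeps the celebrity merge cheap even when a user follows many large accounts.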
Cover media processing: uploads go directly to S3 via presigned URLs, triggering a Lambda or worker that generates thumbnails and transcoded versions, then updates post metadata with CDN URLs. Discuss monitoring metrics like feed generation latency, fan-out lag, upload success rates, and CDN cache hit ratios. Address reliability: use dead-letter queues for failed fan-out events, implement circuit breakers around the Feed Service's Redis dependency, and fall back to database-backed feed generation if cache is unavailable. For cost optimization, archive old feed entries from Redis to cold storage after 30 days and use tiered storage for infrequently accessed media.
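The post-upload worker can be outlined as an event handler that generates renditions and writes the resulting URLs back to post metadata. This sketch injects `transcode` and `update_metadata` as callbacks so it stays self-contained and testable; in production they would be an image pipeline and a call to the Post Service, and the event shape is an assumption.

```python
def handle_upload_event(event: dict, transcode, update_metadata) -> dict:
    """Worker triggered after a client completes a presigned-URL upload to S3.
    Generates each rendition, then records the CDN URLs on the post metadata
    so the post only becomes visible once its media is servable."""
    media_id = event["media_id"]
    urls = {}
    for name, width in (("thumb", 150), ("feed", 1080)):
        urls[name] = transcode(media_id, width)  # returns the rendition's CDN URL
    update_metadata(media_id, urls)
    return urls
```

If `transcode` fails, the event should land in a dead-letter queue for replay rather than being dropped, matching the reliability approach described above.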