Design a photo-sharing social media platform where users can upload photos, follow other users, and view a chronological feed of posts from people they follow. Instagram is one of the most widely used social networking applications, serving hundreds of millions of daily active users who expect fast uploads, instantly visible posts, and a smooth infinite-scroll experience.
The core engineering challenge lies in handling two fundamentally different workloads at once: reliably ingesting large media uploads (photos and videos up to several gigabytes) and serving a low-latency, high-throughput personalized feed. Users follow anywhere from a handful to millions of accounts, and the system must deliver a consistent, duplicate-free feed regardless of graph shape. Media must be available globally within seconds of upload, and the feed should load in under 300 milliseconds even during traffic spikes.
Interviewers use this problem to evaluate your ability to design clear API contracts, choose pragmatic caching strategies, reason about fan-out trade-offs for celebrity accounts, and implement robust cursor-based pagination. You should be prepared to abstract the follow graph (assume it exists as a service) and focus on the client-visible data flow rather than low-level infrastructure.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers want to see how you generate the personalized feed at scale, especially how you handle the fan-out problem for accounts with millions of followers.
Large file uploads through application servers create bottlenecks and timeouts. Interviewers expect a direct-to-storage upload pattern with CDN delivery.
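To make the direct-to-storage pattern concrete, here is a minimal sketch of a pre-signed upload URL using a toy HMAC scheme. A real system would use the storage provider's own signer (e.g., S3 SigV4 pre-signing); the secret, bucket domain, and function names below are illustrative assumptions.

```python
import hashlib
import hmac
import time

# Hypothetical signing key shared between the API service and the storage service.
SECRET = b"storage-service-signing-key"

def presign_upload(bucket: str, key: str, ttl_s: int = 900) -> str:
    """Return a URL the client can PUT to directly, bypassing app servers."""
    expires = int(time.time()) + ttl_s
    payload = f"PUT\n{bucket}\n{key}\n{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"https://{bucket}.storage.example.com/{key}?expires={expires}&sig={sig}"

def verify_upload(bucket: str, key: str, expires: int, sig: str) -> bool:
    """Storage-side check: signature matches and the URL has not expired."""
    payload = f"PUT\n{bucket}\n{key}\n{expires}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig) and time.time() < expires
```

The key point for the interview is that the application server only mints the URL; the multi-gigabyte payload flows straight from the client to object storage.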
Users expect to scroll through their feed without seeing duplicates or gaps, even as new posts are published during their session.
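One common way to guarantee duplicate-free, gap-free scrolling is a compound cursor over (created_at, post_id). The in-memory FEED list and field names below are stand-ins for illustration, not a prescribed schema:

```python
import base64
import json
from typing import Optional

# Stand-in for the feed store: (created_at, post_id) tuples, newest first.
FEED = sorted(
    [(100, "p1"), (100, "p2"), (101, "p3"), (103, "p4"), (105, "p5")],
    reverse=True,
)

def encode_cursor(created_at: int, post_id: str) -> str:
    return base64.urlsafe_b64encode(json.dumps([created_at, post_id]).encode()).decode()

def decode_cursor(cursor: str):
    return tuple(json.loads(base64.urlsafe_b64decode(cursor)))

def feed_page(limit: int, cursor: Optional[str] = None):
    """Return posts strictly older than the cursor position.

    Because each page is anchored to the last item seen rather than an
    offset, posts published mid-session never shift later pages: the
    reader sees no duplicates and no gaps.
    """
    after = decode_cursor(cursor) if cursor else (float("inf"), "")
    page = [p for p in FEED if p < after][:limit]
    next_cursor = encode_cursor(*page[-1]) if len(page) == limit else None
    return page, next_cursor
```

The post_id tiebreaker matters: with timestamp-only cursors, two posts created in the same second can be skipped or repeated across a page boundary.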
The read-to-write ratio for a photo-sharing platform is extremely high, making caching critical for meeting latency targets.
Confirm the scope of the design: is it photo-only, or does it include video? What is the maximum file size (e.g., 1 GB or 3 GB)? Ask about the expected follower distribution and whether celebrity accounts need special handling. Clarify whether the feed is purely reverse-chronological or includes algorithmic ranking. Verify the latency target for feed loads and the acceptable delay before a new post appears in followers' feeds. Establish whether you need to support features like stories, likes, or comments, or can focus on the core upload and feed flow.
Sketch a diagram with clients, an API gateway, and three main services: a Post Service for handling uploads and metadata, a Feed Service for generating and serving personalized feeds, and a User/Follow Service (treat as external). Media uploads bypass application servers via pre-signed URLs to object storage, with a CDN in front for global delivery. An event bus (Kafka) connects the Post Service to a Fan-Out Service that writes new post references into per-user feed caches in Redis. The Feed Service reads from these caches, falling back to a database query for users without cached feeds. Include a media processing pipeline that generates thumbnails and transcodes video asynchronously.
Walk through what happens when a user creates a new post. The Post Service writes metadata to the database and publishes a "post_created" event to Kafka. The Fan-Out Service consumes this event, queries the follow graph for the poster's follower list, and writes the post reference (post ID + timestamp) into each follower's feed cache in Redis using sorted sets. For celebrity accounts (followers exceeding a threshold like 100,000), skip the fan-out entirely and mark the post for on-read merging. When a follower requests their feed, the Feed Service reads the precomputed sorted set from Redis and merges in recent posts from any celebrity accounts the user follows, deduplicating by post ID. Discuss how this hybrid approach bounds the worst-case write amplification while keeping read latency predictable.
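The hybrid fan-out above can be sketched roughly as follows, with plain Python dicts standing in for the Redis sorted sets, the follow graph, and the celebrity post store (the 100,000 threshold comes from the description; all other names are illustrative):

```python
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 100_000

# Stand-ins for external systems; structures and names are illustrative.
followers = defaultdict(set)         # author_id -> follower ids (follow graph)
feed_cache = defaultdict(dict)       # user_id -> {post_id: timestamp}, like a Redis ZSET
celebrity_posts = defaultdict(list)  # author_id -> [(timestamp, post_id)]

def publish_post(author_id: str, post_id: str, ts: int) -> None:
    """Fan-out on write for normal accounts; fan-out on read for celebrities."""
    if len(followers[author_id]) >= CELEBRITY_THRESHOLD:
        celebrity_posts[author_id].append((ts, post_id))  # merged at read time
        return
    for follower in followers[author_id]:
        feed_cache[follower][post_id] = ts  # ZADD equivalent per follower

def read_feed(user_id: str, following: set, limit: int = 50):
    """Merge the precomputed feed with recent celebrity posts, deduplicating
    by post ID, and return (post_id, timestamp) pairs, newest first."""
    merged = dict(feed_cache[user_id])
    for author in following:
        for ts, post_id in celebrity_posts.get(author, []):
            merged.setdefault(post_id, ts)
    return heapq.nlargest(limit, merged.items(), key=lambda kv: (kv[1], kv[0]))
```

The write path is bounded by the threshold (at most ~100,000 cache writes per post), while the read path only pays the merge cost for the handful of celebrities a typical user follows.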
Cover how you handle media uploads reliably using multipart uploads with pre-signed URLs and retry logic for failed parts. Explain the asynchronous processing pipeline: once the upload completes, an event triggers thumbnail generation and optional video transcoding, with the post becoming visible only after processing finishes. Discuss monitoring metrics like fan-out lag, cache hit rates, and upload success rates. Address failure scenarios: what happens if Redis loses data (rebuild from database), if Kafka has consumer lag (feeds are eventually consistent), or if the CDN has a cache miss (fall back to origin). Touch on rate limiting for uploads and API calls to prevent abuse.
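The per-part retry logic for multipart uploads might look like this sketch; `upload_part` is a hypothetical callable that PUTs one part to its pre-signed URL and raises on failure, so only the failed part is re-sent rather than the whole file:

```python
import time

def upload_with_retries(part_data: bytes, upload_part, max_attempts: int = 4,
                        base_delay_s: float = 0.5) -> bool:
    """Retry a single failed part with exponential backoff.

    `upload_part` is assumed to raise IOError on transient failures.
    Returns True once the part lands, False after exhausting attempts
    (at which point the client can surface an error or resume later).
    """
    for attempt in range(max_attempts):
        try:
            upload_part(part_data)
            return True
        except IOError:
            if attempt == max_attempts - 1:
                return False
            time.sleep(base_delay_s * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return False
```

Pairing this with part-level checksums and a final "complete upload" call gives the reliability story interviewers look for on the ingest path.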