Design a photo-sharing social media platform where users can upload photos, follow other users, and view a chronological feed of posts from people they follow. Instagram is one of the most widely used social networking applications, serving hundreds of millions of daily active users who expect fast uploads, instantly visible posts, and a smooth infinite-scroll experience.
The core engineering challenge lies in handling two fundamentally different workloads at once: reliably ingesting large media uploads (photos and videos up to several gigabytes) and serving a low-latency, high-throughput personalized feed. Users follow anywhere from a handful to millions of accounts, and the system must deliver a consistent, duplicate-free feed regardless of graph shape. Media must be available globally within seconds of upload, and the feed should load in under 300 milliseconds even during traffic spikes.
Interviewers use this problem to evaluate your ability to design clear API contracts, choose pragmatic caching strategies, reason about fan-out trade-offs for celebrity accounts, and implement robust cursor-based pagination. You should be prepared to abstract the follow graph (assume it exists as a service) and focus on the client-visible data flow rather than low-level infrastructure.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers want to see how you generate the personalized feed at scale, especially how you handle the fan-out problem for accounts with millions of followers.
Large file uploads through application servers create bottlenecks and timeouts. Interviewers expect a direct-to-storage upload pattern with CDN delivery.
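To make the direct-to-storage pattern concrete, here is a minimal sketch of a pre-signed upload URL using a toy HMAC scheme. A real system would use the storage provider's own signer (e.g., S3 SigV4 pre-signing); the secret, bucket domain, and function names below are illustrative assumptions.

```python
import hashlib
import hmac
import time

# Hypothetical signing key shared between the API service and the storage service.
SECRET = b"storage-service-signing-key"

def presign_upload(bucket: str, key: str, ttl_s: int = 900) -> str:
    """Return a URL the client can PUT to directly, bypassing app servers."""
    expires = int(time.time()) + ttl_s
    payload = f"PUT\n{bucket}\n{key}\n{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"https://{bucket}.storage.example.com/{key}?expires={expires}&sig={sig}"

def verify_upload(bucket: str, key: str, expires: int, sig: str) -> bool:
    """Storage-side check: signature matches and the URL has not expired."""
    payload = f"PUT\n{bucket}\n{key}\n{expires}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig) and time.time() < expires
```

The key point for the interview is that the application server only mints the URL; the multi-gigabyte payload flows straight from the client to object storage.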
Users expect to scroll through their feed without seeing duplicates or gaps, even as new posts are published during their session.
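One common way to guarantee duplicate-free, gap-free scrolling is a compound cursor over (created_at, post_id). The in-memory FEED list and field names below are stand-ins for illustration, not a prescribed schema:

```python
import base64
import json
from typing import Optional

# Stand-in for the feed store: (created_at, post_id) tuples, newest first.
FEED = sorted(
    [(100, "p1"), (100, "p2"), (101, "p3"), (103, "p4"), (105, "p5")],
    reverse=True,
)

def encode_cursor(created_at: int, post_id: str) -> str:
    return base64.urlsafe_b64encode(json.dumps([created_at, post_id]).encode()).decode()

def decode_cursor(cursor: str):
    return tuple(json.loads(base64.urlsafe_b64decode(cursor)))

def feed_page(limit: int, cursor: Optional[str] = None):
    """Return posts strictly older than the cursor position.

    Because each page is anchored to the last item seen rather than an
    offset, posts published mid-session never shift later pages: the
    reader sees no duplicates and no gaps.
    """
    after = decode_cursor(cursor) if cursor else (float("inf"), "")
    page = [p for p in FEED if p < after][:limit]
    next_cursor = encode_cursor(*page[-1]) if len(page) == limit else None
    return page, next_cursor
```

The post_id tiebreaker matters: with timestamp-only cursors, two posts created in the same second can be skipped or repeated across a page boundary.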
The read-to-write ratio for a photo-sharing platform is extremely high, making caching critical for meeting latency targets.
Confirm the scope of the design: is it photo-only, or does it include video? What is the maximum file size (e.g., 1 GB or 3 GB)? Ask about the expected follower distribution and whether celebrity accounts need special handling. Clarify whether the feed is purely reverse-chronological or includes algorithmic ranking. Verify the latency target for feed loads and the acceptable delay before a new post appears in followers' feeds. Establish whether you need to support features like stories, likes, or comments, or can focus on the core upload and feed flow.
Sketch a diagram with clients, an API gateway, and three main services: a Post Service for handling uploads and metadata, a Feed Service for generating and serving personalized feeds, and a User/Follow Service (treat as external). Media uploads bypass application servers via pre-signed URLs to object storage, with a CDN in front for global delivery. An event bus (Kafka) connects the Post Service to a Fan-Out Service that writes new post references into per-user feed caches in Redis. The Feed Service reads from these caches, falling back to a database query for users without cached feeds. Include a media processing pipeline that generates thumbnails and transcodes video asynchronously.
Walk through what happens when a user creates a new post. The Post Service writes metadata to the database and publishes a "post_created" event to Kafka. The Fan-Out Service consumes this event, queries the follow graph for the poster's follower list, and writes the post reference (post ID + timestamp) into each follower's feed cache in Redis using sorted sets. For celebrity accounts (followers exceeding a threshold like 100,000), skip the fan-out entirely and mark the post for on-read merging. When a follower requests their feed, the Feed Service reads the precomputed sorted set from Redis and merges in recent posts from any celebrity accounts the user follows, deduplicating by post ID. Discuss how this hybrid approach bounds the worst-case write amplification while keeping read latency predictable.
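The hybrid fan-out above can be sketched roughly as follows, with plain Python dicts standing in for the Redis sorted sets, the follow graph, and the celebrity post store (the 100,000 threshold comes from the description; all other names are illustrative):

```python
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 100_000

# Stand-ins for external systems; structures and names are illustrative.
followers = defaultdict(set)         # author_id -> follower ids (follow graph)
feed_cache = defaultdict(dict)       # user_id -> {post_id: timestamp}, like a Redis ZSET
celebrity_posts = defaultdict(list)  # author_id -> [(timestamp, post_id)]

def publish_post(author_id: str, post_id: str, ts: int) -> None:
    """Fan-out on write for normal accounts; fan-out on read for celebrities."""
    if len(followers[author_id]) >= CELEBRITY_THRESHOLD:
        celebrity_posts[author_id].append((ts, post_id))  # merged at read time
        return
    for follower in followers[author_id]:
        feed_cache[follower][post_id] = ts  # ZADD equivalent per follower

def read_feed(user_id: str, following: set, limit: int = 50):
    """Merge the precomputed feed with recent celebrity posts, deduplicating
    by post ID, and return (post_id, timestamp) pairs, newest first."""
    merged = dict(feed_cache[user_id])
    for author in following:
        for ts, post_id in celebrity_posts.get(author, []):
            merged.setdefault(post_id, ts)
    return heapq.nlargest(limit, merged.items(), key=lambda kv: (kv[1], kv[0]))
```

The write path is bounded by the threshold (at most ~100,000 cache writes per post), while the read path only pays the merge cost for the handful of celebrities a typical user follows.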
Cover how you handle media uploads reliably using multipart uploads with pre-signed URLs and retry logic for failed parts. Explain the asynchronous processing pipeline: once the upload completes, an event triggers thumbnail generation and optional video transcoding, with the post becoming visible only after processing finishes. Discuss monitoring metrics like fan-out lag, cache hit rates, and upload success rates. Address failure scenarios: what happens if Redis loses data (rebuild from database), if Kafka has consumer lag (feeds are eventually consistent), or if the CDN has a cache miss (fall back to origin). Touch on rate limiting for uploads and API calls to prevent abuse.
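The per-part retry logic for multipart uploads might look like this sketch; `upload_part` is a hypothetical callable that PUTs one part to its pre-signed URL and raises on failure, so only the failed part is re-sent rather than the whole file:

```python
import time

def upload_with_retries(part_data: bytes, upload_part, max_attempts: int = 4,
                        base_delay_s: float = 0.5) -> bool:
    """Retry a single failed part with exponential backoff.

    `upload_part` is assumed to raise IOError on transient failures.
    Returns True once the part lands, False after exhausting attempts
    (at which point the client can surface an error or resume later).
    """
    for attempt in range(max_attempts):
        try:
            upload_part(part_data)
            return True
        except IOError:
            if attempt == max_attempts - 1:
                return False
            time.sleep(base_delay_s * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return False
```

Pairing this with part-level checksums and a final "complete upload" call gives the reliability story interviewers look for on the ingest path.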