Design Instagram — Amazon

Problem Statement

Design a photo-sharing social media platform where users can upload photos, follow other users, and view a personalized feed of posts from accounts they follow. Users expect fast uploads, instantly visible posts, and a smooth infinite-scroll experience with no duplicates or gaps.

Instagram-style platforms involve two fundamentally different data paths: the creator path (upload media, apply metadata, publish) and the viewer path (discover, scroll feed, interact). At Amazon, interviewers use this to assess whether you can design for large blob handling with reliable uploads, a low-latency high-scale feed, hybrid fanout strategies for celebrity accounts, and robust cursor-based pagination. Expect to abstract the follow graph and focus on client-visible performance rather than infrastructure minutiae.

Key Requirements

Functional

Photo and video upload -- creators upload media files up to 3 GB with resume support for interrupted transfers, progress tracking, and optional captions
Personalized feed -- users view a feed of posts from accounts they follow in reverse chronological order, with smooth infinite scroll
Reliable pagination -- users can page through their feed without duplicates or gaps, even as new posts arrive during browsing
Post details -- users can view a post's media, caption, author info, and engagement metrics quickly after creation

Non-Functional

Scalability -- support hundreds of millions of daily active users with a read-to-write ratio exceeding 100:1; handle celebrity accounts with tens of millions of followers
Reliability -- ensure no uploaded media is lost even during network interruptions; guarantee at-least-once delivery of posts to follower feeds
Latency -- serve feed requests within 300ms at p99; complete uploads with confirmation within seconds for typical photo sizes
Consistency -- eventual consistency acceptable for feed population; strong consistency for post creation and engagement counters

What Interviewers Focus On

Based on real interview experiences, these are the areas interviewers probe most deeply:

1. Feed Generation Strategy

The core architectural decision is how you populate each user's feed. Interviewers want to see if you understand the fundamental tradeoff between fanout-on-write (pre-computing feeds) and fanout-on-read (aggregating on demand), and why a hybrid approach is necessary.

Hints to consider:

Use fanout-on-write for regular users (under 10k followers) to pre-populate follower feeds asynchronously via a message queue
Use fanout-on-read for celebrity accounts to avoid millions of writes per post; merge celebrity content at read time
Store feed entries as lightweight pointers (post IDs with timestamps) rather than denormalized post content to reduce write amplification
Implement stable cursor-based pagination using composite keys (timestamp + post ID) so page boundaries stay fixed and users never see duplicates or gaps when new posts arrive mid-scroll
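The composite-key cursor from the last hint can be sketched as follows. This is a minimal illustration, not a production design: the feed is an in-memory list of (timestamp, post_id) tuples, and the names `encode_cursor`, `decode_cursor`, and `fetch_page` are hypothetical.

```python
import base64
import json
from typing import List, Optional, Tuple

# Assumption: a feed entry is (created_at_ms, post_id); post_id breaks timestamp ties.
FeedEntry = Tuple[int, str]


def encode_cursor(entry: FeedEntry) -> str:
    """Pack the last returned item's composite key into an opaque token."""
    raw = json.dumps({"ts": entry[0], "id": entry[1]}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> FeedEntry:
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return (data["ts"], data["id"])


def fetch_page(
    feed: List[FeedEntry], limit: int, cursor: Optional[str] = None
) -> Tuple[List[FeedEntry], Optional[str]]:
    """Return up to `limit` entries strictly older than the cursor, newest first."""
    ordered = sorted(feed, reverse=True)
    if cursor is not None:
        last_seen = decode_cursor(cursor)
        # Tuple comparison orders by timestamp first, then post_id.
        ordered = [entry for entry in ordered if entry < last_seen]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = encode_cursor(page[-1]) if page and has_more else None
    return page, next_cursor
```

Because the cursor names a position in the ordering rather than an offset, a post published while the user is scrolling lands above the cursor and does not shift later pages.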

2. Large Media Upload and Delivery

Photos and videos are large binary objects that should never flow through your application servers. Interviewers look for a clear separation of control plane and data plane.

Hints to consider:

Use pre-signed URLs to let clients upload directly to object storage (S3), keeping app servers out of the data path
Support multipart and resumable uploads so users can recover from network interruptions without restarting
Trigger asynchronous processing (thumbnail generation, format conversion) via events after upload completes
Serve media through a CDN with aggressive caching and content-addressable URLs for cache efficiency
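The resumable multipart hint can be illustrated with a small planning sketch. It only computes byte ranges and works out which parts still need uploading; it makes no network calls. The 64 MiB default part size and the names `plan_parts` and `remaining_parts` are assumptions, while the 5 MiB minimum part size and the 10,000-part ceiling are S3 multipart limits.

```python
from typing import List, NamedTuple, Set

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # S3 minimum for every part except the last
MAX_PARTS = 10_000       # S3 maximum number of parts in one multipart upload


class Part(NamedTuple):
    number: int  # 1-based part number
    start: int   # first byte offset, inclusive
    end: int     # last byte offset, exclusive


def plan_parts(file_size: int, part_size: int = 64 * MIB) -> List[Part]:
    """Split a file into contiguous byte ranges, one per upload part."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Grow the part size if the file would otherwise need more than MAX_PARTS parts.
    part_size = max(part_size, -(-file_size // MAX_PARTS))
    parts: List[Part] = []
    start, number = 0, 1
    while start < file_size:
        end = min(start + part_size, file_size)
        parts.append(Part(number, start, end))
        start, number = end, number + 1
    return parts


def remaining_parts(parts: List[Part], completed: Set[int]) -> List[Part]:
    """After an interruption, return only the parts that were not acknowledged."""
    return [part for part in parts if part.number not in completed]
```

In a real flow the control plane would hand out one pre-signed URL per part, and the client would retry only what `remaining_parts` returns, so an interrupted 3 GB transfer resumes instead of restarting.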

3. Asynchronous Fanout and Event Processing

Publishing a post triggers fan-out to potentially millions of feeds, media processing, cache warming, and notification delivery. All of this must happen without blocking the creator.

Hints to consider:

Use a message queue (Kafka) to decouple post creation from fan-out workers; the log absorbs bursts so workers consume at their own pace, and failed batches can be retried by re-reading from the last committed offset
Design idempotent fan-out workers so retries don't create duplicate feed entries
Prioritize fan-out to active users over dormant ones to optimize resource usage
Implement dead-letter queues for failed fan-out attempts with monitoring and alerting
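One way to make fan-out idempotent is to key each feed entry by post ID, so a redelivered event overwrites the existing entry instead of appending a second one. The sketch below uses an in-memory stand-in for a sorted set; `FeedStore` and `fan_out` are hypothetical names, and a real deployment would rely on the set semantics of the feed store itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class FeedStore:
    """In-memory stand-in for per-user sorted sets (member = post_id, score = timestamp)."""

    def __init__(self) -> None:
        self._feeds: Dict[str, Dict[str, int]] = defaultdict(dict)

    def add(self, user_id: str, post_id: str, created_at_ms: int) -> bool:
        """Insert or overwrite an entry; return True only if it was newly added."""
        feed = self._feeds[user_id]
        is_new = post_id not in feed
        feed[post_id] = created_at_ms
        return is_new

    def newest(self, user_id: str, limit: int) -> List[Tuple[int, str]]:
        items = [(ts, pid) for pid, ts in self._feeds[user_id].items()]
        return sorted(items, reverse=True)[:limit]


def fan_out(
    store: FeedStore, followers: Iterable[str], post_id: str, created_at_ms: int
) -> int:
    """Write one post reference per follower; safe to call again on retry."""
    return sum(store.add(f, post_id, created_at_ms) for f in followers)
```

Calling `fan_out` twice with the same event leaves each follower's feed unchanged after the first call, which is what makes at-least-once delivery from the queue safe.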

4. Caching and Read Path Optimization

Feeds are overwhelmingly read-heavy. Interviewers expect a multi-layer caching strategy that serves most requests without hitting the primary database.

Hints to consider:

Cache pre-computed feed lists in Redis sorted sets keyed by user ID, with TTL-based eviction
Cache post metadata and media URLs separately with longer TTLs, since media and captions rarely change after creation; keep fast-changing engagement counters out of these long-lived entries
Implement cache-aside pattern where feed reads check cache first, falling back to database reconstruction on miss
Use batch fetching to hydrate post details for an entire feed page in a single round trip
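The cache-aside and batch-hydration hints combine naturally: look up every post ID for the page in the cache, then fetch all misses from the database in one call. The sketch below is a minimal in-process version; `PostCache` and the `load_many` callback are hypothetical, and a shared cache would typically do the lookup with a multi-key read.

```python
from typing import Callable, Dict, List

Post = Dict[str, str]


class PostCache:
    """Cache-aside lookup that hydrates a whole feed page with one batched load."""

    def __init__(self, load_many: Callable[[List[str]], Dict[str, Post]]) -> None:
        self._cache: Dict[str, Post] = {}
        self._load_many = load_many  # assumed batch read against the post database
        self.db_calls = 0

    def hydrate(self, post_ids: List[str]) -> List[Post]:
        # De-duplicate while preserving order, then keep only cache misses.
        missing = [pid for pid in dict.fromkeys(post_ids) if pid not in self._cache]
        if missing:
            self.db_calls += 1
            self._cache.update(self._load_many(missing))
        # Posts deleted since fan-out are silently skipped.
        return [self._cache[pid] for pid in post_ids if pid in self._cache]
```

The database sees at most one query per page regardless of page size, and only for the IDs that were not already cached.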

Suggested Approach

Step 1: Clarify Requirements

Start by confirming scope with your interviewer. Ask about the expected daily active user count, average follow count, and celebrity threshold where behavior changes. Clarify whether the feed is purely chronological or algorithmically ranked. Determine if group posts, stories, or reels are in scope. Confirm media size limits and whether video is required. Establish latency targets for feed loads versus uploads.

Step 2: High-Level Architecture

Sketch the major components: client applications (mobile, web), an API gateway, a post service for creating and retrieving posts, a feed service for generating and serving user timelines, a media service handling upload URLs and processing triggers, and a social graph service managing follow relationships. Show object storage (S3) for media with CDN delivery, a relational or NoSQL database for post metadata, Redis for cached feeds and hot data, and Kafka for event-driven fan-out and media processing pipelines. Draw the two main flows. On the write path, the creator uploads media directly to S3 and then sends post metadata to the post service, which publishes an event to Kafka; fan-out workers consume it and populate follower feeds. On the read path, readers hit the feed service, which serves the feed from the Redis cache or reconstructs it from persistent storage on a miss.

Step 3: Deep Dive on Feed Generation

Walk through the hybrid fan-out approach in detail. When a regular user (under 10k followers) creates a post, publish an event to Kafka partitioned by creator ID. Fan-out workers consume the event, look up the creator's follower list, and write the post reference (post_id, timestamp) into each follower's feed, both in Redis (ZADD to a sorted set) and in the persistent feed table. For celebrity accounts (10k followers or more), skip the fan-out write. Instead, when a user loads their feed, the feed service reads their pre-materialized feed from Redis, queries a separate celebrity-posts index for recent posts from any celebrities they follow, merges and sorts the combined results, and returns the page. Discuss cursor-based pagination: each response includes a cursor encoding the last item's (timestamp, post_id), and the next request returns only items strictly older than that key, so the page sequence is deterministic even as new posts arrive.
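The read-time merge described in this step can be sketched as a k-way merge over lists that are already sorted newest first. This is an illustration under assumptions: entries are (timestamp, post_id) tuples, the celebrity index returns one list per followed celebrity, and `merge_feed` is a hypothetical name.

```python
import heapq
from typing import List, Optional, Tuple

# Assumption: entries are (created_at_ms, post_id), each input sorted newest first.
FeedEntry = Tuple[int, str]


def merge_feed(
    materialized: List[FeedEntry],
    celebrity_posts: List[List[FeedEntry]],
    limit: int,
    before: Optional[FeedEntry] = None,
) -> List[FeedEntry]:
    """Merge the pre-materialized feed with per-celebrity lists into one page."""
    merged = heapq.merge(materialized, *celebrity_posts, reverse=True)
    page: List[FeedEntry] = []
    seen = set()
    for entry in merged:
        if before is not None and entry >= before:
            continue  # at or above the cursor, already shown on an earlier page
        if entry[1] in seen:
            continue  # guard against a post that appears in both sources
        seen.add(entry[1])
        page.append(entry)
        if len(page) == limit:
            break
    return page
```

Since `heapq.merge` is lazy, the loop stops as soon as the page is full rather than sorting every candidate post.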

Step 4: Address Secondary Concerns

Cover media processing by explaining that S3 upload completion events trigger a Kafka message consumed by transcoding workers that generate thumbnails and optimized formats, then update post metadata with media URLs. Discuss reliability through retry logic with exponential backoff for fan-out failures and dead-letter queues for persistent failures. Address monitoring: track fan-out lag, cache hit rates, feed load latency percentiles, and upload success rates. Discuss cache invalidation: when a user unfollows someone, remove that creator's posts from the user's cached feed. Touch on abuse prevention with rate limiting on uploads and posts per user.
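The retry-then-dead-letter behavior mentioned above might look like the following sketch. The jitter strategy, the default limits, and the names `backoff_delay` and `process_with_retry` are all assumptions chosen for illustration; the dead-letter queue is modeled as a plain list.

```python
import random
import time
from typing import Callable, List, Optional


def backoff_delay(
    attempt: int, base: float = 0.5, cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    source = rng or random
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0, ceiling)


def process_with_retry(
    task: str,
    handler: Callable[[str], None],
    dead_letters: List[str],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run handler(task); retry with backoff, then park the task in the dead-letter list."""
    for attempt in range(max_attempts):
        try:
            handler(task)
            return True
        except Exception:
            # Wait before the next attempt, but not after the final failure.
            if attempt + 1 < max_attempts:
                sleep(backoff_delay(attempt))
    dead_letters.append(task)
    return False
```

Passing `sleep` as a parameter keeps the function testable without real waiting; the length of the dead-letter list is the natural signal to alert on.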

Related Learning

Deepen your understanding of the patterns used in this problem:

Yelp -- location-based discovery, search indexing, and feed-style content delivery at scale
Top-K Videos -- aggregating engagement signals and serving ranked content feeds efficiently
CDN -- edge caching and geographic distribution for serving media with low latency
Blob Storage -- object storage patterns for handling large media uploads with pre-signed URLs
Message Queues -- Kafka-based fan-out pipelines and asynchronous media processing workflows
