Design a social media microblogging platform where users can post short messages, follow other users, and consume a personalized timeline of content. Think of Twitter (now X) as the canonical example: hundreds of millions of users generating short-form posts that need to be delivered to followers in near real-time, with features like likes, retweets, search, and trending topics.
The core engineering challenge is building a feed system that can handle extreme read-to-write ratios while delivering fresh content with low latency. A single celebrity post might need to reach tens of millions of followers within seconds, while the vast majority of users have modest follower counts. This asymmetry forces you to make careful trade-offs between precomputing timelines (fanout-on-write) and assembling them on demand (fanout-on-read). Layer in e-commerce or product recommendation features and you add transactional workflows on top of a real-time social graph.
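The write amplification can be made concrete with a quick back-of-envelope calculation. All numbers below are illustrative assumptions for the sketch, not real platform statistics:

```python
# Rough back-of-envelope for fanout-on-write amplification.
# Every number here is an assumed figure for illustration only.
avg_followers = 200               # assumed typical follower count
celebrity_followers = 20_000_000  # assumed high-end account
posts_per_second = 5_000          # assumed global post rate

# Fanout-on-write: each new post triggers one timeline write per follower.
avg_timeline_writes_per_sec = posts_per_second * avg_followers
print(f"steady-state fanout: {avg_timeline_writes_per_sec:,} timeline writes/sec")

# A single celebrity post, by itself, adds this many timeline writes:
print(f"one celebrity post:  {celebrity_followers:,} timeline writes")
```

Even with modest assumed numbers, the steady-state write load is in the millions of timeline writes per second, and one celebrity post alone rivals minutes of normal traffic, which is why the two fanout strategies get treated differently.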
Interviewers use this question to evaluate whether you can reason about feed generation strategies, handle hot-key problems with high-follower accounts, design for horizontal scalability, and integrate multiple domains (social, commerce, search) without creating a monolithic mess.
Based on real interview experiences, these are the areas interviewers probe most deeply:
The fundamental design decision is how timelines get assembled. Fanout-on-write precomputes timelines at post time, giving fast reads but creating write amplification for popular accounts. Fanout-on-read computes timelines at request time, which is simpler for writes but can be slow for users who follow many accounts.
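The two strategies can be sketched side by side. This is a toy in-memory model, with plain Python dicts standing in for the real stores (Redis sorted sets, a posts database), purely to make the trade-off concrete:

```python
import time
from collections import defaultdict

# Toy in-memory stand-ins for the real stores.
timelines = defaultdict(list)        # user_id -> [(timestamp, post_id)]
posts_by_author = defaultdict(list)  # author_id -> [(timestamp, post_id)]
followers = {"alice": ["bob", "carol"]}
following = {"bob": ["alice"], "carol": ["alice"]}

def post_fanout_on_write(author, post_id):
    """Precompute: push the post into every follower's timeline at write time."""
    ts = time.time()
    posts_by_author[author].append((ts, post_id))
    for follower in followers.get(author, []):
        timelines[follower].append((ts, post_id))  # write amplification here

def read_timeline_push(user):
    """Fanout-on-write read path: a single fetch of the precomputed timeline."""
    return [post for _, post in sorted(timelines[user], reverse=True)]

def read_timeline_pull(user):
    """Fanout-on-read: merge recent posts from every followed account on demand."""
    merged = []
    for author in following.get(user, []):
        merged.extend(posts_by_author[author])
    return [post for _, post in sorted(merged, reverse=True)]
```

The push path pays N writes per post (one per follower) for an O(1) read; the pull path pays nothing extra at write time but must fan out N reads per timeline load, which degrades for users following thousands of accounts.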
A single user with millions of followers creates a hotspot: their post triggers millions of cache writes or, on the read path, a single post record gets fetched millions of times per second. Neither path works well without special handling.
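A common resolution is a hybrid: choose the fanout path per author based on follower count. The threshold below is an assumed tuning knob, not a known production value:

```python
# Assumed cutoff separating "regular" from "celebrity" accounts; in practice
# this would be tuned from fanout latency and cache-pressure measurements.
CELEBRITY_THRESHOLD = 100_000

def fanout_strategy(follower_count: int) -> str:
    """Regular accounts get fanout-on-write ("push"); high-follower accounts
    skip fanout and are merged into follower timelines at read time ("pull")."""
    return "pull" if follower_count >= CELEBRITY_THRESHOLD else "push"
```

On the pull path, the hot post object itself still gets hammered, so it is typically replicated across many cache nodes (or cached client-side briefly) rather than served from a single key.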
Users expect to see new posts from followed accounts within seconds. Beyond the feed, there are push notifications, in-app alerts for likes and retweets, and trending topic aggregation -- all of which need event-driven architectures.
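The fanout of a single PostCreated event to several downstream consumers can be sketched with a minimal in-process pub/sub stand-in; in production each handler group would be a separate Kafka consumer group reading the same topic independently:

```python
from collections import defaultdict

# Minimal in-process stand-in for a Kafka-style event bus (illustration only).
handlers = defaultdict(list)  # topic -> [handler functions]

def subscribe(topic, handler):
    handlers[topic].append(handler)

def publish(topic, event):
    # Real Kafka decouples this: each consumer group reads at its own pace
    # and failures in one consumer do not block the others.
    for handler in handlers[topic]:
        handler(event)

delivered = []
subscribe("PostCreated", lambda e: delivered.append(("fanout", e["post_id"])))
subscribe("PostCreated", lambda e: delivered.append(("search_index", e["post_id"])))
subscribe("PostCreated", lambda e: delivered.append(("trending", e["post_id"])))
subscribe("PostCreated", lambda e: delivered.append(("notify", e["post_id"])))

publish("PostCreated", {"post_id": "p42", "author": "alice"})
```

The point of the shape: one write-side event, four independent read-side pipelines (timeline fanout, search indexing, trending aggregation, notifications), each of which can lag, retry, or scale without touching the post-creation path.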
Full-text search over hundreds of billions of posts requires careful indexing. Posts need to be searchable within seconds of creation, and results should blend recency with relevance.
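One way to blend recency into relevance is an exponential time-decay score over an inverted index. This is a minimal sketch with an assumed `half_life_hours` knob; a real engine such as Elasticsearch would combine a decay function like this with term relevance and engagement signals:

```python
import math
from collections import defaultdict

index = defaultdict(list)  # term -> [(post_id, created_at_epoch_seconds)]

def index_post(post_id, text, created_at):
    """Tokenize naively and append to the posting list for each unique term."""
    for term in set(text.lower().split()):
        index[term].append((post_id, created_at))

def search(term, now, half_life_hours=24.0):
    """Rank matches by exponential recency decay: a post's score halves
    every half_life_hours. Relevance/engagement signals are omitted."""
    results = []
    for post_id, created_at in index.get(term.lower(), []):
        age_hours = (now - created_at) / 3600
        score = math.exp(-age_hours * math.log(2) / half_life_hours)
        results.append((score, post_id))
    return [post_id for _, post_id in sorted(results, reverse=True)]
```

Keeping posts searchable within seconds of creation then becomes an indexing-pipeline problem: the search consumer on the PostCreated topic must append to these posting lists with low lag.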
Ask your interviewer about the user scale (hundreds of millions or smaller?), the average and maximum follower counts, whether the feed should be strictly chronological or algorithmically ranked, and which features are in scope (search? direct messages? commerce integration?). Confirm latency expectations for timeline loads and new post visibility. Ask about geographic distribution -- single region or global deployment?
Sketch the main components: a Post Service that accepts new posts and writes to a posts database (DynamoDB or a sharded MySQL cluster), a Fanout Service that reads from a Kafka topic and pushes post IDs into follower timeline caches in Redis, a Timeline Service that assembles the feed from cache (with fallback to database for cache misses), a User/Graph Service managing follow relationships in a graph-optimized store, a Search Service backed by Elasticsearch, and an API Gateway handling authentication, rate limiting, and routing. Show Kafka as the event backbone connecting post creation to fanout, search indexing, trending computation, and notification delivery.
Walk through a concrete scenario: User A (50K followers) publishes a post. The Post Service writes the record to the posts database and emits a PostCreated event to Kafka. The Fanout Service consumes the event, retrieves User A's follower list from the Graph Service, and for each follower, pushes the post ID with its timestamp into that follower's Redis sorted set. If a follower is online (tracked via a presence service), a WebSocket push is triggered. When User B opens their home timeline, the Timeline Service reads from their Redis sorted set, hydrates the post IDs by fetching post objects from a cache layer backed by the posts database, and merges in recent posts from any celebrity accounts User B follows (fetched on-read). The merged result is returned sorted by timestamp. Discuss cache eviction (keep only the latest 800 entries per timeline), what happens when Redis is down (fall back to pulling recent posts from followed accounts via database), and how you handle deletes (lazy removal from timelines via a background job).
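The merge step in this read path -- the cached timeline plus on-read celebrity posts -- is essentially a k-way merge of descending-sorted lists, which Python's `heapq.merge` handles directly. A minimal sketch, assuming each input list is already sorted newest-first:

```python
import heapq

def assemble_timeline(cached_entries, celebrity_feeds, limit=50):
    """Merge the user's precomputed timeline with on-read celebrity posts.

    cached_entries: [(timestamp, post_id)] from the user's Redis sorted set,
                    sorted descending by timestamp.
    celebrity_feeds: one [(timestamp, post_id)] list per followed celebrity,
                     fetched at read time, each also sorted descending.
    """
    merged = heapq.merge(cached_entries, *celebrity_feeds, reverse=True)
    return [post_id for _, post_id in list(merged)[:limit]]
```

Because every input is pre-sorted, the merge is linear in the number of entries inspected and can stop early once `limit` posts are gathered; the hydration step (post IDs to full post objects) happens afterwards against the post cache.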
Cover media handling (upload images/videos to S3 or a blob store, serve via CDN with signed URLs), abuse prevention (rate limit posts per user, spam detection pipeline consuming the Kafka stream), analytics and monitoring (track timeline latency percentiles, fanout lag, cache hit rates, and search indexing delay). Discuss cost optimization: cold storage for posts older than a year, tiered caching where only active users' timelines are kept warm, and auto-scaling fanout workers based on Kafka consumer lag.
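The lag-based autoscaling rule for fanout workers can be sketched as a target-lag-per-worker heuristic. The thresholds below are assumed values you would tune against observed per-worker fanout throughput:

```python
def desired_fanout_workers(lag, target_lag_per_worker=10_000,
                           min_workers=2, max_workers=200):
    """Size the fanout worker pool from Kafka consumer lag.

    Aim for roughly target_lag_per_worker backlogged messages per worker,
    clamped to [min_workers, max_workers]. All thresholds are assumed knobs.
    """
    needed = -(-lag // target_lag_per_worker)  # ceiling division
    return min(max_workers, max(min_workers, needed))
```

In practice you would also dampen this signal (scale up fast, scale down slowly) so a single celebrity post does not cause the pool to thrash.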