Problem Statement
Design a social media platform similar to Twitter where users post short text updates, follow other accounts, and consume a fast, personalized home timeline. The platform also incorporates product discovery features, allowing users to tag products in posts, view product recommendations in their feed, and complete purchases within the app.
The core engineering challenge is building a low-latency, read-heavy feed system that blends organic social content with shoppable product posts while handling massive fanout for popular accounts. You must reason about hybrid fanout strategies (write vs. read), real-time update propagation, ranking and relevance, and the integration of a reliable e-commerce checkout flow with inventory management and payment processing. The system should support hundreds of millions of daily active users with strict latency requirements on the feed path.
Key Requirements
Functional
- Post creation -- users can publish short text updates with optional media attachments and product tags
- Home timeline -- users see a personalized, ranked feed blending posts from followed accounts with recommended and sponsored product content
- Search and discovery -- users can search for posts, accounts, and products, and explore trending topics and product categories
- Shopping integration -- users can view product details in-app, add items to a cart, and complete checkout with payment processing and order confirmation
Non-Functional
- Scalability -- support 500 million daily active users with 100,000+ posts per second and 1 million+ feed reads per second at peak
- Reliability -- 99.99% availability for timeline reads; no lost posts and no double-charged payments
- Latency -- home timeline loads in under 200ms at p95; post publishing confirms within 500ms
- Consistency -- eventual consistency for timeline materialization and engagement counts; strong consistency for inventory and payment operations
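A quick back-of-envelope pass can make these targets concrete before any design work starts. The sketch below uses only the figures listed above. The daily volumes assume the peak rates were sustained all day, so treat them as upper bounds rather than expected averages.

```python
# Back-of-envelope arithmetic using only the targets listed above.
DAU = 500_000_000
PEAK_POSTS_PER_SEC = 100_000
PEAK_FEED_READS_PER_SEC = 1_000_000
AVAILABILITY = 0.9999

SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_YEAR = 365 * 24 * 60

# Downtime budget implied by 99.99% availability.
downtime_minutes_per_year = (1 - AVAILABILITY) * MINUTES_PER_YEAR

# Upper bounds: these assume the peak rate holds for the whole day.
max_posts_per_day = PEAK_POSTS_PER_SEC * SECONDS_PER_DAY
max_feed_reads_per_user_per_day = PEAK_FEED_READS_PER_SEC * SECONDS_PER_DAY / DAU

print(f"downtime budget: {downtime_minutes_per_year:.2f} min/year")
print(f"posts/day (upper bound): {max_posts_per_day:,}")
print(f"feed reads per user per day (upper bound): {max_feed_reads_per_user_per_day:.1f}")
```

The downtime budget of under an hour per year is the number most likely to shape the design, since it rules out any single point of failure on the timeline read path.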
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Feed Fanout Strategy
The central design decision is how posts reach followers' timelines. Interviewers want to see that you understand why a single approach fails at scale and can articulate a hybrid strategy.
Hints to consider:
- Fanout-on-write precomputes timelines by pushing each new post into all followers' timeline caches at write time, giving fast reads but causing massive write amplification for accounts with millions of followers
- Fanout-on-read assembles the timeline at request time by querying each followed account's outbox, avoiding write amplification but shifting cost to the read path
- A hybrid approach uses fanout-on-write for normal users and fanout-on-read (or deferred fanout) for celebrity accounts, combining the strengths of both
- Maintain a per-user timeline cache in Redis as a sorted set of post IDs; merge in celebrity posts at read time using a lightweight merge step
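The hybrid strategy in these hints can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: plain dictionaries stand in for the Redis sorted sets and the outbox store, and `CELEBRITY_THRESHOLD`, `publish`, and `home_timeline` are hypothetical names chosen for the example.

```python
from collections import defaultdict

# Hypothetical cutoff so the example stays small; a real threshold would be far larger.
CELEBRITY_THRESHOLD = 3

followers = defaultdict(set)        # author -> users who follow them
following = defaultdict(set)        # user -> authors they follow
outbox = defaultdict(list)          # author -> [(timestamp, post_id)], oldest first
timeline_cache = defaultdict(list)  # user -> [(timestamp, post_id)], stand-in for a Redis sorted set


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def is_celebrity(author):
    return len(followers[author]) >= CELEBRITY_THRESHOLD


def publish(author, post_id, ts):
    """Always record the post in the author's outbox; fan out only for normal accounts."""
    outbox[author].append((ts, post_id))
    if not is_celebrity(author):
        # Fanout-on-write: push into every follower's cached timeline.
        for user in followers[author]:
            timeline_cache[user].append((ts, post_id))


def home_timeline(user, limit=10):
    """Merge the precomputed cache with celebrity outboxes at read time."""
    candidates = set(timeline_cache[user])
    for author in following[user]:
        if is_celebrity(author):
            # Fanout-on-read: pull only the newest posts from each celebrity.
            candidates.update(outbox[author][-limit:])
    newest_first = sorted(candidates, reverse=True)
    return [post_id for _, post_id in newest_first[:limit]]


for fan in ("ann", "bob", "cat"):
    follow(fan, "star")
follow("ann", "bob")
publish("bob", "p1", ts=1)   # normal account: pushed to ann's cache
publish("star", "p2", ts=2)  # celebrity: stored in the outbox only
print(home_timeline("ann"))
```

Collecting candidates in a set avoids duplicates when an account crosses the threshold after some of its posts were already pushed to follower caches.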
2. Timeline Ranking and Content Blending
A pure chronological feed ignores relevance and product discovery goals. Interviewers expect you to outline how organic posts, product recommendations, and sponsored content are ranked and interleaved.
Hints to consider:
- Use a two-stage ranking pipeline: candidate retrieval (pull recent posts from followed accounts plus product recommendations) followed by a scoring model that weighs recency, engagement signals, and user affinity
- Insert sponsored content and product posts at defined positions in the feed using pacing rules to avoid overwhelming users
- Precompute ranking features (author engagement rate, user-product affinity scores) and store them in Redis for low-latency scoring at request time
- Support A/B testing on ranking algorithms by partitioning users into experiment groups and logging feature vectors for offline analysis
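One way to picture the scoring and pacing steps is the sketch below. It assumes a hand-weighted linear score and a fixed "one promoted item every N organic items" rule. The weights, the half-life, and the names `Candidate`, `rank`, and `blend` are all illustrative assumptions; a real system would likely use a trained model and richer pacing logic.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    post_id: str
    age_seconds: float
    engagement_rate: float  # precomputed per-author feature, assumed to be in [0, 1]
    affinity: float         # precomputed viewer-author affinity, assumed to be in [0, 1]


# Hypothetical parameters chosen for illustration only.
HALF_LIFE_SECONDS = 3600.0
W_RECENCY, W_ENGAGEMENT, W_AFFINITY = 0.5, 0.3, 0.2


def score(c):
    # Recency halves every HALF_LIFE_SECONDS.
    recency = 0.5 ** (c.age_seconds / HALF_LIFE_SECONDS)
    return W_RECENCY * recency + W_ENGAGEMENT * c.engagement_rate + W_AFFINITY * c.affinity


def rank(candidates):
    """Second stage: order retrieved candidates by score, highest first."""
    return [c.post_id for c in sorted(candidates, key=score, reverse=True)]


def blend(organic_ids, promoted_ids, every=3):
    """Pacing rule: insert one promoted item after every `every` organic items."""
    result, promoted = [], list(promoted_ids)
    for position, post_id in enumerate(organic_ids, start=1):
        result.append(post_id)
        if position % every == 0 and promoted:
            result.append(promoted.pop(0))
    return result


organic = rank([
    Candidate("a", age_seconds=0, engagement_rate=0.1, affinity=0.1),
    Candidate("b", age_seconds=7200, engagement_rate=0.9, affinity=0.9),
    Candidate("c", age_seconds=3600, engagement_rate=0.2, affinity=0.2),
])
print(blend(organic, ["ad1"], every=2))
```

In this example the older post "b" outranks the brand-new post "a" because its engagement and affinity features outweigh the recency penalty, which is the behavior a pure chronological feed cannot express.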
3. E-Commerce Transaction Integrity
When a user purchases a product from a post, the system must handle inventory reservation, payment authorization, and order creation as a coordinated workflow. Interviewers look for saga patterns and idempotency.
Hints to consider:
- Model checkout as a multi-step saga: reserve inventory, authorize payment, create order, capture payment, with compensating actions (release inventory, void authorization) for failures at each step
- Assign an idempotency key to each checkout attempt so retries never result in double charges or duplicate orders
- Use short-lived inventory reservations with TTL-based expiration so abandoned carts automatically release held stock
- Separate the product catalog and inventory services from the social feed services to keep domain boundaries clean and allow independent scaling
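The saga and idempotency hints can be illustrated with a small in-memory sketch. Everything here is hypothetical: `CheckoutService`, the fake payment step, and the single-process dictionaries stand in for separate inventory, payment, and order services. Only successful outcomes are stored under the idempotency key in this version.

```python
class PaymentDeclined(Exception):
    pass


class CheckoutService:
    def __init__(self, stock, payment_ok=True):
        self.stock = dict(stock)      # sku -> units available
        self.payment_ok = payment_ok  # toggles the fake payment provider
        self.orders = {}              # idempotency_key -> confirmed order
        self.charges = 0              # number of captured payments

    def _authorize(self, amount):
        # Stand-in for a call to an external payment provider.
        if not self.payment_ok:
            raise PaymentDeclined()
        return {"amount": amount, "state": "authorized"}

    def checkout(self, idempotency_key, sku, amount):
        # A retry with the same key returns the stored order instead of charging again.
        if idempotency_key in self.orders:
            return self.orders[idempotency_key]

        # Step 1: reserve inventory.
        if self.stock.get(sku, 0) < 1:
            return {"status": "out_of_stock"}
        self.stock[sku] -= 1

        # Step 2: authorize payment; compensate by releasing the reservation on failure.
        try:
            authorization = self._authorize(amount)
        except PaymentDeclined:
            self.stock[sku] += 1
            return {"status": "payment_failed"}

        # Steps 3 and 4: create the order, then capture the payment.
        order = {"status": "confirmed", "sku": sku, "amount": amount}
        authorization["state"] = "captured"
        self.charges += 1
        self.orders[idempotency_key] = order
        return order


service = CheckoutService({"sku-1": 1})
first = service.checkout("key-1", "sku-1", 20)
retry = service.checkout("key-1", "sku-1", 20)  # same key: no second charge
print(first == retry, service.charges, service.stock["sku-1"])
```

A fuller version would also persist in-progress saga state so a crash between steps can be resumed or compensated, and would add the TTL-based expiry of reservations described above.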
4. Real-Time Engagement and Notifications
Likes, retweets, replies, and order status updates need to propagate quickly. Interviewers evaluate your event-driven architecture for distributing these signals.
Hints to consider:
- Publish engagement events to Kafka topics partitioned by post ID or user ID, consumed by counters, notification workers, and analytics pipelines
- Use sharded counters in Redis for high-throughput engagement metrics such as like and retweet counts, spreading writes across several keys to avoid hot-key contention
- Push notifications to online users via WebSocket connections routed through a pub/sub layer; batch notifications for offline users and deliver on reconnection
- Decouple the engagement write path from the feed read path so spikes in likes on a viral post do not degrade timeline latency
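The sharded counter idea is easy to sketch. In the example below a dictionary stands in for Redis, and `NUM_SHARDS`, the key format, and the function names are assumptions made for illustration. Each increment lands on a randomly chosen shard, and reads sum across all shards.

```python
import random
from collections import defaultdict

NUM_SHARDS = 8  # hypothetical shard count; tune to the write rate of the hottest posts

store = defaultdict(int)  # stand-in for a Redis keyspace


def shard_key(post_id, shard):
    return f"likes:{post_id}:{shard}"


def increment_likes(post_id):
    # Spread writes so no single key absorbs every like on a viral post.
    shard = random.randrange(NUM_SHARDS)
    store[shard_key(post_id, shard)] += 1


def get_likes(post_id):
    # Reads pay the cost instead: one lookup per shard, then a sum.
    return sum(store.get(shard_key(post_id, shard), 0) for shard in range(NUM_SHARDS))


for _ in range(1000):
    increment_likes("p1")
print(get_likes("p1"))
```

The trade-off is that reads touch every shard, so the summed value is usually cached or refreshed periodically rather than recomputed on every timeline render.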
5. Search and Product Discovery
Users search for posts, accounts, and products. The search system must handle full-text queries, hashtag lookups, and product facets with low latency.
Hints to consider:
- Index posts and products in Elasticsearch using change data capture (CDC) from the primary databases, with separate indices for text content and product attributes
- Support typeahead suggestions using a prefix trie or dedicated autocomplete index for account names and trending terms
- Filter product search results by availability, price range, and category using Elasticsearch filter clauses, and use aggregations to compute the facet counts shown alongside results
- Cache frequent search queries in Redis with short TTLs to reduce load on the search cluster during trending events
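For the typeahead hint, a prefix trie might look like the sketch below. It is a minimal single-process version. The `Typeahead` class and the weight values are hypothetical, and a production system would more likely precompute the top suggestions per prefix rather than walk the subtree on each request.

```python
END = ""  # sentinel key marking a complete term; never collides with a real character


class Typeahead:
    def __init__(self):
        self.root = {}

    def add(self, term, weight):
        """Insert a term with a popularity weight (higher ranks first)."""
        node = self.root
        for ch in term.lower():
            node = node.setdefault(ch, {})
        node[END] = (weight, term)

    def suggest(self, prefix, k=5):
        """Return up to k terms starting with prefix, most popular first."""
        node = self.root
        for ch in prefix.lower():
            if ch not in node:
                return []
            node = node[ch]
        # Walk the subtree under the prefix and collect every complete term.
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            for key, value in current.items():
                if key == END:
                    found.append(value)
                else:
                    stack.append(value)
        found.sort(key=lambda item: (-item[0], item[1]))
        return [term for _, term in found[:k]]


index = Typeahead()
index.add("python", 5)
index.add("pytorch", 9)
index.add("rust", 7)
print(index.suggest("py"))
```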
Suggested Approach
Step 1: Clarify Requirements
Confirm the scale with your interviewer. Ask about the ratio of posts to reads (typically 1:1000+), whether product purchases are a primary use case or a secondary feature, and the expected follower distribution (how many celebrity accounts have millions of followers). Clarify latency targets for timeline reads versus checkout operations. Determine whether real-time timeline updates (pushing new posts while the user scrolls) are required or whether pull-to-refresh is sufficient. Ask about geographic distribution and multi-region requirements.
Step 2: High-Level Architecture
Sketch the core services: a Post Service for creating and storing posts, a Timeline Service for assembling and caching personalized feeds, a Social Graph Service managing follow relationships, a Product Catalog Service for product listings and inventory, an Order Service handling checkout workflows, and a Search Service backed by Elasticsearch. Show Kafka as the event backbone connecting post creation to fanout workers, engagement events to counter updaters, and order state changes to notification delivery. Place Redis caches in front of timelines, product details, and engagement counters. Include an API Gateway for authentication, rate limiting, and request routing.
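If the interviewer asks how the API Gateway might enforce rate limits, a token bucket is one common option. The sketch below is an assumption about how that could work, not something the architecture above prescribes. The `TokenBucket` class is hypothetical, and the clock is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    def __init__(self, capacity, refill_per_sec, now=0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start full so short bursts are allowed
        self.last = now

    def allow(self, now):
        """Return True if a request at time `now` (in seconds) is within the limit."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per user: a burst of 2 requests, then 1 request per second.
bucket = TokenBucket(capacity=2, refill_per_sec=1.0)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
print(decisions)
```

In a distributed gateway the bucket state would need to live in a shared store such as Redis so that every gateway instance sees the same counts.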
Step 3: Deep Dive on Timeline Assembly
Walk through how a user loads their home feed. The Timeline Service checks the user's precomputed timeline cache in Redis (a sorted set of post IDs with scores based on ranking). For normal followed accounts, posts were pushed here by fanout workers when published. For celebrity accounts, the service queries their recent outbox and merges those posts with the cached entries at read time. A lightweight ranking pass re-scores the top N candidates using precomputed features (engagement rate, recency, user affinity) and interleaves product recommendations and sponsored posts at configured positions. The final ordered list of post IDs is hydrated by fetching full post content and author metadata from cache or database, and returned to the client with a cursor for pagination.
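The final hydration and pagination step can be sketched as follows. This assumes the ranked list is already sorted by descending (score, post ID) and that scores stay stable between page requests. The cursor format and the helper names are hypothetical.

```python
import base64
import json


def encode_cursor(score, post_id):
    """Pack the last returned position into an opaque, URL-safe token."""
    raw = json.dumps([score, post_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor):
    score, post_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return (score, post_id)


def page(ranked, post_store, limit, cursor=None):
    """ranked: list of (score, post_id) tuples sorted in descending order."""
    remaining = ranked
    if cursor is not None:
        last = decode_cursor(cursor)
        # Keep only items that sort strictly after the cursor position.
        remaining = [item for item in ranked if item < last]
    chunk = remaining[:limit]
    # Hydrate: swap IDs for full post records, skipping any that are missing.
    posts = [post_store[post_id] for _, post_id in chunk if post_id in post_store]
    next_cursor = encode_cursor(*chunk[-1]) if len(remaining) > limit else None
    return posts, next_cursor


ranked = [(30, "p3"), (20, "p2"), (10, "p1")]
post_store = {pid: {"id": pid, "text": f"post {pid}"} for _, pid in ranked}
first_page, cursor = page(ranked, post_store, limit=2)
second_page, end = page(ranked, post_store, limit=2, cursor=cursor)
print([p["id"] for p in first_page], [p["id"] for p in second_page], end)
```

A position-based cursor like this stays correct when new posts arrive at the top of the feed between requests, which a plain numeric offset would not.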
Step 4: Address Secondary Concerns
Cover checkout by walking through the saga: the user adds a product to the cart and proceeds to checkout; the Order Service reserves inventory (atomic decrement with version check), calls the payment provider with an idempotency key, and on success creates the order record and publishes a confirmation event. On payment failure, the compensation step releases the inventory reservation. Discuss search indexing by describing CDC from Postgres to Elasticsearch for posts and products. Address abuse prevention: rate limiting posts and purchases per user, spam detection consuming the Kafka post stream, and bot detection on login. Touch on monitoring: track timeline latency percentiles, fanout lag, cache hit rates, search indexing delay, checkout success rates, and inventory discrepancy alerts.
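The "atomic decrement with version check" in the checkout walkthrough is a form of optimistic concurrency control. The sketch below mimics it in memory. The `InventoryStore` class is hypothetical, and its `compare_and_decrement` method stands in for a single conditional `UPDATE ... WHERE version = ? AND quantity > 0` statement in the database.

```python
class InventoryStore:
    def __init__(self):
        self.rows = {}  # sku -> (quantity, version)

    def put(self, sku, quantity):
        self.rows[sku] = (quantity, 0)

    def read(self, sku):
        return self.rows[sku]

    def compare_and_decrement(self, sku, expected_version):
        """Decrement only if nobody else has modified the row since it was read."""
        quantity, version = self.rows[sku]
        if version != expected_version or quantity <= 0:
            return False
        self.rows[sku] = (quantity - 1, version + 1)
        return True


def reserve(store, sku, max_retries=3):
    """Read, then attempt the conditional write; retry if another writer got there first."""
    for _ in range(max_retries):
        quantity, version = store.read(sku)
        if quantity <= 0:
            return False
        if store.compare_and_decrement(sku, version):
            return True
    return False


store = InventoryStore()
store.put("sku-1", 1)
_, seen_version = store.read("sku-1")
winner = store.compare_and_decrement("sku-1", seen_version)  # first buyer succeeds
loser = store.compare_and_decrement("sku-1", seen_version)   # stale version is rejected
print(winner, loser, store.read("sku-1"))
```

Because the stale writer is rejected rather than blocked, the last unit of stock cannot be sold twice, and no long-held lock sits on the checkout path.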
Related Learning
Deepen your understanding of the patterns used in this problem:
- Ad Click Aggregator -- event streaming and real-time aggregation patterns applicable to engagement counting and trending topics
- Distributed Counters -- techniques for maintaining accurate like and retweet counts at massive scale without hot-key contention
- Payment System -- multi-step transactional workflows with idempotency and saga patterns for the checkout flow
- Caching -- cache-aside and write-through strategies for timeline caches, product details, and engagement counters
- Message Queues -- Kafka for fanout pipelines, notification delivery, and search indexing
- Databases -- sharding strategies for social graphs, post storage, and order data