Problem Statement
Design a social media platform similar to Twitter where users post short text updates, follow other accounts, and consume a personalized home timeline. The twist in this variant (asked at Asana, Amazon, and others) is that the platform also incorporates product discovery and shopping features: users can attach product links to tweets, browse recommended products in their feed, and complete purchases without leaving the app.
The core engineering challenges are building a low-latency feed that merges social content with product recommendations, handling extreme read amplification from celebrity accounts with millions of followers, orchestrating reliable e-commerce checkout flows alongside social interactions, and maintaining a crisp user experience across two very different domains. You will need to reason about fanout strategies, timeline materialization, caching, saga-based payment workflows, and domain boundary separation.
Key Requirements
Functional
- Posting -- users create short text updates with optional media attachments and product tags
- Following and timeline -- users follow accounts and see a personalized, near-real-time home timeline mixing followed content with recommended products
- Search and discovery -- users search for tweets and products, view product details in-app, and browse trending topics
- Shopping checkout -- users add products to a cart, complete payment securely, and receive order confirmation and delivery status notifications
Non-Functional
- Scalability -- support 500 million daily active users, 500 million new tweets per day, and a read-to-write ratio exceeding 100:1
- Reliability -- guarantee no lost tweets, no double charges, and no inventory oversells; tolerate datacenter failures
- Latency -- home timeline loads in under 200 ms at p95; tweet posting completes in under 500 ms; checkout confirmation in under two seconds
- Consistency -- eventual consistency acceptable for timeline and follower counts; strong consistency required for payment and inventory operations
What Interviewers Focus On
Based on real interview experiences at Asana, Cloudflare, Brex, Amazon, and Snapchat, these are the areas interviewers probe most deeply:
1. Timeline Fanout Strategy
The most common deep-dive area. Celebrity accounts with millions of followers create massive write amplification under fanout-on-write, while fanout-on-read adds latency for every timeline load. Interviewers want to see a hybrid approach with clear reasoning.
Hints to consider:
- Use fanout-on-write for regular users: when a user posts, a worker pushes the tweet ID into each follower's precomputed timeline in a cache (Redis sorted set keyed by user ID, scored by timestamp)
- For high-follower accounts (celebrities, brands), skip the write fanout; instead, merge their tweets into the timeline at read time by querying a small "celebrity tweets" index
- Define a follower-count threshold (for example, 10,000) that triggers the switch from write to read fanout
- Precompute timelines for only active users (those who logged in within the last seven days) to avoid wasting resources on dormant accounts
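The hybrid fanout decision above can be sketched in a few lines. This is a minimal illustration, not a production implementation: plain in-memory dicts stand in for the Redis sorted sets and celebrity index, and the threshold constant and function names are hypothetical.

```python
import time

CELEBRITY_THRESHOLD = 10_000  # follower count above which we skip write fanout

# In-memory stand-ins for Redis: per-follower timeline sorted sets, plus a
# small shared index of recent celebrity tweets merged at read time.
timelines: dict[str, list[tuple[float, str]]] = {}
celebrity_index: list[tuple[float, str]] = []

def fan_out(poster_id: str, tweet_id: str, followers: list[str],
            follower_count: int, active: set[str]) -> str:
    """Push a new tweet into follower timelines, or defer to read-time merge."""
    ts = time.time()
    if follower_count >= CELEBRITY_THRESHOLD:
        # Celebrity path: one write to a shared index instead of millions of pushes.
        celebrity_index.append((ts, tweet_id))
        return "read-fanout"
    for follower in followers:
        if follower in active:  # only materialize timelines for active users
            timelines.setdefault(follower, []).append((ts, tweet_id))
    return "write-fanout"
```

Note that dormant followers are skipped entirely; their timelines are rebuilt on demand the next time they log in.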
2. Feed Ranking and Product Recommendation Injection
The timeline is not purely chronological -- it blends social posts with product recommendations. Interviewers probe how you merge these two data sources without degrading latency.
Hints to consider:
- Fetch the raw timeline (precomputed tweet IDs) and a small batch of recommended product IDs from a separate recommendation service in parallel
- Apply a lightweight ranking model at the edge that interleaves social posts and products based on engagement signals, recency, and user affinity
- Cache ranked feed pages for short TTLs (30-60 seconds) so repeated scrolls do not re-invoke the ranker
- Keep the recommendation service decoupled from the social graph; it consumes engagement events from Kafka and maintains its own feature store
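A rough sketch of the parallel-fetch-and-interleave pattern follows. The fetch functions are stand-ins for the Redis read and the recommendation service call, and the positional injection rule is a deliberately simplistic placeholder for a real ranking model.

```python
from concurrent.futures import ThreadPoolExecutor

PRODUCT_SLOT_EVERY = 4  # hypothetical injection rate: one product per 4 tweets

def fetch_timeline_ids(user_id: str) -> list[str]:
    return [f"tweet-{i}" for i in range(8)]  # stand-in for the Redis read

def fetch_recommended_products(user_id: str) -> list[str]:
    return ["prod-a", "prod-b"]  # stand-in for the recommendation service call

def build_feed(user_id: str) -> list[str]:
    # Fetch both sources in parallel so latency is bounded by the slower call,
    # not the sum of the two.
    with ThreadPoolExecutor(max_workers=2) as pool:
        tweets_future = pool.submit(fetch_timeline_ids, user_id)
        products_future = pool.submit(fetch_recommended_products, user_id)
        tweets, products = tweets_future.result(), products_future.result()
    feed, product_iter = [], iter(products)
    for i, tweet in enumerate(tweets, start=1):
        feed.append(tweet)
        if i % PRODUCT_SLOT_EVERY == 0:  # positional interleave, not a real ranker
            nxt = next(product_iter, None)
            if nxt is not None:
                feed.append(nxt)
    return feed
```

In a real system the interleave step would score each candidate on engagement, recency, and affinity; the fixed slot rule here just shows where that logic plugs in.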
3. E-Commerce Checkout Workflow
Adding shopping to a social platform introduces multi-step transactional workflows that must not interfere with the low-latency social experience. Interviewers expect saga-based orchestration.
Hints to consider:
- Model checkout as a saga: reserve inventory, authorize payment, create order, send confirmation; each step has a compensating rollback action
- Use idempotency keys on all payment-provider API calls to make retries safe
- Store cart state in Redis with TTL-based expiry for abandoned carts; persist confirmed orders in PostgreSQL
- Isolate the commerce write path from the social write path so a payment-provider outage does not degrade timeline performance
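The saga shape described above can be sketched as a list of (action, compensation) pairs executed in order, with compensations replayed in reverse on failure. Everything here is illustrative: the step functions mutate an in-memory order dict, and the idempotency-key handling only hints at what a real payment-provider call would need.

```python
import uuid

def run_checkout_saga(order: dict, steps) -> str:
    """Execute saga steps in order; on any failure, compensate in reverse."""
    completed = []
    for action, compensate in steps:
        try:
            action(order)
            completed.append(compensate)
        except Exception:
            for undo in reversed(completed):
                undo(order)
            return "rolled-back"
    return "confirmed"

# Hypothetical step implementations against an in-memory order dict.
def reserve_inventory(order): order["inventory"] = "reserved"
def release_inventory(order): order["inventory"] = "released"

def authorize_payment(order):
    # A stable idempotency key makes provider-side retries safe to repeat.
    order.setdefault("idempotency_key", str(uuid.uuid4()))
    if order.get("card_declined"):
        raise RuntimeError("payment declined")
    order["payment"] = "authorized"

def void_payment(order): order["payment"] = "voided"
def create_order(order): order["status"] = "created"
def cancel_order(order): order["status"] = "cancelled"

CHECKOUT_STEPS = [
    (reserve_inventory, release_inventory),
    (authorize_payment, void_payment),
    (create_order, cancel_order),
]
```

A declined card triggers only the compensations for steps that already succeeded (here, releasing the inventory reservation), which is exactly the property that prevents oversells and double charges.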
4. Storage and Caching Architecture
Serving sub-200 ms timelines at massive scale demands aggressive caching and careful data modeling. Interviewers assess your technology choices and cache invalidation strategy.
Hints to consider:
- Store tweet metadata in Amazon DynamoDB or Cassandra for high write throughput and predictable read latency, partitioned by tweet ID
- Cache precomputed timelines in Redis sorted sets; evict entries older than a configured window (for example, two weeks)
- Use a separate Elasticsearch cluster for tweet and product search, updated asynchronously via change data capture from the primary store
- Maintain hot counters (likes, retweets, view counts) in Redis and periodically flush to durable storage to avoid write amplification on the primary database
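The hot-counter pattern in the last hint can be sketched as follows, with a Counter standing in for Redis and a dict standing in for the primary database; the function names are made up for illustration.

```python
import collections

# In-memory stand-in for Redis hot counters, e.g. like counts per tweet.
hot_counters: collections.Counter = collections.Counter()
durable_store: dict[str, int] = {}  # stand-in for the primary database

def incr_like(tweet_id: str) -> None:
    # Absorb high-frequency writes in the cache instead of the primary store.
    hot_counters[tweet_id] += 1

def flush_counters() -> int:
    """Periodically drain accumulated deltas into durable storage in one batch."""
    flushed = 0
    for tweet_id, delta in hot_counters.items():
        durable_store[tweet_id] = durable_store.get(tweet_id, 0) + delta
        flushed += delta
    hot_counters.clear()
    return flushed
```

A burst of thousands of likes on one tweet thus becomes a single batched write per flush interval rather than thousands of row updates.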
5. Domain Separation Between Social and Commerce
Tightly coupling tweet metadata with product catalog data leads to schema complexity and performance interference. Interviewers want clean boundaries.
Hints to consider:
- Treat social (tweets, follows, timelines) and commerce (products, carts, orders, inventory) as separate bounded contexts with independent databases
- Join the two at the application layer: a tweet references a product ID, and the client fetches product details from the commerce API when rendering
- Use Kafka as the integration backbone: tweet-engagement events flow to the recommendation service; order-completion events flow to the social notification service
- Avoid sharing database schemas or tables across the two domains to enable independent scaling and deployment
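The application-layer join can be sketched in miniature. The dicts below are hypothetical stand-ins for the two domains' independent stores; the only cross-domain link is the product ID carried on the tweet.

```python
# Hypothetical app-layer join: a tweet stores only a product ID, and the
# renderer hydrates product details from the commerce domain's own store.
social_tweets = {"t1": {"text": "love these shoes", "product_id": "p9"}}
commerce_products = {"p9": {"name": "Trail Runner", "price_cents": 8999}}

def render_tweet(tweet_id: str) -> dict:
    tweet = dict(social_tweets[tweet_id])
    product_id = tweet.pop("product_id", None)
    if product_id is not None:
        # Join at the application layer; no shared tables or schemas.
        tweet["product"] = commerce_products.get(product_id)
    return tweet
```

Because the join happens at render time, either domain can change its schema, scale, or deploy independently without breaking the other.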
Suggested Approach
Step 1: Clarify Requirements
Confirm whether the scope is a full Twitter clone with shopping or a simpler microblog. Ask about scale: daily active users, tweets per day, and read-to-write ratio. Clarify the product catalog size and whether inventory is managed by the platform or by third-party sellers. Determine if the recommendation engine is in scope or a black box. Verify latency and consistency expectations for timeline versus checkout.
Step 2: High-Level Architecture
Sketch the main components split across two domains. Social domain: API gateway, tweet ingestion service, fanout workers, Redis timeline cache, DynamoDB for tweet storage, Elasticsearch for search, and a follower graph service. Commerce domain: product catalog service, cart service (Redis-backed), order service, payment gateway integration, and inventory service (PostgreSQL). Shared infrastructure: Kafka event bus connecting the two domains, a recommendation service consuming engagement events, and a notification service for both social alerts and order updates.
Step 3: Deep Dive on Timeline Fanout
Walk through what happens when a user posts a tweet. The tweet is written to DynamoDB and published to Kafka. A fanout worker consumes the event and checks the poster's follower count. If it is below the threshold, the worker iterates over the followers (fetched from the graph service in batches) and pushes the tweet ID into each active follower's Redis timeline sorted set. If the poster is a celebrity, the worker skips the fanout entirely. When a user loads their timeline, the API fetches the precomputed list from Redis, merges in recent tweets from followed celebrities by querying the celebrity tweet index, and finally calls the recommendation service for product insertions. A ranker blends the results and returns a paginated feed.
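The read-path merge in that walkthrough can be sketched with a timestamp-ordered merge of the two sources. This assumes both inputs arrive as (timestamp, tweet_id) pairs; the function name and limit parameter are illustrative.

```python
import heapq

def load_home_timeline(precomputed: list[tuple[float, str]],
                       celebrity_recent: list[tuple[float, str]],
                       limit: int = 5) -> list[str]:
    """Merge the user's precomputed timeline with recent celebrity tweets,
    newest first, before handing the result to the ranker."""
    # heapq.merge needs each input already sorted (descending, since reverse=True).
    merged = heapq.merge(sorted(precomputed, reverse=True),
                         sorted(celebrity_recent, reverse=True),
                         reverse=True)
    return [tweet_id for _, tweet_id in merged][:limit]
```

The celebrity list is small (only accounts this user follows, over a short recency window), so the extra read-time merge stays cheap.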
Step 4: Address Secondary Concerns
Cover checkout: saga with inventory reservation, payment authorization, order creation, and compensation on failure. Discuss search: Elasticsearch indexes updated via change data capture with eventual consistency. Address notifications: Kafka-driven delivery of new-follower, like, and order-status events to mobile push and in-app channels. Mention monitoring: track timeline p95 latency, fanout lag, cache hit rate, payment success rate, and inventory contention. Discuss scaling: add Redis shards for timeline growth, partition DynamoDB by tweet ID, scale fanout workers horizontally, and use read replicas for the commerce database.
Related Learning Resources
- Design Slack -- covers real-time message delivery, channel fanout, and WebSocket infrastructure patterns applicable to timeline updates
- Design a Payment System -- explores saga-based checkout workflows, idempotency, and payment gateway integration relevant to the commerce side