Problem Statement
Design an eBay-like e-commerce marketplace that supports buyers browsing a large product catalog, comparing offers from multiple sellers, managing a shopping cart, completing checkout with payment and shipping, and tracking orders through delivery. The platform must scale to millions of concurrent users with high read traffic on catalog and search pages, handle promotional traffic bursts, and enforce strict correctness for inventory and payment operations.
The core engineering challenges are scaling product discovery with full-text search, faceted filtering, and relevance ranking; preventing inventory overselling when many buyers compete for limited stock; orchestrating multi-step checkout across inventory, tax, shipping, and payment services with idempotency and compensation; and keeping product detail pages fast under heavy load. Your design should clearly separate the read-heavy catalog browsing path from the write-heavy transactional order path so each can scale independently.
Key Requirements
Functional
- Product search and browsing -- users can search by keywords, filter by attributes (category, price, seller rating), and paginate through results with relevance-ranked ordering
- Shopping cart -- users can add, update quantity, and remove items with accurate pricing and real-time availability indicators across sessions and devices
- Checkout and payment -- users can enter shipping details, authorize payment, and receive order confirmation through a reliable multi-step workflow with tax and shipping calculations
- Order management -- users can view order status, track shipments, and request cancellations or returns within policy windows
Non-Functional
- Scalability -- handle tens of millions of concurrent browsing sessions, 100,000+ search queries per second, and 10,000+ checkout attempts per second during peak sales
- Reliability -- zero overselling under any conditions; 99.99% read availability for catalog; 99.9% availability for transactional operations
- Latency -- product detail pages within 150ms; search results within 300ms; checkout completion within 2 seconds
- Consistency -- strong consistency for inventory reservations and payment operations; eventual consistency for search index updates and engagement metrics
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Inventory Consistency Under Contention
The defining challenge of e-commerce design is preventing overselling when many buyers race for limited stock. Interviewers want to see concrete techniques for managing write contention on inventory counters.
Hints to consider:
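- Use atomic conditional updates (UPDATE stock SET quantity = quantity - N WHERE product_id = ? AND quantity >= N) and check the affected-row count: if zero rows were updated, stock was insufficient, so the database itself guarantees no overselling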
- Separate inventory reservation (soft hold with TTL) from finalization (hard commit after payment success) so abandoned checkouts release stock automatically
- For flash-sale scenarios, pre-shard inventory into multiple independent pools to distribute write contention across rows
- Implement monitoring for reservation-to-purchase conversion rates and alert on anomalies that might indicate stuck reservations
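The first two hints can be sketched with Python's built-in sqlite3 module so the example runs anywhere. The conditional decrement guards against overselling, and a reservations table with an expiry time implements the soft hold. The schema, the function names, and the 600-second default TTL are illustrative assumptions. In PostgreSQL the same conditional UPDATE takes a row lock, and a concurrent update re-checks the WHERE clause after the competing transaction commits, so the guard also holds under contention.

```python
import sqlite3
import time
import uuid

# Hypothetical schema: "stock" holds sellable units, "reservations" holds soft holds with an expiry.
SCHEMA = """
CREATE TABLE stock (
    product_id TEXT PRIMARY KEY,
    quantity   INTEGER NOT NULL CHECK (quantity >= 0)
);
CREATE TABLE reservations (
    id         TEXT PRIMARY KEY,
    product_id TEXT NOT NULL,
    quantity   INTEGER NOT NULL,
    expires_at REAL NOT NULL
);
"""


def reserve(conn, product_id, n, ttl_seconds=600, now=None):
    """Atomically take n units and record a soft hold; return a reservation id, or None if short."""
    now = time.time() if now is None else now
    with conn:  # single transaction: commit on success, rollback on exception
        cur = conn.execute(
            "UPDATE stock SET quantity = quantity - ? WHERE product_id = ? AND quantity >= ?",
            (n, product_id, n),
        )
        if cur.rowcount == 0:
            return None  # not enough stock (or unknown product); nothing changed
        res_id = str(uuid.uuid4())
        conn.execute(
            "INSERT INTO reservations (id, product_id, quantity, expires_at) VALUES (?, ?, ?, ?)",
            (res_id, product_id, n, now + ttl_seconds),
        )
        return res_id


def finalize(conn, res_id):
    """Payment succeeded: drop the hold so its decrement becomes permanent.

    Returns False if the hold already expired and was released; the caller must then
    re-reserve or void the payment.
    """
    with conn:
        cur = conn.execute("DELETE FROM reservations WHERE id = ?", (res_id,))
        return cur.rowcount == 1


def release_expired(conn, now=None):
    """Return stock held by expired reservations; meant to run on a schedule."""
    now = time.time() if now is None else now
    released = 0
    with conn:
        rows = conn.execute(
            "SELECT id, product_id, quantity FROM reservations WHERE expires_at <= ?", (now,)
        ).fetchall()
        for res_id, product_id, qty in rows:
            # Restock only if this run actually deleted the hold, so stock is never returned twice.
            if conn.execute("DELETE FROM reservations WHERE id = ?", (res_id,)).rowcount == 1:
                conn.execute(
                    "UPDATE stock SET quantity = quantity + ? WHERE product_id = ?", (qty, product_id)
                )
                released += 1
    return released
```

`finalize` deletes the hold without touching `stock`, because the units were already taken at reservation time; running `release_expired` periodically is what returns abandoned checkouts to stock.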
2. Saga-Based Checkout Orchestration
Checkout spans multiple services and external providers. Interviewers expect you to design for partial failures with explicit compensation logic rather than relying on distributed transactions.
Hints to consider:
- Define checkout as a saga with ordered steps: validate cart, reserve inventory, calculate shipping and tax, authorize payment, create order record, confirm to user
- Assign idempotency keys to each external call (especially payment authorization) so retries are safe
- Define compensating actions for each step: release inventory reservation if payment fails, void payment authorization if order creation fails
- Persist saga state in a durable store with a background reconciliation worker that resumes or rolls back stuck workflows after a timeout
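The orchestration hints above can be sketched as an ordered list of steps, each paired with its compensation. This is an in-memory illustration under stated assumptions: `Step`, `run_saga`, and `FakePaymentProvider` are hypothetical names, the idempotency key format `order_id:step_name` is one possible convention, and a real orchestrator would durably persist saga state after every transition instead of keeping it in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]      # forward action; raises an exception on failure
    compensate: Callable[[dict], None]  # undo action; should be safe to call more than once


@dataclass
class SagaResult:
    completed: List[str] = field(default_factory=list)
    compensated: List[str] = field(default_factory=list)
    failed_step: Optional[str] = None


def run_saga(steps: List[Step], ctx: dict) -> SagaResult:
    """Run steps in order; on failure, compensate the finished steps in reverse order."""
    result = SagaResult()
    for i, step in enumerate(steps):
        # Deterministic per-step key: a retry for the same order reuses the same key.
        ctx["idempotency_key"] = f"{ctx['order_id']}:{step.name}"
        try:
            step.action(ctx)
        except Exception:
            result.failed_step = step.name
            # A failing compensation would need retries or manual repair; omitted here.
            for done in reversed(steps[:i]):
                done.compensate(ctx)
                result.compensated.append(done.name)
            return result
        result.completed.append(step.name)
    return result


class FakePaymentProvider:
    """Hypothetical stand-in for a provider that deduplicates authorizations by idempotency key."""

    def __init__(self, decline: bool = False):
        self.decline = decline
        self.auths = {}  # idempotency key -> authorization id

    def authorize(self, key: str, amount_cents: int) -> str:
        if self.decline:
            raise RuntimeError("card declined")
        if key not in self.auths:  # a retry with the same key returns the original authorization
            self.auths[key] = f"auth-{len(self.auths) + 1}"
        return self.auths[key]
```

Because the key depends only on the order and the step, a retried authorization for the same order reaches the provider with the same key and gets back the original authorization instead of creating a second charge.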
3. Catalog Read Scaling
Product browsing and search are read-dominant and must remain fast even during promotional spikes. The transactional database should not serve these queries directly.
Hints to consider:
- Maintain a dedicated Elasticsearch index with product attributes, seller information, and facet values, updated via CDC from the product database
- Cache hot product detail pages in Redis with background refresh and short TTLs; serve category landing pages from CDN edge
- Separate the catalog read service from the inventory and order services at the infrastructure level so read scaling decisions do not impact transactional paths
- Use pre-aggregated category counts and price range histograms for fast faceted navigation without scanning the full index
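The "background refresh with short TTLs" hint can be sketched as cache-aside with refresh-ahead. In this hypothetical `ProductPageCache`, a Python dict stands in for Redis, and the 30-second soft and 120-second hard TTLs are placeholder values. With Redis, the hard TTL would typically be the key's expiry and the soft TTL a timestamp stored alongside the value.

```python
import time
from typing import Callable, Dict, List, Tuple


class ProductPageCache:
    """Cache-aside with refresh-ahead for product detail pages (dict stands in for Redis).

    Entries younger than soft_ttl are served as-is; between soft_ttl and hard_ttl the cached
    copy is still served while the key is queued for a background reload; older entries and
    misses are loaded inline from the source of truth.
    """

    def __init__(self, loader: Callable[[str], dict], soft_ttl: float = 30.0,
                 hard_ttl: float = 120.0, clock: Callable[[], float] = time.monotonic):
        self.loader = loader  # e.g. a read from the catalog database
        self.soft_ttl = soft_ttl
        self.hard_ttl = hard_ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[dict, float]] = {}  # product_id -> (page data, loaded_at)
        self.refresh_queue: List[str] = []                 # keys awaiting background reload

    def get(self, product_id: str) -> dict:
        now = self.clock()
        entry = self._entries.get(product_id)
        if entry is not None:
            value, loaded_at = entry
            age = now - loaded_at
            if age < self.soft_ttl:
                return value  # fresh hit
            if age < self.hard_ttl:
                if product_id not in self.refresh_queue:
                    self.refresh_queue.append(product_id)  # serve stale, refresh later
                return value
        return self._load(product_id, now)  # miss or expired: read through

    def _load(self, product_id: str, now: float) -> dict:
        value = self.loader(product_id)
        self._entries[product_id] = (value, now)
        return value

    def run_background_refresh(self) -> None:
        """What a background worker would do: reload queued keys off the request path."""
        now = self.clock()
        while self.refresh_queue:
            self._load(self.refresh_queue.pop(0), now)
```

Serving the slightly stale copy while a worker reloads it keeps popular pages from all missing at once when a TTL expires during a traffic spike.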
4. Cart Design and Price Accuracy
The cart must be durable across sessions while reflecting current prices and stock levels at checkout time to prevent stale-data surprises.
Hints to consider:
- Store active carts in Redis keyed by user ID with periodic persistence to the database for recovery
- Re-validate prices and availability at checkout initiation; display clear messaging if anything changed since the item was added
- Defer hard inventory reservation until checkout to avoid holding stock unnecessarily for browsing users
- Support guest-to-authenticated cart merging when anonymous users log in
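Two of the cart hints, merging a guest cart on login and re-validating at checkout, can be sketched as follows. The `CartLine` type, the merge policy (sum quantities, capped at a hypothetical per-item limit of 10), and the notice wording are assumptions; some platforms keep the larger of the two quantities instead of summing. The `available` map is meant to be a plain stock read, since hard reservation is deferred until checkout.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CartLine:
    sku: str
    quantity: int
    price_cents: int  # price shown to the user when the item was added


def merge_carts(user_cart: Dict[str, CartLine], guest_cart: Dict[str, CartLine],
                max_qty: int = 10) -> Dict[str, CartLine]:
    """Fold the guest cart into the user's cart on login without mutating either input."""
    merged = {sku: CartLine(ln.sku, ln.quantity, ln.price_cents) for sku, ln in user_cart.items()}
    for sku, line in guest_cart.items():
        if sku in merged:
            # Assumed policy: add quantities, capped at max_qty.
            merged[sku].quantity = min(merged[sku].quantity + line.quantity, max_qty)
        else:
            merged[sku] = CartLine(line.sku, min(line.quantity, max_qty), line.price_cents)
    return merged


def revalidate(cart: Dict[str, CartLine], current_prices: Dict[str, int],
               available: Dict[str, int]) -> List[str]:
    """At checkout start, update prices and clamp quantities in place; return notices to show."""
    notices = []
    for sku in list(cart):  # copy keys: entries may be deleted while iterating
        line = cart[sku]
        price = current_prices.get(sku)
        stock = available.get(sku, 0)
        if price is None or stock == 0:
            del cart[sku]
            notices.append(f"{sku} is no longer available and was removed")
            continue
        if price != line.price_cents:
            notices.append(f"{sku} price changed from {line.price_cents} to {price} cents")
            line.price_cents = price
        if line.quantity > stock:
            notices.append(f"{sku} quantity reduced from {line.quantity} to {stock}")
            line.quantity = stock
    return notices
```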
5. Post-Purchase Order Lifecycle
Orders move through fulfillment states, and users expect visibility into each transition. Cancellations and returns add transactional complexity with inventory and payment side effects.
Hints to consider:
- Model orders as state machines: Pending, Confirmed, Shipped, Delivered, Cancelled, Returned, with validated transitions
- Publish state change events to Kafka for downstream consumers: notification service, seller dashboard, inventory adjustment, and analytics
- Cancellation triggers compensating actions: refund initiation via the payment provider, inventory restock, shipment hold if not yet dispatched
- Ensure all event consumers are idempotent to handle duplicate deliveries from Kafka without corrupting state
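The state machine and idempotent-consumer hints can be combined in a short sketch. The transition table is an assumed policy (in this table, an order that has shipped can no longer be cancelled), and the in-memory set of processed event IDs stands in for a durable deduplication table.

```python
from enum import Enum


class OrderState(Enum):
    PENDING = "Pending"
    CONFIRMED = "Confirmed"
    SHIPPED = "Shipped"
    DELIVERED = "Delivered"
    CANCELLED = "Cancelled"
    RETURNED = "Returned"


# Allowed transitions: an illustrative policy, not a fixed rule.
TRANSITIONS = {
    OrderState.PENDING: {OrderState.CONFIRMED, OrderState.CANCELLED},
    OrderState.CONFIRMED: {OrderState.SHIPPED, OrderState.CANCELLED},
    OrderState.SHIPPED: {OrderState.DELIVERED},
    OrderState.DELIVERED: {OrderState.RETURNED},
    OrderState.CANCELLED: set(),
    OrderState.RETURNED: set(),
}


class InvalidTransition(Exception):
    pass


class Order:
    def __init__(self, order_id: str):
        self.order_id = order_id
        self.state = OrderState.PENDING

    def transition(self, new_state: OrderState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise InvalidTransition(f"{self.state.value} -> {new_state.value}")
        self.state = new_state


class IdempotentConsumer:
    """Applies each event at most once by remembering processed event IDs.

    In production, the processed-ID record and the state change would be written in the
    same database transaction so a crash between them cannot cause double application.
    """

    def __init__(self, orders: dict):
        self.orders = orders
        self.processed = set()

    def handle(self, event: dict) -> bool:
        if event["event_id"] in self.processed:
            return False  # duplicate delivery from the broker: skip
        order = self.orders[event["order_id"]]
        order.transition(OrderState(event["new_state"]))
        self.processed.add(event["event_id"])
        return True
```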
Suggested Approach
Step 1: Clarify Requirements
Ask about catalog size, expected traffic, and peak checkout rates. Clarify whether this is a multi-seller marketplace or single-seller platform. Determine if time-limited promotions or auctions are in scope. Confirm geographic distribution needs and whether multi-region deployment is required. Ask about the depth of payment integration -- authorization/capture only, or full refund and dispute handling.
Step 2: High-Level Architecture
Sketch the services: Product Catalog Service (Elasticsearch + Redis, read-optimized), Inventory Service (PostgreSQL with row-level locking), Cart Service (Redis with database persistence), Order Service (saga orchestrator, PostgreSQL), Payment Service (external provider integration), and Notification Service. Show Kafka connecting services for asynchronous event processing. Include API Gateway for authentication and rate limiting, CDN for product images and static pages. Draw separate read and write paths with the search index updated asynchronously via CDC.
Step 3: Deep Dive on Checkout Saga
Walk through each step. The user clicks checkout, and the Order Service creates a pending order record and starts the saga. Saga step 1: the Inventory Service performs an atomic conditional stock decrement and records a reservation with a TTL. Saga step 2: external shipping and tax APIs compute costs. Saga step 3: the Payment Service authorizes the charge with an idempotency key. Saga step 4: the Order Service finalizes the record and publishes "order confirmed" to Kafka. Saga step 5: payment capture happens asynchronously after a fraud-check window. If step 3 fails, step 1 is compensated by releasing the reservation. Saga state is persisted after each step so a reconciliation worker can resume or roll back workflows that time out.
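The reconciliation worker in this walkthrough can be sketched as a periodic scan over persisted saga records. The `SagaRecord` fields, the status strings, and the roll-forward/roll-back rule (finish the order once payment authorization is recorded, otherwise undo) are illustrative assumptions, not a prescribed policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SagaRecord:
    order_id: str
    status: str       # hypothetical values: "RUNNING", "COMPLETED", "COMPENSATED"
    last_step: str    # last step recorded as finished
    updated_at: float


def reconcile(records: List[SagaRecord], now: float, timeout_s: float,
              compensate: Callable[[SagaRecord], None],
              resume: Callable[[SagaRecord], None]) -> List[Tuple[str, str]]:
    """Roll forward or roll back sagas stuck in RUNNING for longer than timeout_s."""
    actions = []
    for rec in records:
        if rec.status != "RUNNING" or now - rec.updated_at < timeout_s:
            continue  # finished, or still within its time budget
        if rec.last_step == "authorize_payment":
            resume(rec)      # payment is authorized: finish creating the order
            rec.status = "COMPLETED"
            actions.append((rec.order_id, "resumed"))
        else:
            compensate(rec)  # release the reservation (and void any unrecorded authorization)
            rec.status = "COMPENSATED"
            actions.append((rec.order_id, "compensated"))
        rec.updated_at = now
    return actions
```

`compensate` has to cope with a payment authorization that may have succeeded without being recorded; with a provider that honors idempotency keys, retrying the authorization call with the original key returns the earlier result instead of creating a new charge, which tells the worker what to void.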
Step 4: Address Secondary Concerns
Cover search: Elasticsearch index with products, sellers, and facets updated via Debezium CDC from PostgreSQL. Discuss caching layers: Redis for hot products and cart data, CDN for images and category pages. Address monitoring: search latency, checkout funnel conversion rates, payment provider response times, inventory reservation expirations, Kafka consumer lag. Discuss scaling: shard PostgreSQL orders by user ID, scale Elasticsearch horizontally based on query volume, auto-scale API servers based on request rate.
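A sketch of the request body the catalog service might send to Elasticsearch's `_search` endpoint: a relevance-scored text match, attribute filters, facet aggregations, and offset pagination. The field names, the `title^3` boost, the histogram interval, and the default page size are assumptions about the index, not a required mapping.

```python
from typing import Optional


def build_search_query(text: str, category: Optional[str] = None,
                       min_price: Optional[float] = None, max_price: Optional[float] = None,
                       min_seller_rating: Optional[float] = None,
                       page: int = 1, page_size: int = 24) -> dict:
    """Build an Elasticsearch search body; field names are assumed index fields."""
    filters = []
    if category is not None:
        filters.append({"term": {"category": category}})
    price_range = {}
    if min_price is not None:
        price_range["gte"] = min_price
    if max_price is not None:
        price_range["lte"] = max_price
    if price_range:
        filters.append({"range": {"price": price_range}})
    if min_seller_rating is not None:
        filters.append({"range": {"seller_rating": {"gte": min_seller_rating}}})
    return {
        "query": {
            "bool": {
                # Scored clause: drives relevance ranking, title matches weighted higher.
                "must": [{"multi_match": {"query": text, "fields": ["title^3", "description"]}}],
                # Filter context: narrows results without affecting the score.
                "filter": filters,
            }
        },
        # Facet counts computed over the matching documents.
        "aggs": {
            "categories": {"terms": {"field": "category", "size": 20}},
            "price_histogram": {"histogram": {"field": "price", "interval": 50}},
        },
        "from": (page - 1) * page_size,
        "size": page_size,
    }
```

Two refinements are worth mentioning: moving a selected facet's filter into `post_filter` lets that facet's aggregation still show counts for the unselected options, and pages beyond the default `index.max_result_window` of 10,000 results need `search_after` rather than `from`/`size`.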
Related Learning
Deepen your understanding of the patterns used in this problem:
- Payment System -- multi-step payment workflows with idempotency, saga orchestration, and provider resilience
- Distributed Counters -- managing high-contention inventory counters without overselling
- Search -- full-text search with faceted filtering and relevance ranking for product discovery
- Caching -- cache-aside patterns for product pages and search results under heavy read traffic
- Message Queues -- Kafka for checkout event processing, order state propagation, and search index updates
- Databases -- transactional guarantees and partitioning for inventory, orders, and product data