Problem Statement
Design an eBay-like e-commerce marketplace where buyers browse a large product catalog, compare offers from multiple sellers, add items to a shopping cart, complete checkout with shipping and payment, and track orders through fulfillment. The platform must handle high read traffic on catalog pages and search, bursts of write traffic during promotional events, and strict correctness for inventory counts and payment processing.
The key engineering challenges are scaling product discovery (search, filtering, pagination) to tens of millions of concurrent browsing sessions, preventing overselling when many buyers race for limited stock, coordinating multi-step checkout workflows across inventory, payment, and order services with proper idempotency and compensation, and keeping product detail pages fast under heavy load. Define clear domain entities -- User, Product, SKU, Cart, Order, Payment, Shipment -- and show how the APIs map to them, stating the trade-off behind each choice.
Key Requirements
Functional
- Product discovery -- users can search, filter by attributes (category, price, ratings), and browse paginated product listings with fast results
- Shopping cart -- users can add, update, and remove items in a persistent cart with accurate real-time pricing and availability indicators
- Checkout and payment -- users can enter shipping details, see tax calculations, authorize payment, and receive order confirmation through a reliable multi-step flow
- Order tracking -- users can view order status, shipment tracking, and order history, with support for cancellations and returns within policy windows
Non-Functional
- Scalability -- support tens of millions of concurrent browsing sessions, 100,000+ searches per second, and 10,000+ checkout attempts per second during promotions
- Reliability -- zero overselling even under race conditions; 99.99% availability for catalog reads and 99.9% for transactional operations
- Latency -- product detail pages load within 150ms; search results within 300ms; checkout completion within 2 seconds
- Consistency -- strong consistency for inventory reservations and payment processing; eventual consistency for catalog search index and product recommendations
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Inventory Contention and Oversell Prevention
When hundreds of buyers attempt to purchase the last few units of a popular item simultaneously, the system must guarantee that no more units are sold than available. This is the defining technical challenge of e-commerce design.
Hints to consider:
- Separate inventory reservation from payment completion using time-limited holds (e.g., 10 minutes) that automatically expire and return stock to the available pool
- Use optimistic concurrency control with version columns on the inventory row, or atomic conditional decrements (UPDATE ... WHERE quantity >= requested), to prevent race conditions
- Consider pre-allocating inventory into sharded buckets during high-traffic events to reduce contention on a single row
- Monitor reservation-to-purchase conversion rates and adjust hold durations to balance fairness and throughput
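The first two hints can be sketched together. The example below is a minimal illustration, not a production design: it uses Python's built-in sqlite3 module as a stand-in for the inventory database, and the table names (inventory, holds), column names, and function names are assumptions made for this sketch. The conditional UPDATE only succeeds when enough stock remains, and each successful reservation records a hold that a periodic sweeper can expire.

```python
import sqlite3

def make_db():
    """Create an in-memory database with hypothetical inventory and holds tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, available INTEGER NOT NULL)")
    db.execute(
        "CREATE TABLE holds (hold_id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "sku TEXT NOT NULL, qty INTEGER NOT NULL, expires_at REAL NOT NULL)"
    )
    return db

def reserve(db, sku, qty, now, hold_seconds=600):
    """Try to hold `qty` units. Returns a hold id, or None if stock is insufficient."""
    with db:  # one transaction: the decrement and the hold row commit together
        cur = db.execute(
            "UPDATE inventory SET available = available - ? "
            "WHERE sku = ? AND available >= ?",
            (qty, sku, qty),
        )
        if cur.rowcount == 0:
            return None  # the condition failed, so nothing was decremented
        cur = db.execute(
            "INSERT INTO holds (sku, qty, expires_at) VALUES (?, ?, ?)",
            (sku, qty, now + hold_seconds),
        )
        return cur.lastrowid

def release_expired(db, now):
    """Return stock from expired holds to the pool. Returns the number of holds released."""
    with db:
        rows = db.execute(
            "SELECT hold_id, sku, qty FROM holds WHERE expires_at <= ?", (now,)
        ).fetchall()
        for hold_id, sku, qty in rows:
            db.execute(
                "UPDATE inventory SET available = available + ? WHERE sku = ?", (qty, sku)
            )
            db.execute("DELETE FROM holds WHERE hold_id = ?", (hold_id,))
    return len(rows)

def available(db, sku):
    """Read the current available count for a SKU."""
    return db.execute(
        "SELECT available FROM inventory WHERE sku = ?", (sku,)
    ).fetchone()[0]
```

Because the stock check and the decrement happen in a single statement, two concurrent buyers cannot both pass the check and then both decrement. In a real deployment the same pattern would run against the transactional database, and the clock value would come from the database or a trusted time source rather than a function argument.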
2. Multi-Step Checkout Orchestration
Checkout involves coordinating inventory confirmation, tax calculation, shipping rate lookup, payment authorization, and order creation across multiple services and external providers. Any step can fail independently.
Hints to consider:
- Model checkout as a saga with explicit states and transitions between them: Cart Validated, Inventory Reserved, Payment Authorized, Order Created, Payment Captured
- Each step must be idempotent so retries are safe; use idempotency keys for payment provider calls
- Define compensating actions for each step: release inventory if payment fails, void authorization if order creation fails
- Persist the saga state in a durable store so crashed or timed-out workflows can be resumed or rolled back by a reconciliation worker
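To make the idempotency hint concrete, here is a small sketch of how an idempotency key keeps a retried authorization from creating a second charge. FakePaymentProvider is a hypothetical in-memory stand-in, not any real provider's API, and the key format and the drop_next_response switch exist only to simulate a response that is lost after the provider has already recorded the authorization.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuthResult:
    auth_id: str
    amount_cents: int
    status: str

class FakePaymentProvider:
    """Hypothetical provider that deduplicates requests by idempotency key."""

    def __init__(self):
        self._by_key = {}
        self.charges_created = 0
        self.drop_next_response = False  # test switch: simulate a lost response

    def authorize(self, idempotency_key, amount_cents):
        result = self._by_key.get(idempotency_key)
        if result is None:
            # First time this key is seen: create the authorization and remember it.
            self.charges_created += 1
            result = AuthResult(f"auth-{self.charges_created}", amount_cents, "authorized")
            self._by_key[idempotency_key] = result
        elif result.amount_cents != amount_cents:
            raise ValueError("idempotency key reused with a different amount")
        if self.drop_next_response:
            # The authorization was recorded, but the caller never hears about it.
            self.drop_next_response = False
            raise ConnectionError("response lost in transit")
        return result

def authorize_with_retry(provider, order_id, amount_cents, attempts=3):
    """Retry on connection errors, reusing one key derived from the order."""
    key = f"order-{order_id}-authorize"
    last_error = None
    for _ in range(attempts):
        try:
            return provider.authorize(key, amount_cents)
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```

The key is derived from the order rather than generated per attempt, so every retry of the same logical operation presents the same key. A per-attempt random key would defeat the purpose.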
3. Read Scaling for Catalog and Search
Catalog browsing and product search dominate traffic and must remain fast even during promotional spikes. Serving these queries from the transactional database causes latency spikes that also slow critical writes.
Hints to consider:
- Maintain a dedicated search index in Elasticsearch with product attributes, facets, and relevance scoring, updated via CDC from the product database
- Cache product detail pages and popular search results in Redis with short TTLs and background refresh to absorb traffic spikes
- Use a read-optimized product catalog service separate from the inventory and order services so read traffic scaling does not affect transactional paths
- Consider pre-rendering popular category pages and storing them at the CDN edge for sub-100ms delivery
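The caching hint follows the cache-aside pattern: read from the cache, fall back to the source of truth on a miss, then populate the cache with a short TTL. The sketch below uses an in-memory dictionary as a stand-in for Redis so it runs without external services; TTLCache, get_or_load, and the injected clock are names invented for this illustration. Background refresh is not shown.

```python
import time

class TTLCache:
    """Tiny in-memory stand-in for a Redis cache with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)
        self.misses = 0

    def get_or_load(self, key, loader):
        """Return the cached value, or call loader(key) on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: the backing store is not touched
        self.misses += 1
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

A short TTL bounds how stale a product page can be without requiring explicit invalidation on every catalog update. One caveat of this simple form is that many concurrent misses on a hot key all reach the backing store at once, which is the problem that background refresh or request coalescing is meant to address.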
4. Cart Persistence and Pricing Accuracy
The shopping cart must survive across sessions and devices while reflecting current prices and availability. Stale cart data leads to checkout failures and poor user experience.
Hints to consider:
- Store cart state server-side (in Redis for active carts, with periodic persistence to the database), keyed by user ID for authenticated users and by session ID for anonymous ones
- Re-validate prices and availability at checkout initiation rather than relying on the values captured at add-to-cart time
- Show availability warnings on the cart page when stock is low, but defer hard reservation until checkout to avoid holding inventory unnecessarily
- Support cart merging when anonymous carts are linked to authenticated users after login
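Two of these hints, cart merging and re-validation at checkout, are easy to show in a few lines. The sketch below represents a cart as a plain mapping from SKU to quantity. The merge policy (summing quantities for SKUs present in both carts) and the issue labels are assumptions for illustration; a real platform might instead cap quantities or prefer the newer cart.

```python
def merge_carts(user_cart, anonymous_cart):
    """Merge two {sku: qty} carts, summing quantities for shared SKUs."""
    merged = dict(user_cart)
    for sku, qty in anonymous_cart.items():
        merged[sku] = merged.get(sku, 0) + qty
    return merged

def revalidate(cart, added_prices, current_prices, current_stock):
    """Compare the cart against current prices and stock at checkout initiation.

    Returns a list of (sku, issue) pairs; an empty list means the cart is still valid.
    """
    issues = []
    for sku, qty in cart.items():
        if current_stock.get(sku, 0) < qty:
            issues.append((sku, "insufficient_stock"))
        if current_prices.get(sku) != added_prices.get(sku):
            issues.append((sku, "price_changed"))
    return issues
```

Returning a list of issues, rather than failing on the first one, lets the cart page show every problem in a single round trip.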
5. Order Lifecycle and Post-Purchase Operations
After checkout, the order moves through fulfillment states and users expect visibility into status changes. Cancellations, returns, and refunds add additional transactional complexity.
Hints to consider:
- Model the order as a state machine with states Pending, Processing, Shipped, Delivered, Cancelled, and Returned, and an explicit set of allowed transitions between them
- Publish order state change events to Kafka for downstream consumers: notification service, analytics, inventory adjustment, and seller dashboards
- Cancellation triggers compensating actions: refund initiation, inventory restock, shipment cancellation if not yet dispatched
- Idempotent event processing ensures that duplicate state change events do not corrupt inventory or trigger double refunds
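The state machine and the idempotent event handling can be combined in one small sketch. The state names come from the hints above, but the allowed-transition table is an assumption (for example, that only Delivered orders can be Returned), as are the class and field names. Each event carries an ID, and an event whose ID has already been applied is ignored, so a redelivered cancellation does not trigger a second refund.

```python
# Assumed transition table: the states are from the text, the edges are illustrative.
ALLOWED = {
    "Pending": {"Processing", "Cancelled"},
    "Processing": {"Shipped", "Cancelled"},
    "Shipped": {"Delivered"},
    "Delivered": {"Returned"},
    "Cancelled": set(),
    "Returned": set(),
}

class Order:
    def __init__(self):
        self.state = "Pending"
        self.seen_event_ids = set()
        self.refunds_issued = 0

    def apply(self, event_id, new_state):
        """Apply a state-change event once. Returns False for a duplicate event."""
        if event_id in self.seen_event_ids:
            return False  # duplicate delivery: no state change, no side effects
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.seen_event_ids.add(event_id)
        self.state = new_state
        if new_state in ("Cancelled", "Returned"):
            self.refunds_issued += 1  # stand-in for initiating a refund
        return True
```

In a real system the set of seen event IDs and the state change would need to be written in the same database transaction; otherwise a crash between the two could still allow a duplicate to be applied.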
Suggested Approach
Step 1: Clarify Requirements
Ask about the scale: how many products in the catalog, expected concurrent users, and peak checkout rate. Clarify whether the platform supports multiple sellers per product (marketplace model) or is single-seller. Determine if flash sales or time-limited promotions are in scope, as these create extreme inventory contention. Ask about geographic distribution and whether multi-region deployment is needed. Confirm the depth of payment integration -- just authorization/capture, or full refund and chargeback handling.
Step 2: High-Level Architecture
Sketch the core services: Product Catalog Service (read-optimized, backed by Elasticsearch and Redis), Inventory Service (write-optimized, backed by PostgreSQL with row-level locking), Cart Service (Redis for active carts with database persistence), Order Service (saga orchestrator backed by PostgreSQL), Payment Service (integrates with external payment providers), and Notification Service. Show Kafka connecting them for event-driven communication. Include an API Gateway for authentication, rate limiting, and request routing. Place a CDN in front for product images and static assets. Show separate read and write paths, with the search index updated asynchronously.
Step 3: Deep Dive on Checkout Flow
Walk through the saga in order. The user clicks checkout; the Order Service creates a pending order record and begins the saga. First, it calls the Inventory Service to reserve units using an atomic conditional update on the stock row. Second, it calculates shipping and tax via external APIs. Third, it calls the Payment Service with an idempotency key to authorize the charge. Fourth, on authorization success, it finalizes the order record and publishes an "order confirmed" event to Kafka. Fifth, it captures the payment asynchronously after a short delay to allow for fraud checks. If any step fails, the compensating actions for the steps already completed run in reverse order. The saga state is persisted in PostgreSQL so a background reconciliation worker can resume or cancel stuck workflows.
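The compensation logic in this walkthrough can be sketched as a small orchestrator. This is a minimal in-memory illustration, assuming each step is a pair of callables (an action and its compensation); run_saga, SagaFailed, and the log format are hypothetical names, and the in-memory log stands in for the durable saga state the text places in PostgreSQL.

```python
class SagaFailed(Exception):
    """Raised after compensations have run; carries the name of the failed step."""

def run_saga(steps, log):
    """Run (name, action, compensate) steps in order.

    If an action raises, the compensations of the steps that already completed
    run in reverse order, and SagaFailed is raised.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"{name}:failed")
            for done_name, done_compensate in reversed(completed):
                done_compensate()
                log.append(f"{done_name}:compensated")
            raise SagaFailed(name) from exc
        log.append(f"{name}:done")
        completed.append((name, compensate))
```

The failed step itself is not compensated, only the steps that completed before it. That is why each action should either fully succeed or leave nothing behind, and why compensations should themselves be idempotent, since a reconciliation worker may re-run them after a crash.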
Step 4: Address Secondary Concerns
Cover search by describing the Elasticsearch index with product attributes, seller ratings, and availability flags, updated via CDC from the product database. Discuss caching: Redis for product detail pages, cart data, and popular search results; CDN for images and category pages. Address monitoring: track search latency, checkout success/failure rates, inventory reservation conversion rates, payment provider response times, and Kafka consumer lag. Discuss scaling: shard PostgreSQL for orders by user ID, partition Kafka topics for parallel processing (by product category, or by a higher-cardinality key such as order ID if a few categories dominate traffic during promotions), and auto-scale Elasticsearch nodes based on query volume.
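As one illustration of the sharding point, the sketch below routes a user ID to an order shard with a stable hash. The function name and the choice of SHA-256 are assumptions for this example. Python's built-in hash() is avoided because it is randomized per process for strings, so different application servers would disagree on the shard.

```python
import hashlib

def shard_for_user(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard index in [0, num_shards) using a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Keying by user ID keeps all of one user's orders on a single shard, so order history queries touch one database. Plain modulo routing remaps most keys when the shard count changes, so a design that expects to add shards often would more likely use consistent hashing or a lookup table.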
Related Learning
Deepen your understanding of the patterns used in this problem:
- Payment System -- designing multi-step payment workflows with idempotency, saga orchestration, and provider resilience
- Distributed Counters -- techniques for managing high-contention inventory counters without overselling
- Caching -- cache-aside patterns for product detail pages and search results under heavy read load
- Message Queues -- Kafka for decoupling checkout steps, order events, and search index updates
- Databases -- transactional guarantees and sharding strategies for inventory, orders, and the product catalog
- Search -- full-text search with faceted filtering and relevance ranking for product discovery