Practice/Meta/Design Yelp
Design Yelp
System DesignMust
Problem Statement
Design a food delivery platform similar to DoorDash or Uber Eats that connects customers with restaurants and delivery drivers. The system must handle the entire lifecycle: customers browse menus, place orders, restaurants receive and prepare them, and drivers pick up and deliver food in real-time. The platform should support millions of active users across hundreds of cities, with peak traffic during lunch and dinner hours reaching 100,000+ concurrent orders.
The core challenge lies in coordinating three distinct user types with conflicting needs: customers want fast delivery and accurate ETAs, restaurants need time to prepare quality food, and drivers want efficient routing to maximize earnings. Your design must balance real-time location tracking, dynamic pricing, reliable order state machines, and fair driver assignment while maintaining sub-second search performance and handling payment processing at scale.
Key Requirements
Functional
- Restaurant Discovery -- customers search and filter restaurants by cuisine, distance, rating, delivery time, and price range within their delivery area
- Order Placement -- customers add items to cart, apply promotions, choose delivery address, pay securely, and receive order confirmation with estimated delivery time
- Order Fulfillment -- restaurants receive orders, confirm availability, update preparation status, and notify when ready for pickup
- Driver Assignment -- system matches available drivers to ready orders based on proximity, capacity, and route optimization
- Real-Time Tracking -- customers and restaurants see live driver location and updated ETAs as delivery progresses
- Order History -- all parties access past orders with receipts, ratings, and issue resolution options
Non-Functional
- Scalability -- support 100k concurrent orders during peak hours across 500+ cities with graceful degradation under load spikes
- Reliability -- maintain 99.95% uptime for order placement and payment processing with no lost or duplicate orders even during failures
- Latency -- restaurant search returns results in under 300ms at p95, real-time tracking updates every 5-10 seconds with minimal lag
- Consistency -- order state transitions follow strict sequencing (placed → confirmed → preparing → ready → picked up → delivered) with exactly-once semantics for payments
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Driver-Order Matching and Assignment
How you assign the right driver to each order is the most critical algorithmic challenge. Naive approaches like "nearest available driver" create inefficiency and poor earnings distribution. Interviewers want to see you reason about batching multiple orders, predicting driver availability windows, and balancing fairness with speed.
Hints to consider:
- Use geohashing or quadtrees to find candidate drivers within a reasonable radius (e.g., 3 miles) rather than scanning all active drivers
- Consider a scoring function that weighs distance, estimated pickup time, driver acceptance rate, and route compatibility for stacked orders
- Implement a fallback broadcast mechanism where orders are offered to expanding rings of drivers if initial assignments are rejected
- Account for driver preferences (e.g., avoiding highways) and vehicle type constraints (bike vs car for large orders)
2. Real-Time Location Tracking at Scale
Handling millions of GPS coordinate updates per minute from active drivers while serving low-latency queries to customers is non-trivial. Interviewers expect you to discuss write amplification, staleness tolerance, and efficient spatial indexing.
Hints to consider:
- Batch location updates from mobile clients (every 5-10 seconds) and use UDP or WebSockets to reduce connection overhead
- Store recent driver positions in Redis with geospatial indexes (GEOADD/GEORADIUS) for fast proximity queries with TTL-based cleanup
- Use Kafka to stream location events for analytics and fraud detection without blocking the hot path
- Implement client-side smoothing and dead reckoning to handle network interruptions gracefully and avoid jittery map pins
3. Order State Management and Consistency
Orders transition through multiple states across three different actors, and any inconsistency leads to bad experiences (double-charging, lost orders, stuck deliveries). Interviewers probe how you enforce correct sequencing and handle failures.
Hints to consider:
- Model orders as finite state machines with explicit allowed transitions stored in PostgreSQL with optimistic locking or version numbers
- Use Kafka or event sourcing to create an immutable audit trail of state changes with idempotency keys to prevent duplicate processing
- Implement compensation logic (saga pattern) for rollbacks when payment succeeds but restaurant confirmation fails
- Separate payment authorization from capture -- authorize on order placement, capture only when restaurant confirms to avoid holding funds for cancelled orders
4. Restaurant Search and Menu Data
Customers expect instant search results filtered by dozens of attributes, but menu data changes frequently (items go out of stock, prices update, hours change). Balancing freshness with query performance is key.
Hints to consider:
- Use Elasticsearch with per-city sharding and geo_point queries to find restaurants within delivery radius with sub-300ms latency
- Denormalize popular items and aggregate ratings into the search document to avoid joins, but keep full menus in PostgreSQL as source of truth
- Implement cache-aside pattern with Redis for hot restaurant detail pages (TTL 5-15 minutes) with cache invalidation on menu updates via CDC
- Handle dynamic availability by maintaining an in-memory bit vector of open/closed restaurants updated every minute rather than querying each result