Design a restaurant booking system that allows users to search for restaurants by name, cuisine, or location (defaulting to a 30-mile radius), view real-time table availability, and make, edit, or cancel reservations for individuals or groups. The platform serves both diners looking for a seamless reservation experience and restaurant operators managing limited table inventory during peak hours.
Consider the challenges of a Friday evening at 7 PM when hundreds of users simultaneously try to reserve tables at popular restaurants. Your system must prevent double-bookings, handle concurrent reservation attempts gracefully, and keep availability information fresh across web and mobile clients. Restaurant operators should be able to configure table layouts, time slots, and booking policies, while diners receive instant confirmation and timely reminders.
Based on real interview experiences, these are the areas interviewers probe most deeply:
The most critical challenge is preventing two diners from reserving the same table at the same time, especially for popular restaurants during peak hours when dozens of requests target the same slots simultaneously.
Hints to consider:
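One way to make double-booking impossible at the storage layer is a uniqueness constraint on the table and slot, so the database rejects the second writer regardless of timing. The sketch below is a minimal illustration using SQLite in memory; the `reservations` schema, the `try_reserve` helper, and the fixed slot-start model are all assumptions made for the example, not part of the original design.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical reservations table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE reservations (
            id         INTEGER PRIMARY KEY,
            table_id   INTEGER NOT NULL,
            slot_start TEXT    NOT NULL,   -- ISO-8601 start of a fixed time slot
            diner_id   INTEGER NOT NULL,
            UNIQUE (table_id, slot_start)  -- at most one booking per table per slot
        )
        """
    )
    return conn


def try_reserve(conn: sqlite3.Connection, table_id: int,
                slot_start: str, diner_id: int) -> bool:
    """Return True if the booking was written, False if the slot was taken."""
    try:
        with conn:  # commits on success, rolls back if the insert raises
            conn.execute(
                "INSERT INTO reservations (table_id, slot_start, diner_id) "
                "VALUES (?, ?, ?)",
                (table_id, slot_start, diner_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique constraint fired: another diner already holds this slot.
        return False
```

The constraint only works as written when slots have fixed start times. If reservations can have arbitrary, overlapping durations, a range-based exclusion constraint (available in Postgres) or an explicit lock would be needed instead.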
Users expect fast, relevant results when searching by cuisine, restaurant name, or proximity. Running these queries directly against the transactional database would create unacceptable latency and contention.
Hints to consider:
Modifying a reservation is more complex than a simple update because changing the time or party size requires securing a new slot before releasing the old one. A naive in-place update risks losing both slots if something fails mid-operation.
Hints to consider:
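The "secure the new slot before releasing the old one" ordering can be sketched as follows. This is an in-memory illustration only: `SlotInventory`, `Reservation`, and `edit_reservation` are hypothetical names, and a real system would run both steps inside one transaction or as saga steps with compensation.

```python
from dataclasses import dataclass


class SlotInventory:
    """In-memory stand-in for the availability store (hypothetical)."""

    def __init__(self, capacity: dict[str, int]):
        self._free = dict(capacity)  # slot -> free seats

    def free(self, slot: str) -> int:
        return self._free.get(slot, 0)

    def acquire(self, slot: str, seats: int) -> bool:
        if self._free.get(slot, 0) < seats:
            return False
        self._free[slot] -= seats
        return True

    def release(self, slot: str, seats: int) -> None:
        self._free[slot] = self._free.get(slot, 0) + seats


@dataclass
class Reservation:
    slot: str
    seats: int


def edit_reservation(inv: SlotInventory, res: Reservation,
                     new_slot: str, new_seats: int) -> bool:
    """Move or resize a reservation; on failure the original is untouched."""
    if new_slot == res.slot:
        # Same slot: only the difference in party size needs to be secured.
        delta = new_seats - res.seats
        if delta > 0 and not inv.acquire(res.slot, delta):
            return False
        if delta < 0:
            inv.release(res.slot, -delta)
        res.seats = new_seats
        return True
    # Step 1: secure the new slot first. If this fails, nothing has changed.
    if not inv.acquire(new_slot, new_seats):
        return False
    # Step 2: only now give up the old slot.
    inv.release(res.slot, res.seats)
    res.slot, res.seats = new_slot, new_seats
    return True
```

The cost of this ordering is that the diner briefly occupies capacity in both slots; that is usually preferable to the window in which they would hold neither.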
Diners and restaurant staff both need to see availability changes reflected quickly. A table that was just booked should disappear from other users' views within seconds, not minutes.
Hints to consider:
Restaurants have varying table layouts, seating capacities, booking windows, and policies (minimum party size, cancellation deadlines, no-show penalties). Your data model must be flexible enough to support these without overcomplicating the booking logic.
Hints to consider:
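One way to keep per-restaurant rules out of the booking logic is to model them as data and validate requests against them. The sketch below covers three of the policies named above (minimum party size, booking window, cancellation deadline); the `BookingPolicy` fields, their default values, and the helper names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BookingPolicy:
    # Default values are placeholders, not recommendations.
    min_party_size: int = 1
    booking_window_days: int = 30          # how far ahead diners may book
    cancellation_deadline_hours: int = 24  # free cancellation until this point


def validate_request(policy: BookingPolicy, party_size: int,
                     slot_start: datetime, now: datetime) -> list[str]:
    """Return a list of policy violations; an empty list means the request is allowed."""
    problems = []
    if party_size < policy.min_party_size:
        problems.append(f"party size below minimum of {policy.min_party_size}")
    if slot_start <= now:
        problems.append("slot is in the past")
    elif slot_start > now + timedelta(days=policy.booking_window_days):
        problems.append(f"slot is more than {policy.booking_window_days} days ahead")
    return problems


def can_cancel_free(policy: BookingPolicy, slot_start: datetime, now: datetime) -> bool:
    """True if the cancellation deadline has not yet passed."""
    return slot_start - now >= timedelta(hours=policy.cancellation_deadline_hours)
```

Because the policy is a plain record, it can be stored per restaurant and changed by operators without touching the code that places holds.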
Begin by confirming scope with your interviewer. Ask about expected scale: how many restaurants, how many concurrent diners during peak hours, and the geographic coverage (single city vs. multi-region). Clarify whether the system handles payments or just reservations. Confirm whether group reservations have different rules than individual ones, whether waitlists are in scope, and how restaurant operators manage their table inventory. Establish latency targets for search versus booking confirmation, and determine the acceptable staleness for availability data shown in search results.
Sketch the core services and data flow. A Search Service backed by Elasticsearch handles geo-proximity and text queries across restaurant listings. An Availability Service manages table inventory with a strongly consistent Postgres store and a Redis cache for fast reads. A Booking Service orchestrates the reservation lifecycle -- holds, confirmations, edits, and cancellations -- using saga-style coordination across availability, notifications, and optional payment services. A Restaurant Management Service allows operators to configure table layouts, time slots, and policies. An API Gateway sits in front for authentication, rate limiting, and routing. A message queue like Kafka propagates reservation events to the search index, analytics, and notification systems asynchronously.
Walk through the critical path when a diner confirms a reservation. The Booking Service receives the request with restaurant ID, date, time, and party size. It calls the Availability Service, which acquires a row-level lock on the matching table-group timeslot record in Postgres, checks that sufficient capacity remains, decrements the available count, and writes a hold record with a short expiry, all within one transaction. The hold token is returned to the Booking Service, which proceeds with any confirmation steps (e.g., notification to the restaurant). On success, the hold is promoted to a confirmed reservation. On failure or timeout, the hold expires and a cleanup job restores the count. If holds are mirrored in Redis, key-expiry notifications can trigger the restore sooner, but they are delivered on a best-effort basis, so the cleanup job should remain the backstop. Discuss sharding the availability table by restaurant ID to distribute write load, and using optimistic locking with version columns for restaurants with lower contention.
Cover how search scales by partitioning Elasticsearch indexes geographically, caching popular queries at the CDN and application layers, and syncing availability summaries from the booking path through change data capture (CDC) or event publishing. Discuss the edit flow as a hold-swap saga with clear rollback semantics: acquire the new hold, release the old one, and undo the new hold if any later step fails. Address monitoring with metrics on booking funnel conversion, hold expiration rates, search latency percentiles, and availability cache hit ratios. Mention disaster recovery with Postgres replicas and Elasticsearch cluster redundancy. If time allows, discuss how waitlists, no-show tracking, and overbooking buffers extend the basic design.