Design a Train Reservation System
Problem Statement
Design a seat reservation platform for a national high-speed rail network that serves millions of passengers daily across hundreds of routes. Travelers need to search train schedules between any two stations, view real-time seat availability and pricing, and complete secure bookings within seconds. The system must handle extreme demand spikes when new routes launch or holiday ticket sales open, while guaranteeing zero double-bookings even when thousands of users compete for the last remaining seats on a popular departure.
The core modeling challenge lies in segment-based inventory: a seat occupied from Station A to Station C is unavailable for the A-to-B leg, but becomes available again after Station C for the C-to-D portion of the journey. Your design must maintain strict transactional consistency during concurrent bookings, implement a reliable multi-phase reservation workflow (search, hold, payment, ticketing), and gracefully degrade under peak load without compromising data integrity. Bloomberg interviews feature this problem because it tests precise concurrency control, multi-step workflow orchestration, and capacity management under contention -- skills directly applicable to financial transaction systems.
Key Requirements
Functional
- Schedule search -- users query train schedules between origin and destination stations for specific dates, viewing departure times, journey durations, intermediate stops, and connection options
- Seat availability and pricing -- display available seats by class (economy, business, first class) with dynamic pricing based on demand levels, advance purchase timing, and route popularity
- Reservation workflow -- hold selected seats temporarily during the checkout process, process payment securely through external gateways, issue confirmed tickets with unique booking references, and send confirmations via email and SMS
- Booking management -- users can view upcoming trips, cancel reservations with refund rules applied automatically, modify travel dates subject to availability, and download printable or mobile tickets
Non-Functional
- Scalability -- support 100,000 concurrent users during flash sales, sustain 500 seat bookings per second, and scale horizontally as the route network expands
- Reliability -- achieve 99.9 percent uptime for booking services, implement automatic failover for payment processing, and ensure zero data loss for confirmed reservations
- Latency -- return search results within 300ms, complete seat holds within 500ms, and provide sub-second booking confirmation after successful payment
- Consistency -- guarantee strong consistency for seat allocation (absolutely no overbooking); eventual consistency is acceptable for search indexes and availability display caches
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Segment-Based Inventory Management
The most common mistake candidates make is treating each train as having a single seat counter. A 400-seat train running from City A through B, C, and D actually has independent capacity for each segment (A-B, B-C, C-D). Seat 12A might be sold for the A-to-C portion, then resold for C-to-D, maximizing revenue and utilization.
Hints to consider:
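- Model each train journey as a series of segments with independent availability counters; when a passenger books A-to-C, atomically decrement both the A-B and B-C segment counters within a single database transaction; because counters track only capacity, also record seat-level occupancy per segment when a passenger must keep the same seat for the whole trip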
- Use range-locking or exclusion constraints in PostgreSQL to prevent conflicting reservations on overlapping segments during concurrent transactions
- Pre-compute availability snapshots for common origin-destination pairs (e.g., top 50 routes) to speed up search queries while maintaining accuracy through cache invalidation on booking events
- Discuss the trade-off between normalizing segments (flexible but complex queries) versus denormalizing seat-segment mappings (faster lookups but higher storage and update cost)
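The segment model above can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: the `TrainRun` class, its fields, and the seat labels are hypothetical names chosen for the example, and a real system would keep this state in a transactional database.

```python
from dataclasses import dataclass, field


@dataclass
class TrainRun:
    """One departure of a train; stations are listed in travel order."""
    stations: list[str]
    seats: list[str]
    # occupied[seat] holds the indices of segments already sold for that seat.
    # Segment i covers stations[i] -> stations[i + 1].
    occupied: dict[str, set[int]] = field(default_factory=dict)

    def _segments(self, origin: str, destination: str) -> set[int]:
        start = self.stations.index(origin)
        end = self.stations.index(destination)
        if start >= end:
            raise ValueError("destination must come after origin")
        return set(range(start, end))

    def available_seats(self, origin: str, destination: str) -> list[str]:
        """Seats that are free on every segment of the requested journey."""
        wanted = self._segments(origin, destination)
        return [s for s in self.seats if not (self.occupied.get(s, set()) & wanted)]

    def book(self, seat: str, origin: str, destination: str) -> bool:
        """Sell the seat for the journey unless any segment is already taken."""
        wanted = self._segments(origin, destination)
        taken = self.occupied.setdefault(seat, set())
        if taken & wanted:
            return False
        taken |= wanted  # mutates the set stored in self.occupied
        return True


run = TrainRun(stations=["A", "B", "C", "D"], seats=["12A", "12B"])
run.book("12A", "A", "C")        # sells segments A-B and B-C
run.book("12A", "C", "D")        # same seat resold for the final leg
print(run.available_seats("A", "B"))
```

Seat 12A is sold twice on the same departure without conflict, which is the reuse described above. The sketch is single-threaded; concurrency control is a separate concern.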
2. Concurrency Control Under Contention
When a popular train has only 5 seats left and 50 users click "book" simultaneously, the system faces extreme write contention. Naive SELECT-then-UPDATE approaches create race conditions, while coarse-grained table locks destroy throughput.
Hints to consider:
- Use PostgreSQL's SELECT FOR UPDATE SKIP LOCKED to let concurrent transactions grab different available seats without blocking each other
- Implement optimistic locking with version numbers on segment availability rows to detect conflicts and retry only the failed transactions
- Shard inventory by coach or car number so multiple booking requests targeting different coaches can proceed in parallel without contention
- Maintain a fast-path availability counter in Redis with atomic DECR operations for preliminary availability checks, with periodic reconciliation against the authoritative database state
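As one illustration of the optimistic-locking hint, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the real database. The table name `segment_availability`, its columns, and the helper functions are assumptions made for this example; the idea is that the conditional `UPDATE` succeeds only if the version read earlier is still current.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with one segment that has 5 seats left."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE segment_availability ("
        " segment_id TEXT PRIMARY KEY,"
        " seats_left INTEGER NOT NULL,"
        " version INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO segment_availability VALUES ('A-B', 5, 0)")
    conn.commit()
    return conn


def try_reserve(conn: sqlite3.Connection, segment_id: str, expected_version: int) -> bool:
    """Decrement seats_left only if the row still has the version we read."""
    cur = conn.execute(
        "UPDATE segment_availability"
        " SET seats_left = seats_left - 1, version = version + 1"
        " WHERE segment_id = ? AND version = ? AND seats_left > 0",
        (segment_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means a conflict or no seats


def reserve_with_retry(conn: sqlite3.Connection, segment_id: str, max_attempts: int = 3) -> bool:
    """Re-read the row and retry when another writer won the race."""
    for _ in range(max_attempts):
        row = conn.execute(
            "SELECT seats_left, version FROM segment_availability WHERE segment_id = ?",
            (segment_id,),
        ).fetchone()
        if row is None or row[0] <= 0:
            return False
        if try_reserve(conn, segment_id, row[1]):
            return True
    return False
```

A writer holding a stale version updates zero rows and must re-read, so only the losing transactions pay the retry cost. Under very heavy contention the retry rate climbs, which is where pessimistic options such as SELECT FOR UPDATE SKIP LOCKED may be the better fit.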
3. Multi-Phase Booking Workflow
Booking is not atomic from the user's perspective: they need time to review selections, enter passenger details, and wait for the payment gateway response. Meanwhile, held seats must be protected from other buyers but released if the session is abandoned or payment fails.
Hints to consider:
- Implement a two-phase workflow: first create a temporary hold with a TTL (5 to 10 minutes), marking seats as HELD in the database; then convert to CONFIRMED status only after the payment gateway returns success
- Use idempotency keys (client-generated UUIDs) to safely retry payment authorization without duplicate charges if network calls time out
- Design compensating transactions (saga pattern) to roll back seat holds, refund payments, and clean up state when any step in the workflow fails
- Store workflow state in a durable event log (for example, a transactional outbox table whose rows are relayed to Kafka) so background workers can detect and resolve abandoned holds or stuck payment authorizations
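The hold-then-confirm flow and the idempotency-key hint can be sketched as a small state machine. Everything here is a hypothetical in-memory model: `HoldManager`, its method names, and the 600-second default TTL are assumptions for illustration, and the clock is passed in explicitly so the behavior is easy to follow.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    HELD = "HELD"
    CONFIRMED = "CONFIRMED"
    RELEASED = "RELEASED"


@dataclass
class Hold:
    seat: str
    expires_at: float
    status: Status = Status.HELD


class HoldManager:
    def __init__(self, ttl_seconds: float = 600.0):
        self.ttl = ttl_seconds
        self.holds: dict[str, Hold] = {}     # hold_id -> Hold
        self.payments: dict[str, str] = {}   # idempotency key -> hold_id

    def _seat_taken(self, seat: str, now: float) -> bool:
        """A seat is taken if it is confirmed or has an unexpired hold."""
        return any(
            h.seat == seat
            and (h.status is Status.CONFIRMED
                 or (h.status is Status.HELD and h.expires_at > now))
            for h in self.holds.values()
        )

    def create_hold(self, seat: str, now: float) -> Optional[str]:
        if self._seat_taken(seat, now):
            return None
        hold_id = str(uuid.uuid4())
        self.holds[hold_id] = Hold(seat, now + self.ttl)
        return hold_id

    def confirm(self, hold_id: str, idempotency_key: str, now: float) -> bool:
        """Called after the payment gateway reports success."""
        if self.payments.get(idempotency_key) == hold_id:
            return True  # retry of a request that already succeeded
        hold = self.holds.get(hold_id)
        if hold is None or hold.status is not Status.HELD or hold.expires_at <= now:
            return False
        hold.status = Status.CONFIRMED
        self.payments[idempotency_key] = hold_id
        return True

    def release_expired(self, now: float) -> int:
        """Background sweep: release holds whose TTL has passed."""
        released = 0
        for hold in self.holds.values():
            if hold.status is Status.HELD and hold.expires_at <= now:
                hold.status = Status.RELEASED
                released += 1
        return released
```

A retried `confirm` call with the same key returns success without changing state again, which is what makes client retries after a timeout safe. If payment succeeds but the hold has already expired, `confirm` returns False, and that is the case where a compensating refund would be triggered.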
4. Peak Load Handling and Graceful Degradation
Flash sales for holiday travel or new route launches can push traffic to 100 times normal load within minutes. The system must absorb these bursts without crashing while providing clear feedback to users.
Hints to consider:
- Place rate limiters and admission control at the API gateway to shed excess load and prevent a thundering herd from overwhelming the database
- Implement virtual waiting rooms that queue excess users and drip-feed them to the booking service at a sustainable rate
- Cache search results and availability snapshots aggressively (even with 10 to 30 seconds of staleness), updating asynchronously as bookings complete
- Use circuit breakers around the payment gateway and fall back to "pending confirmation" mode if the external service becomes temporarily unavailable
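One common way to implement the rate-limiting hint is a token bucket, sketched below. The source does not prescribe a specific algorithm, so this is a swapped-in example: the `TokenBucket` class and its parameters are hypothetical, and the clock is injected as a plain number rather than read from the system.

```python
class TokenBucket:
    """Admit requests at a steady rate while allowing short bursts."""

    def __init__(self, rate_per_sec: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_sec   # tokens added per second
        self.capacity = capacity   # maximum burst size
        self.tokens = capacity     # start with a full bucket
        self.last = now            # time of the last refill

    def allow(self, now: float) -> bool:
        """Return True if the request is admitted, False if it should be shed."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate_per_sec=2, capacity=3)
burst = [bucket.allow(now=0.0) for _ in range(4)]   # fourth request is shed
later = bucket.allow(now=0.5)                        # one token refilled by then
```

In a deployment the `now` argument would typically come from a monotonic clock such as `time.monotonic()`, and rejected requests could be routed to the virtual waiting room instead of being dropped outright.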
Suggested Approach
Step 1: Clarify Requirements
Start by confirming scope and priorities. Ask about scale: how many trains operate daily, how many stations are in the network, and what is the average journey length in segments? Clarify whether the system handles only direct journeys or also multi-leg trips with connections. Verify consistency requirements -- can search results show slightly stale availability, or must every displayed seat be guaranteed bookable? Confirm whether dynamic pricing, waitlists, or group bookings are in scope. Understand peak load patterns: is it gradual daily growth or sudden flash-sale spikes?