Practice/Google/Design an Elevator Control System
Design an Elevator Control System
System DesignOptional
Problem Statement
You are asked to design the control system for a multi-elevator building. The system manages dispatching (deciding which elevator responds to a hall call), scheduling (determining the sequence of stops for each elevator), and real-time coordination so that elevators operate efficiently during both normal traffic and rush-hour surges.
The core algorithmic challenge is cost-based, direction-aware scheduling. When a passenger presses a hall call button on floor 7 going up, the system must evaluate every elevator's current position, direction, and queued stops to assign the one that minimizes total wait time across all passengers. Naive approaches like nearest-car dispatch break down during rush hour when elevators cluster and entire floors go unserved.
The system must also handle operational realities: elevators go out of service for maintenance, sensors can report conflicting states, and the dispatcher must recover gracefully from controller crashes without losing track of in-progress trips or duplicating hall call assignments.
Key Requirements
Functional
- Hall call dispatching -- Assign incoming hall calls (floor + direction) to the optimal elevator using a cost function that considers position, direction, current load, and queued stops.
- Car call scheduling -- Maintain an ordered stop list for each elevator that respects directional sweep (serve all up-requests before reversing) and dynamically inserts new stops.
- Real-time state tracking -- Continuously track each elevator's position, direction, door state, and passenger load through sensor events processed in a tight event loop.
- Hall call deduplication -- Multiple passengers pressing the same hall button on the same floor should generate a single dispatch assignment, not multiple elevator responses.
Non-Functional
- Scalability -- Support buildings with up to 50 elevators and 200 floors, handling hundreds of hall calls per minute during peak traffic.
- Latency -- Dispatch decision must complete within 100 milliseconds of a hall call to avoid perceptible delay at the button panel.
- Reliability -- The control system must survive a controller crash and resume operations within seconds, with no lost hall calls or orphaned elevator assignments.
- Fairness -- No floor should experience starvation; maximum wait time for any hall call must be bounded even under extreme load.
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Cost-Based Scheduling Algorithm
Interviewers want to see a concrete cost function, not just hand-waving about "picking the closest elevator." They probe how your function handles edge cases like full elevators, express zones, and directional conflicts.
Hints to consider:
- Think about a cost function that combines estimated time of arrival (based on current position, direction, and queued stops), current load factor, and energy cost of direction reversal
- Consider the SCAN algorithm (elevator algorithm) as a baseline and how you augment it with cost-based dispatch for multi-elevator coordination
- Explore how you handle the case where all elevators are moving away from a hall call — do you hold one in reserve or let the nearest one reverse?
- Discuss how you recalculate assignments when a new hall call arrives that could be more efficiently served by re-dispatching an already-assigned elevator
2. Rush Hour Contention
During morning rush, lobby-to-upper-floor traffic dominates and can cause elevator clustering at the lobby. Interviewers probe strategies for maintaining throughput and fairness under this skewed load pattern.
Hints to consider:
- Think about zoning strategies: assign subsets of elevators to floor ranges (low-rise, mid-rise, high-rise) during peak hours
- Consider lobby dispatch optimization: batch passengers by destination floor range and assign them to the zoned elevator
- Explore how you detect rush hour conditions dynamically (e.g., hall call rate from the lobby exceeds a threshold) and switch dispatch strategies
- Discuss the tradeoff between average wait time and worst-case wait time when optimizing for throughput
3. Failover and State Recovery
An elevator control system is safety-critical. Interviewers probe how you handle controller crashes, network partitions between the dispatcher and individual elevator controllers, and sensor failures.
Hints to consider:
- Think about persisting the dispatch state (active hall calls, elevator assignments, queued stops) to Redis or a similar fast store so a restarted controller can resume immediately
- Consider how each elevator's local controller operates autonomously if it loses contact with the central dispatcher — it continues serving its current stop list
- Explore how you handle split-brain scenarios where two dispatchers might be running simultaneously after a failover
- Discuss how ZooKeeper or a leader election mechanism ensures exactly one active dispatcher at any time