Design a Driver Heatmap System
Problem Statement
Design a real-time driver tracking system that processes location updates from drivers and generates heatmaps to visualize driver density across different geographical areas. Think of the live supply heatmaps in ride-sharing apps like Uber or Lyft that glow brighter where more drivers are available.
The system ingests frequent GPS updates from hundreds of thousands of active drivers, aggregates counts into geographic tiles at multiple zoom levels, and delivers live heatmap updates to many concurrent clients. The core challenges are handling high-velocity writes with geographic hotspots, windowed aggregation over streaming data, and efficient real-time delivery to map-viewing users.
Interviewers at Uber ask this to test whether you can design low-latency, high-write systems that aggregate streaming data, manage hotspots, and deliver efficient real-time updates. They want to see how you reason about tiling/geohash grids, windowed aggregation, backpressure, and the tradeoffs between accuracy, cost, and latency.
Key Requirements
Functional
- Live heatmap -- users view a live heatmap of driver density for a selected geographic region and zoom level
- Status filtering -- users filter the heatmap by driver status (available, on-trip) and recent time window (last 5-10 minutes)
- Driver location updates -- drivers send periodic location and status updates that are reflected on the heatmap in near real time
- Incremental updates -- users receive incremental updates as they pan/zoom the map so the view remains fresh without reloading
Non-Functional
- Scalability -- handle 500,000+ active drivers sending location updates every 5-10 seconds, producing 50K-100K events per second
- Reliability -- tolerate processing node failures without data loss; heatmap may briefly show stale data during recovery
- Latency -- heatmap reflects driver supply changes within 10 seconds; tile loads under 200 ms
- Consistency -- eventual consistency acceptable; counts should be approximately correct within a 10-second window
What Interviewers Focus On
Based on real interview experiences at Uber, these are the areas interviewers probe most deeply:
1. Geospatial Tiling and Aggregation Strategy
You need a defined grid system for aggregating driver counts. Interviewers want to see how you map GPS coordinates to tiles and handle multiple zoom levels.
Hints to consider:
- Use geohash or quadkey-based tiles aligned with standard map tile coordinates (e.g., geohash precision 5 for city-level, 6 for neighborhood)
- Define an explicit time window (e.g., 5 minutes) -- a driver's GPS ping is counted in the tile only if it arrived within this window
- Pre-aggregate at multiple zoom levels: fine-grained tiles for zoomed-in views, coarser tiles for zoomed-out views
- Use TTL-based expiration so stale driver positions automatically drop off the heatmap
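The tiling hints above can be sketched in a few lines. This is a minimal pure-Python geohash encoder showing how one GPS ping maps to tiles at several precisions at once; the specific precisions (4, 5, 6) and the `tiles_for_update` helper are illustrative assumptions, not part of any standard API.

```python
# Sketch: mapping a GPS fix to geohash tiles at multiple zoom levels.
# Precision 5 covers roughly city-district-sized cells, precision 6
# roughly neighborhood-sized cells.

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet

def geohash_encode(lat: float, lon: float, precision: int) -> str:
    """Interleave longitude/latitude bisection bits into base32 chars."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    even = True  # geohash starts with a longitude bit
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1); lon_lo = mid
            else:
                bits.append(0); lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(BASE32[idx])
    return "".join(chars)

def tiles_for_update(lat: float, lon: float) -> dict:
    """One GPS ping increments counters at every pre-aggregated level."""
    return {p: geohash_encode(lat, lon, p) for p in (4, 5, 6)}

# Example: a driver near downtown San Francisco
print(tiles_for_update(37.7749, -122.4194))
```

Because geohashes nest by prefix, the precision-6 tile ID always starts with the precision-5 ID, which makes rolling fine tiles up into coarse ones a simple prefix truncation.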
2. High-Throughput Write Path
Driver GPS updates are a high-velocity write stream with geographic hotspots (airports, downtown areas). Interviewers want to see how you ingest and aggregate without bottlenecks.
Hints to consider:
- Ingest GPS updates into Kafka partitioned by geohash prefix to spread load and co-locate events for the same region
- Use Flink with event-time windowed aggregation (tumbling 10-second windows) to compute per-tile driver counts
- Batch counter updates to Redis rather than writing on every single GPS event
- For hot tiles (airports, stadiums), use sharded counters to avoid contention on a single Redis key
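The sharded-counter idea can be sketched as follows, with a plain dict standing in for Redis and an assumed shard count of 8 (the key format and shard count here are illustrative, not a prescribed scheme):

```python
# Sketch: sharded counters for hot tiles. Each increment lands on one of
# NUM_SHARDS keys; readers sum the shards, trading one multi-key read
# for write parallelism on airport/stadium tiles.
import hashlib

NUM_SHARDS = 8  # assumption: enough shards to spread a hot tile's writes
store: dict = {}  # stands in for Redis; INCRBY becomes a dict update

def shard_key(tile_id: str, driver_id: str) -> str:
    """Pick a stable shard per driver so retries hit the same key."""
    shard = int(hashlib.md5(driver_id.encode()).hexdigest(), 16) % NUM_SHARDS
    return f"heat:{tile_id}:{shard}"

def incr_tile(tile_id: str, driver_id: str, delta: int = 1) -> None:
    key = shard_key(tile_id, driver_id)
    store[key] = store.get(key, 0) + delta

def read_tile(tile_id: str) -> int:
    """Readers sum all shards (an MGET in real Redis)."""
    return sum(store.get(f"heat:{tile_id}:{s}", 0) for s in range(NUM_SHARDS))

# Simulate 1,000 drivers pinging inside one hot tile
for d in range(1000):
    incr_tile("9q8yy", f"driver-{d}")
print(read_tile("9q8yy"))  # -> 1000
```

In production the increments would also be batched per Flink window rather than issued per GPS event, so each shard key sees one `INCRBY` per window instead of thousands of single increments.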
3. Handling Out-of-Order, Duplicate, and Stale Updates
Mobile networks produce delayed and duplicated events. Without event-time handling, counts become inaccurate.
Hints to consider:
- Use event-time processing in Flink with watermarks to handle late-arriving GPS events
- Deduplicate by (driver_id, timestamp) within the processing window
- Apply a staleness threshold: if a driver's last update is older than the window, exclude them from the count
- Accept that counts are approximate; do not try for exact real-time accuracy
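The windowing, dedup, and staleness rules above can be expressed compactly. This sketch uses plain Python in place of Flink's API: 10-second tumbling windows keyed by tile, dedup on `(driver_id, timestamp)`, and a watermark that discards events arriving after their window has closed. The 5-second allowed lateness is an assumed parameter.

```python
# Sketch: event-time tumbling windows with dedup and late-event dropping.
from collections import defaultdict

WINDOW_MS = 10_000            # tumbling window size
ALLOWED_LATENESS_MS = 5_000   # assumption: watermark lags max event time

def window_start(ts_ms: int) -> int:
    return ts_ms - ts_ms % WINDOW_MS

def aggregate(events):
    """events: (driver_id, tile_id, event_ts_ms) in arrival order."""
    seen = set()                # dedup keys: (driver_id, timestamp)
    counts = defaultdict(set)   # (window, tile) -> distinct driver ids
    watermark = 0
    dropped = 0
    for driver_id, tile_id, ts in events:
        watermark = max(watermark, ts - ALLOWED_LATENESS_MS)
        if ts < window_start(watermark):   # window already closed: stale
            dropped += 1
            continue
        if (driver_id, ts) in seen:        # duplicate retransmit
            continue
        seen.add((driver_id, ts))
        counts[(window_start(ts), tile_id)].add(driver_id)
    return {k: len(v) for k, v in counts.items()}, dropped

events = [
    ("d1", "9q8yy", 1_000),
    ("d1", "9q8yy", 1_000),    # duplicate from a flaky mobile network
    ("d2", "9q8yy", 4_000),
    ("d3", "9q8yz", 12_000),
    ("d4", "9q8yy", 26_000),   # advances the watermark past window 0
    ("d5", "9q8yy", 4_000),    # too late: window 0 is closed, dropped
]
print(aggregate(events))
```

Counting distinct driver IDs per window (rather than raw pings) means a driver who pings twice in one window is still one unit of supply, which is the quantity the heatmap actually needs.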
4. Real-Time Update Delivery to Clients
Users viewing the heatmap need tile-level updates pushed efficiently. Broadcasting every GPS event to every client would overwhelm the system.
Hints to consider:
- Clients subscribe to tile IDs for their current viewport; only push updates for tiles that changed
- Use a pub/sub layer (Redis Pub/Sub or NATS) keyed by tile ID to route updates to the correct WebSocket servers
- Coalesce rapid tile updates into periodic pushes (every 5-10 seconds) to reduce message volume
- On initial load, serve the current tile state from Redis; subsequent updates are pushed incrementally
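The subscription and coalescing behavior can be sketched with an in-process broker standing in for Redis Pub/Sub plus WebSocket servers; the class name, flush cadence, and counts below are illustrative assumptions.

```python
# Sketch: viewport tile subscriptions with coalesced periodic pushes.
from collections import defaultdict

class TileBroker:
    def __init__(self):
        self.subs = defaultdict(set)  # tile_id -> subscribed client ids
        self.pending = {}             # tile_id -> latest count (coalesced)

    def subscribe(self, client_id: str, tile_ids) -> None:
        """Client registers the tiles in its current viewport."""
        for t in tile_ids:
            self.subs[t].add(client_id)

    def publish(self, tile_id: str, count: int) -> None:
        # Coalesce: only the latest count matters to a heatmap, so rapid
        # updates to one tile collapse into a single pending push.
        if self.subs[tile_id]:        # drop tiles nobody is watching
            self.pending[tile_id] = count

    def flush(self):
        """Called every 5-10 s; returns per-client update batches."""
        out = defaultdict(dict)
        for tile_id, count in self.pending.items():
            for client in self.subs[tile_id]:
                out[client][tile_id] = count
        self.pending.clear()
        return dict(out)

broker = TileBroker()
broker.subscribe("web-1", ["9q8yy", "9q8yz"])
broker.publish("9q8yy", 41)
broker.publish("9q8yy", 43)   # coalesced with the previous update
broker.publish("9q8z0", 7)    # no subscribers -> never queued
print(broker.flush())          # -> {'web-1': {'9q8yy': 43}}
```

Filtering at publish time (unwatched tiles are never queued) and coalescing at flush time together bound outbound traffic by viewers × visible tiles × flush rate, independent of the raw GPS event rate.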