Design a Driver Heatmap System
Problem Statement
Design a real-time driver tracking system that processes location updates from drivers and generates heatmaps to visualize driver density across different geographical areas. Think of the live supply heatmaps in ride-sharing apps like Uber or Lyft that glow brighter where more drivers are available.
The system ingests frequent GPS updates from hundreds of thousands of active drivers, aggregates counts into geographic tiles at multiple zoom levels, and delivers live heatmap updates to many concurrent clients. The core challenges are handling high-velocity writes with geographic hotspots, windowed aggregation over streaming data, and efficient real-time delivery to map-viewing users.
Interviewers at Uber ask this to test whether you can design low-latency, high-write systems that aggregate streaming data, manage hotspots, and deliver efficient real-time updates. They want to see how you reason about tiling/geohash grids, windowed aggregation, backpressure, and the tradeoffs between accuracy, cost, and latency.
Key Requirements
Functional
- Live heatmap -- users view a live heatmap of driver density for a selected geographic region and zoom level
- Status filtering -- users filter the heatmap by driver status (available, on-trip) and recent time window (last 5-10 minutes)
- Driver location updates -- drivers send periodic location and status updates that are reflected on the heatmap in near real time
- Incremental updates -- users receive incremental updates as they pan/zoom the map so the view remains fresh without reloading
Non-Functional
- Scalability -- handle 500,000+ active drivers sending location updates every 5-10 seconds, producing 50K-100K events per second
- Reliability -- tolerate processing node failures without data loss; heatmap may briefly show stale data during recovery
- Latency -- heatmap reflects driver supply changes within 10 seconds; tile loads under 200 ms
- Consistency -- eventual consistency acceptable; counts should be approximately correct within a 10-second window
What Interviewers Focus On
Based on real interview experiences at Uber, these are the areas interviewers probe most deeply:
1. Geospatial Tiling and Aggregation Strategy
You need a defined grid system for aggregating driver counts. Interviewers want to see how you map GPS coordinates to tiles and handle multiple zoom levels.
Hints to consider:
- Use geohash or quadkey-based tiles aligned with standard map tile coordinates (e.g., geohash precision 5 for city-level, 6 for neighborhood)
- Define an explicit time window (e.g., 5 minutes) -- a driver's GPS ping is counted in the tile only if it arrived within this window
- Pre-aggregate at multiple zoom levels: fine-grained tiles for zoomed-in views, coarser tiles for zoomed-out views
- Use TTL-based expiration so stale driver positions automatically drop off the heatmap
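The tiling hints above can be sketched in a few lines. This is a minimal pure-Python geohash encoder showing how one GPS ping maps to tiles at several precisions at once; the specific precisions (4, 5, 6) and the `tiles_for_update` helper are illustrative assumptions, not part of any standard API.

```python
# Sketch: mapping a GPS fix to geohash tiles at multiple zoom levels.
# Precision 5 covers roughly city-district-sized cells, precision 6
# roughly neighborhood-sized cells.

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet

def geohash_encode(lat: float, lon: float, precision: int) -> str:
    """Interleave longitude/latitude bisection bits into base32 chars."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    even = True  # geohash starts with a longitude bit
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1); lon_lo = mid
            else:
                bits.append(0); lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(BASE32[idx])
    return "".join(chars)

def tiles_for_update(lat: float, lon: float) -> dict:
    """One GPS ping increments counters at every pre-aggregated level."""
    return {p: geohash_encode(lat, lon, p) for p in (4, 5, 6)}

# Example: a driver near downtown San Francisco
print(tiles_for_update(37.7749, -122.4194))
```

Because geohashes nest by prefix, the precision-6 tile ID always starts with the precision-5 ID, which makes rolling fine tiles up into coarse ones a simple prefix truncation.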
2. High-Throughput Write Path
Driver GPS updates are a high-velocity write stream with geographic hotspots (airports, downtown areas). Interviewers want to see how you ingest and aggregate without bottlenecks.
Hints to consider:
- Ingest GPS updates into Kafka partitioned by geohash prefix to spread load and co-locate events for the same region
- Use Flink with event-time windowed aggregation (tumbling 10-second windows) to compute per-tile driver counts
- Batch counter updates to Redis rather than writing on every single GPS event
- For hot tiles (airports, stadiums), use sharded counters to avoid contention on a single Redis key
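The sharded-counter idea can be sketched as follows, with a plain dict standing in for Redis and an assumed shard count of 8 (the key format and shard count here are illustrative, not a prescribed scheme):

```python
# Sketch: sharded counters for hot tiles. Each increment lands on one of
# NUM_SHARDS keys; readers sum the shards, trading one multi-key read
# for write parallelism on airport/stadium tiles.
import hashlib

NUM_SHARDS = 8  # assumption: enough shards to spread a hot tile's writes
store: dict = {}  # stands in for Redis; INCRBY becomes a dict update

def shard_key(tile_id: str, driver_id: str) -> str:
    """Pick a stable shard per driver so retries hit the same key."""
    shard = int(hashlib.md5(driver_id.encode()).hexdigest(), 16) % NUM_SHARDS
    return f"heat:{tile_id}:{shard}"

def incr_tile(tile_id: str, driver_id: str, delta: int = 1) -> None:
    key = shard_key(tile_id, driver_id)
    store[key] = store.get(key, 0) + delta

def read_tile(tile_id: str) -> int:
    """Readers sum all shards (an MGET in real Redis)."""
    return sum(store.get(f"heat:{tile_id}:{s}", 0) for s in range(NUM_SHARDS))

# Simulate 1,000 drivers pinging inside one hot tile
for d in range(1000):
    incr_tile("9q8yy", f"driver-{d}")
print(read_tile("9q8yy"))  # -> 1000
```

In production the increments would also be batched per Flink window rather than issued per GPS event, so each shard key sees one `INCRBY` per window instead of thousands of single increments.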
3. Handling Out-of-Order, Duplicate, and Stale Updates
Mobile networks produce delayed and duplicated events. Without event-time handling, counts become inaccurate.
Hints to consider:
- Use event-time processing in Flink with watermarks to handle late-arriving GPS events
- Deduplicate by (driver_id, timestamp) within the processing window
- Apply a staleness threshold: if a driver's last update is older than the window, exclude them from the count
- Accept that counts are approximate; do not try for exact real-time accuracy
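The windowing, dedup, and staleness rules above can be expressed compactly. This sketch uses plain Python in place of Flink's API: 10-second tumbling windows keyed by tile, dedup on `(driver_id, timestamp)`, and a watermark that discards events arriving after their window has closed. The 5-second allowed lateness is an assumed parameter.

```python
# Sketch: event-time tumbling windows with dedup and late-event dropping.
from collections import defaultdict

WINDOW_MS = 10_000            # tumbling window size
ALLOWED_LATENESS_MS = 5_000   # assumption: watermark lags max event time

def window_start(ts_ms: int) -> int:
    return ts_ms - ts_ms % WINDOW_MS

def aggregate(events):
    """events: (driver_id, tile_id, event_ts_ms) in arrival order."""
    seen = set()                # dedup keys: (driver_id, timestamp)
    counts = defaultdict(set)   # (window, tile) -> distinct driver ids
    watermark = 0
    dropped = 0
    for driver_id, tile_id, ts in events:
        watermark = max(watermark, ts - ALLOWED_LATENESS_MS)
        if ts < window_start(watermark):   # window already closed: stale
            dropped += 1
            continue
        if (driver_id, ts) in seen:        # duplicate retransmit
            continue
        seen.add((driver_id, ts))
        counts[(window_start(ts), tile_id)].add(driver_id)
    return {k: len(v) for k, v in counts.items()}, dropped

events = [
    ("d1", "9q8yy", 1_000),
    ("d1", "9q8yy", 1_000),    # duplicate from a flaky mobile network
    ("d2", "9q8yy", 4_000),
    ("d3", "9q8yz", 12_000),
    ("d4", "9q8yy", 26_000),   # advances the watermark past window 0
    ("d5", "9q8yy", 4_000),    # too late: window 0 is closed, dropped
]
print(aggregate(events))
```

Counting distinct driver IDs per window (rather than raw pings) means a driver who pings twice in one window is still one unit of supply, which is the quantity the heatmap actually needs.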
4. Real-Time Update Delivery to Clients
Users viewing the heatmap need tile-level updates pushed efficiently. Broadcasting every GPS event to every client would overwhelm the system.
Hints to consider:
- Clients subscribe to tile IDs for their current viewport; only push updates for tiles that changed
- Use a pub/sub layer (Redis Pub/Sub or NATS) keyed by tile ID to route updates to the correct WebSocket servers
- Coalesce rapid tile updates into periodic pushes (every 5-10 seconds) to reduce message volume
- On initial load, serve the current tile state from Redis; subsequent updates are pushed incrementally
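The subscription and coalescing behavior can be sketched with an in-process broker standing in for Redis Pub/Sub plus WebSocket servers; the class name, flush cadence, and counts below are illustrative assumptions.

```python
# Sketch: viewport tile subscriptions with coalesced periodic pushes.
from collections import defaultdict

class TileBroker:
    def __init__(self):
        self.subs = defaultdict(set)  # tile_id -> subscribed client ids
        self.pending = {}             # tile_id -> latest count (coalesced)

    def subscribe(self, client_id: str, tile_ids) -> None:
        """Client registers the tiles in its current viewport."""
        for t in tile_ids:
            self.subs[t].add(client_id)

    def publish(self, tile_id: str, count: int) -> None:
        # Coalesce: only the latest count matters to a heatmap, so rapid
        # updates to one tile collapse into a single pending push.
        if self.subs[tile_id]:        # drop tiles nobody is watching
            self.pending[tile_id] = count

    def flush(self):
        """Called every 5-10 s; returns per-client update batches."""
        out = defaultdict(dict)
        for tile_id, count in self.pending.items():
            for client in self.subs[tile_id]:
                out[client][tile_id] = count
        self.pending.clear()
        return dict(out)

broker = TileBroker()
broker.subscribe("web-1", ["9q8yy", "9q8yz"])
broker.publish("9q8yy", 41)
broker.publish("9q8yy", 43)   # coalesced with the previous update
broker.publish("9q8z0", 7)    # no subscribers -> never queued
print(broker.flush())          # -> {'web-1': {'9q8yy': 43}}
```

Filtering at publish time (unwatched tiles are never queued) and coalescing at flush time together bound outbound traffic by viewers × visible tiles × flush rate, independent of the raw GPS event rate.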