Design a Temperature Monitoring System
Problem Statement
Design a temperature monitoring system that displays current and historical temperature data across a 10 million km² area, with one sensor per 10 km². Users should be able to view real-time temperatures on a map and query past temperature records for any location in the coverage area.
This implies around one million sensor endpoints (10,000,000 km² divided by 10 km² per sensor) continuously streaming measurements. The system must handle sustained high write throughput, low-latency reads for current conditions, real-time map updates, and efficient historical range queries. The core challenges are time-series ingestion at scale, geospatial indexing, and delivering a responsive UI via caching and fan-out while keeping storage under control.
Interviewers at Uber ask this to test whether you can design a geo-aware, time-series system that handles high write throughput, low-latency reads, and real-time updates. Strong answers show clear data modeling for time-range queries, backpressure management, and a practical caching and fan-out strategy.
Key Requirements
Functional
- Real-time temperature view -- users view the current temperature for any location in the coverage area in near real time
- Historical queries -- users query historical temperatures for a specific location over a chosen time range
- Sensor health -- users see when a sensor was last updated and whether its data is currently healthy or delayed
- Regional aggregation -- users view aggregated temperatures over a region (map tiles or bounding boxes) for a time window
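The sensor-health requirement can be sketched as a simple classification over the age of the last reading. This is a minimal illustration, not a prescribed design: the `classify_health` name and the threshold of two reporting intervals are assumptions, since the problem statement does not define when a sensor counts as delayed.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    DELAYED = "delayed"


def classify_health(last_seen_s: float, now_s: float,
                    interval_s: float = 60.0,
                    missed_intervals: int = 2) -> Health:
    """Classify a sensor by how long ago it last reported.

    Assumption: a sensor is "delayed" once it has been silent for more
    than `missed_intervals` reporting intervals.
    """
    age_s = now_s - last_seen_s
    if age_s <= missed_intervals * interval_s:
        return Health.HEALTHY
    return Health.DELAYED
```

Storing `last_seen_s` alongside the latest reading lets the UI show both the status and the "last updated" time from a single lookup.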
Non-Functional
- Scalability -- handle approximately 1 million sensors reporting at intervals of 1 minute or less; at a 1-minute interval that is about 16,700 sustained writes per second (1,000,000 / 60), and more at shorter intervals
- Reliability -- tolerate sensor and node failures without data loss; degrade gracefully if non-critical downstream services (such as aggregation or rollup jobs) fail
- Latency -- current temperature reads under 100 ms P99; historical range queries under 500 ms for typical time ranges
- Consistency -- eventual consistency acceptable for map displays; sensor health status should reflect actual state within seconds
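A back-of-envelope calculation makes the scale concrete. The area, sensor density, and 1-minute interval come from the problem statement; the 32 bytes per stored reading is an assumed figure for illustration only, and real sizes depend on the storage engine and compression.

```python
AREA_KM2 = 10_000_000        # coverage area from the problem statement
KM2_PER_SENSOR = 10          # one sensor per 10 km^2
REPORT_INTERVAL_S = 60       # slowest reporting interval in the requirements
BYTES_PER_READING = 32       # ASSUMPTION: sensor id + timestamp + value + overhead

sensors = AREA_KM2 // KM2_PER_SENSOR                        # 1,000,000 sensors
writes_per_s = sensors / REPORT_INTERVAL_S                  # about 16,667 writes/s
readings_per_day = sensors * 86_400 // REPORT_INTERVAL_S    # 1.44 billion readings
raw_gb_per_day = readings_per_day * BYTES_PER_READING / 1e9 # about 46 GB/day

print(f"{sensors:,} sensors, {writes_per_s:,.0f} writes/s, "
      f"{readings_per_day:,} readings/day, {raw_gb_per_day:.1f} GB/day raw")
```

Under these assumptions raw data grows by tens of gigabytes per day before replication, which is why retention and rollups matter in the storage discussion.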
What Interviewers Focus On
Based on real interview experiences at Uber, Amazon, and Oracle, these are the areas interviewers probe most deeply:
1. Ingestion Pipeline and Write Scaling
With roughly one million sensors, even moderate sampling rates create intense write loads. Interviewers want to see how you handle sustained ingestion without creating hot shards.
Hints to consider:
- Ingest sensor readings into Kafka partitioned by sensor_id to spread load and maintain per-sensor ordering
- Use idempotent consumers with event timestamps to handle duplicates and out-of-order arrivals from unreliable sensor networks
- Batch writes to the time-series store (flush every 5 seconds per partition) to reduce write amplification
- Apply backpressure at the ingestion layer to shed load gracefully during traffic spikes
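The deduplication and batching hints can be sketched in a few lines. This is a simplified in-memory illustration, not a Kafka consumer: `Reading`, `BatchingWriter`, and the `flushed` list (which stands in for the time-series store) are hypothetical names, and the 5-second flush interval is the one suggested above. Backpressure is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    event_ts: int    # epoch seconds assigned at the sensor, not arrival time
    temp_c: float


@dataclass
class BatchingWriter:
    flush_interval_s: float = 5.0
    max_batch: int = 5000
    flushed: list = field(default_factory=list)  # stand-in for the TSDB
    _seen: set = field(default_factory=set)
    _buffer: list = field(default_factory=list)
    _last_flush_s: float = 0.0

    def offer(self, r: Reading, now_s: float) -> bool:
        """Buffer a reading; return False if it is a duplicate."""
        key = (r.sensor_id, r.event_ts)
        # Idempotency: the (sensor_id, event_ts) pair identifies a reading,
        # so redelivery is dropped and late arrivals are still accepted.
        # A real consumer would bound this set or rely on upserts in the store.
        if key in self._seen:
            return False
        self._seen.add(key)
        self._buffer.append(r)
        if (len(self._buffer) >= self.max_batch
                or now_s - self._last_flush_s >= self.flush_interval_s):
            self.flush(now_s)
        return True

    def flush(self, now_s: float) -> None:
        """Write the buffered readings as one batch."""
        if self._buffer:
            self.flushed.append(list(self._buffer))
            self._buffer.clear()
        self._last_flush_s = now_s
```

Keying on the event timestamp rather than the arrival time is what makes a retry from an unreliable sensor network safe to process twice.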
2. Time-Series Storage and Query Patterns
Historical queries require efficient range scans over potentially years of data. Interviewers evaluate your storage model and retention strategy.
Hints to consider:
- Use a store suited to time-series data (a wide-column store such as Cassandra with time-bucketed partitions, or a purpose-built TSDB such as TimescaleDB) with a composite key of (sensor_id, time_bucket)
- Implement time-bucketed partitioning (hourly or daily) to enable efficient range scans and partition-level deletion for retention
- Pre-compute rollups (hourly and daily averages) asynchronously to speed up longer-range queries
- Apply TTL-based retention policies: raw data for 30 days, hourly rollups for 1 year, daily rollups indefinitely
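As a rough sketch of the bucketing and rollup ideas above, the functions below derive a daily partition key and compute hourly averages in memory. The names `partition_key` and `hourly_rollup` are illustrative, and the daily bucket is one of the two granularities the hints mention; a real deployment would run the rollup as an asynchronous job against the store.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Retention tiers from the hints above (None means keep indefinitely).
RETENTION_DAYS = {"raw": 30, "hourly": 365, "daily": None}


def partition_key(sensor_id: str, event_ts: int) -> tuple:
    """Composite key (sensor_id, time_bucket) using a UTC daily bucket."""
    day = datetime.fromtimestamp(event_ts, tz=timezone.utc).strftime("%Y-%m-%d")
    return (sensor_id, day)


def hourly_rollup(readings) -> dict:
    """Average temperature per (sensor_id, hour_start_ts).

    `readings` is an iterable of (sensor_id, event_ts, temp_c) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for sensor_id, ts, temp_c in readings:
        hour_start = ts - ts % 3600       # truncate to the start of the hour
        acc = sums[(sensor_id, hour_start)]
        acc[0] += temp_c
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

With this key, a range query for one sensor touches one partition per day, and dropping expired raw data becomes a partition-level operation rather than a row-by-row delete.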
3. Real-Time Map Updates
Users viewing the temperature map need fresh data without polling every second. Interviewers probe how you push updates efficiently to many concurrent viewers.
Hints to consider:
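- Store the latest reading per sensor in Redis for fast in-memory current-value lookups, with a TTL of a few reporting intervals so that entries for silent sensors expire instead of being served as current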
- Use a tile-based approach: aggregate sensors into map tiles and push tile-level updates to subscribed clients via WebSockets or SSE
- Coalesce rapid updates within the same tile into periodic refreshes (every 10-30 seconds) to reduce fan-out volume
- Use Redis Pub/Sub or a similar pub/sub layer to route tile updates to the correct WebSocket servers
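The coalescing hint can be illustrated with a small in-memory buffer that keeps only the newest value per sensor in each tile and releases a tile at most once per refresh interval. `TileCoalescer` and its methods are hypothetical names, and the 10-second default is the low end of the range suggested above; the WebSocket and pub/sub layers are deliberately left out.

```python
class TileCoalescer:
    """Collapse rapid per-sensor updates into periodic per-tile pushes."""

    def __init__(self, refresh_s: float = 10.0):
        self.refresh_s = refresh_s
        self._pending = {}    # tile_id -> {sensor_id: latest temp_c}
        self._last_push = {}  # tile_id -> time of the last push

    def update(self, tile_id: str, sensor_id: str, temp_c: float) -> None:
        # Later readings overwrite earlier ones, so at most one value
        # per sensor is sent in each refresh.
        self._pending.setdefault(tile_id, {})[sensor_id] = temp_c

    def due(self, now_s: float) -> dict:
        """Return and clear pending updates for tiles whose interval elapsed."""
        out = {}
        for tile_id in list(self._pending):
            last = self._last_push.get(tile_id)
            if last is None or now_s - last >= self.refresh_s:
                out[tile_id] = self._pending.pop(tile_id)
                self._last_push[tile_id] = now_s
        return out
```

A push loop would call `due()` on a timer and publish each returned tile payload to the channel its subscribers listen on, so fan-out scales with the number of active tiles rather than the number of sensor readings.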
4. Geospatial Indexing and Regional Aggregation
Users want to see temperature patterns across regions, not just individual sensors. Interviewers look for spatial query support.
Hints to consider:
- Index sensors by geohash or S2 cell ID to enable efficient spatial lookups (find all sensors in a bounding box)
- Pre-aggregate tile-level statistics (min, max, average temperature) during stream processing for common zoom levels
- Cache tile aggregates in Redis with TTLs matching the update frequency
- Support drill-down: show tile-level summary at low zoom, individual sensor data at high zoom
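To make the geohash and tile-aggregation hints concrete, here is a minimal sketch that encodes a coordinate with the standard geohash algorithm and groups readings by cell prefix. The function names and the choice of precision 4 for tiles are assumptions for illustration; S2 cells or map-tile coordinates would fill the same role.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat: float, lon: float, precision: int = 5) -> str:
    """Encode a coordinate as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value, use_lon = 0, 0, True  # bits alternate, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        bits += 1
        if bits == 5:                  # every 5 bits become one base32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def aggregate_tiles(sensors, precision: int = 4) -> dict:
    """Group (lat, lon, temp_c) readings by geohash cell and summarize each."""
    groups = {}
    for lat, lon, temp_c in sensors:
        groups.setdefault(geohash(lat, lon, precision), []).append(temp_c)
    return {
        cell: {"min": min(t), "max": max(t), "avg": sum(t) / len(t), "count": len(t)}
        for cell, t in groups.items()
    }
```

Because a shorter geohash is a prefix of every longer one inside it, the same index supports drill-down: low zoom levels aggregate on short prefixes, and high zoom levels look up individual sensors under a longer prefix.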