Design a Temperature Monitoring System
System Design · Must
Problem Statement
Design a temperature monitoring system that displays current and historical temperature data across a 10 million square-kilometer area, with one sensor per 10 square kilometers. Users should be able to view real-time temperatures and query past temperature records for any location in the coverage area.
This implies roughly one million sensor endpoints continuously streaming measurements. The system is a real-time and historical data platform where users view live temperatures on a map and query past readings for any point in a large region. Interviewers ask this to test whether you can design a geo-aware, time-series system that handles sustained high write throughput, low-latency reads, and real-time updates while keeping storage under control.
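The sizing claims above are simple arithmetic worth stating explicitly in the interview; a quick sketch of the numbers:

```python
# Back-of-envelope sizing for the sensor fleet and sustained write rate.
AREA_KM2 = 10_000_000          # total coverage area, km^2
KM2_PER_SENSOR = 10            # one sensor per 10 km^2
REPORT_INTERVAL_S = 60         # each sensor reports once per minute

sensors = AREA_KM2 // KM2_PER_SENSOR              # 1,000,000 sensors
writes_per_second = sensors / REPORT_INTERVAL_S   # roughly 16,700 writes/s
```

At higher sampling rates (e.g. every 10 seconds) the write rate scales linearly, which is why the ingestion layer is the first thing interviewers probe.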
Key Requirements
Functional
- Current temperature -- users can view the current temperature for any location in the coverage area in near real time
- Historical queries -- users can query historical temperatures for a specific location over a chosen time range
- Sensor health -- users can see when a sensor was last updated and whether its data is currently healthy or delayed
- Regional aggregation -- users can view aggregated temperatures over a region (map tiles or bounding boxes) for a time window
Non-Functional
- Scalability -- support one million sensors, each reporting every minute (~16,700 writes/second sustained), with higher sampling rates possible
- Reliability -- tolerate individual sensor or processing node failures without data loss; backfill late data
- Latency -- serve current temperature lookups in under 100ms; historical range queries in under 500ms for reasonable time ranges
- Consistency -- eventual consistency for map updates (seconds of delay acceptable); strong consistency for historical query results
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. High-Throughput Ingestion Pipeline
With roughly one million sensors, even at one reading per minute, you face ~16,700 writes per second. Interviewers want to see a durable, scalable write path.
Hints to consider:
- Use Kafka as the ingestion layer, partitioned by sensor_id or geographic region for balanced load distribution
- Implement idempotent consumers to handle duplicate sensor readings from retries or overlapping data feeds
- Buffer and batch writes to the time-series store to reduce write amplification
- Apply backpressure mechanisms to handle bursts when many sensors report simultaneously
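The idempotency and batching hints above can be combined consumer-side. This is a minimal sketch in plain Python: the Kafka client and the time-series store are abstracted away, and names like `IdempotentBatcher` and `flushed` are illustrative, not a real client API.

```python
from collections import deque

class IdempotentBatcher:
    """Deduplicates readings by (sensor_id, ts) and batches writes.

    Sketch of consumer-side logic only; in production the dedup window
    would be sized to cover the retry horizon of upstream producers.
    """

    def __init__(self, batch_size=500, dedup_window=10_000):
        self.batch_size = batch_size
        self.seen = set()           # recently seen (sensor_id, ts) keys
        self.seen_order = deque()   # bounded FIFO so the set stays small
        self.dedup_window = dedup_window
        self.batch = []
        self.flushed = []           # stands in for bulk writes to the store

    def consume(self, reading):
        key = (reading["sensor_id"], reading["ts"])
        if key in self.seen:        # duplicate from a retry or overlapping feed
            return
        self.seen.add(key)
        self.seen_order.append(key)
        if len(self.seen_order) > self.dedup_window:
            self.seen.discard(self.seen_order.popleft())
        self.batch.append(reading)
        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.batch:              # one bulk write instead of N single writes
            self.flushed.append(list(self.batch))
            self.batch.clear()
```

Batching amortizes per-write overhead in the time-series store, and the bounded dedup set keeps memory flat even at a million sensors per partition group.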
2. Time-Series Data Modeling and Storage
Temperature readings are classic time-series data. Interviewers expect you to choose appropriate storage strategies and data models.
Hints to consider:
- Use a time-series store (TimescaleDB, InfluxDB, or Cassandra with a time-series schema) with partition keys combining sensor_id and time buckets
- Design partition keys to spread writes evenly and bound partition size: a composite partition key of (sensor_id, time_bucket) with timestamp as the clustering column, so no single partition grows without limit
- Implement data retention policies: keep raw data for 30 days, then downsample to hourly/daily averages for long-term storage
- Use TTL-based automatic expiration for raw data to manage storage growth
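The downsampling step in the retention policy above is easy to sketch. This is an illustrative batch job, assuming readings arrive as `(sensor_id, unix_ts, temp_c)` tuples; the real job would read from and write back to the time-series store.

```python
from collections import defaultdict

def downsample_hourly(readings):
    """Roll raw per-minute readings up to hourly averages per sensor.

    This is the kind of job a retention pipeline runs before TTL
    expiration deletes the raw data.
    """
    buckets = defaultdict(list)
    for sensor_id, ts, temp in readings:
        hour = ts - ts % 3600          # truncate the timestamp to the hour
        buckets[(sensor_id, hour)].append(temp)
    # (sensor_id, hour_start) -> mean temperature for that hour
    return {key: sum(temps) / len(temps) for key, temps in buckets.items()}
```

Hourly and daily rollups shrink storage by orders of magnitude (60x for hourly alone) while keeping long-range queries fast.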
3. Real-Time Map Display and Updates
Interviewers want to see how you serve live temperature data to users viewing a map without overloading storage.
Hints to consider:
- Cache the latest reading per sensor in Redis for sub-millisecond lookups
- Pre-compute temperature aggregates per map tile at multiple zoom levels
- Use WebSockets or Server-Sent Events to push temperature updates to clients viewing specific map regions
- Subscribe clients only to tiles in their current viewport to minimize unnecessary data transfer
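The tile-aggregation and viewport-subscription hints both need a way to map a sensor's coordinates to a tile key. A common choice (an assumption here, not mandated by the problem) is the Web Mercator "slippy map" tiling scheme:

```python
import math

def latlon_to_tile(lat, lon, zoom):
    """Map a lat/lon to a Web-Mercator (slippy map) tile coordinate.

    The (zoom, x, y) key can name both the pre-computed aggregate and
    the pub/sub channel a client subscribes to for its viewport.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return zoom, x, y

def tile_channel(zoom, x, y):
    """Illustrative channel name for WebSocket/SSE tile subscriptions."""
    return f"tiles:{zoom}:{x}:{y}"
```

When a client pans the map, it unsubscribes from tiles that left the viewport and subscribes to the new ones, so each client receives only updates it can actually display.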
4. Data Quality and Late-Arriving Data
Real sensors produce noisy, late, and sometimes incorrect data. Interviewers expect strategies for handling these issues.
Hints to consider:
- Validate readings within expected ranges and flag or discard obvious outliers
- Handle late-arriving data by accepting readings with a lateness window and updating affected aggregations
- Mark sensors as "stale" when readings stop arriving and display last-known-good data with a warning indicator
- Implement redundant sensors in critical areas to cross-validate readings
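The validation, lateness, and staleness rules above fit in a few lines of pipeline logic. The thresholds below are illustrative choices for discussion, not values fixed by the problem:

```python
VALID_RANGE_C = (-90.0, 60.0)   # plausible surface temperatures on Earth
ON_TIME_S = 120                 # within 2 minutes counts as on time
LATENESS_WINDOW_S = 15 * 60     # accept backfill up to 15 minutes late
STALE_AFTER_S = 3 * 60          # warn after 3 missed per-minute reports

def classify_reading(temp_c, reading_ts, now):
    """Return 'reject', 'late', or 'ok' for one incoming reading."""
    lo, hi = VALID_RANGE_C
    if not (lo <= temp_c <= hi):
        return "reject"         # obvious outlier: flag or discard
    age = now - reading_ts
    if age > LATENESS_WINDOW_S:
        return "reject"         # too old to backfill
    if age > ON_TIME_S:
        return "late"           # accepted, but affected aggregates must be recomputed
    return "ok"

def is_stale(last_seen_ts, now):
    """True when a sensor has missed enough reports to show a warning."""
    return now - last_seen_ts > STALE_AFTER_S
```

A "late" classification is what triggers the aggregate updates mentioned above, and `is_stale` drives the last-known-good warning indicator on the map.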