Design a Temperature Monitoring System
Problem Statement
Design a temperature monitoring system that displays current and historical temperature data across multiple locations in real time. Sensors deployed at warehouses, data centers, cold storage facilities, or outdoor stations report temperature readings periodically. Operators need live dashboards showing current temperatures, historical trend charts, and immediate alerts when temperatures fall outside safe ranges.
At Amazon, this question maps to scenarios like warehouse climate control, data center cooling monitoring, or cold chain logistics. Interviewers use it to evaluate your ability to design IoT data ingestion pipelines, time-series storage, real-time alerting, and efficient dashboard serving. The challenge is handling high-frequency sensor data at scale while maintaining low-latency dashboards and reliable alert delivery.
Key Requirements
Functional
- Sensor data collection -- accept temperature readings from thousands of sensors across hundreds of locations, reporting every 10-30 seconds
- Real-time dashboard -- operators view current temperatures per location and sensor on an interactive dashboard with auto-refresh
- Historical charts -- operators view temperature trends over configurable time windows (1 hour, 24 hours, 7 days, 30 days) with zoom and drill-down
- Threshold alerts -- operators configure temperature ranges per location; the system notifies within seconds when readings breach thresholds
Non-Functional
- Scalability -- handle 100,000+ sensors with reporting intervals of 10-30 seconds, producing millions of readings per minute
- Reliability -- no data loss during network interruptions between sensors and the backend; alert pipeline maintains 99.9% uptime
- Latency -- dashboard updates within 5 seconds of sensor reading; alerts trigger within 10 seconds of threshold breach
- Consistency -- eventual consistency for dashboards; alerts must not miss genuine threshold breaches even if they occasionally fire on transient spikes
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Ingestion Pipeline for High-Frequency Sensor Data
Thousands of sensors generating readings every few seconds create substantial write volume. Interviewers want to see a robust ingestion path.
Hints to consider:
- Use MQTT or lightweight HTTP endpoints at the edge for sensor communication, bridging to Kafka for durability
- Partition Kafka topics by location_id or sensor_id for ordered processing and parallel consumption
- Handle sensor retries and duplicate readings with idempotency based on (sensor_id, timestamp) composite keys
- Implement local buffering on sensor gateways to handle intermittent connectivity without data loss
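The idempotency hint above can be sketched as a small deduplication step that sits between the edge endpoint and Kafka. This is a minimal in-memory sketch, assuming readings arrive as `(sensor_id, timestamp)` pairs and that a bounded window of recently seen keys is acceptable (a production system might use Kafka's own idempotent producer or a keyed state store instead):

```python
from collections import OrderedDict


class ReadingDeduplicator:
    """Drops duplicate sensor readings keyed on (sensor_id, timestamp).

    Keeps a bounded window of recently seen keys so memory stays flat
    even when thousands of gateways retry over flaky links.
    """

    def __init__(self, max_keys: int = 1_000_000):
        self._seen: OrderedDict = OrderedDict()  # insertion-ordered key set
        self._max_keys = max_keys

    def accept(self, sensor_id: str, timestamp: int) -> bool:
        """Return True if this reading is new and should be forwarded."""
        key = (sensor_id, timestamp)
        if key in self._seen:
            return False  # retry or duplicate: drop silently
        self._seen[key] = None
        if len(self._seen) > self._max_keys:
            self._seen.popitem(last=False)  # evict the oldest key
        return True
```

Because the window is bounded, a duplicate that arrives after its key has been evicted would slip through; that trade-off is usually fine since sensor retries happen within seconds, not hours.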
2. Time-Series Storage and Efficient Querying
Temperature data accumulates rapidly and must be queryable across various time ranges with different granularities. Interviewers probe your retention and aggregation strategy.
Hints to consider:
- Store raw readings in a time-series database with (location_id, sensor_id, timestamp) as the key structure
- Pre-compute aggregations: keep raw 10-second data for 7 days, 1-minute averages for 30 days, hourly averages for 1 year
- Use time-based partitioning to enable efficient range queries and automated data lifecycle management (drop old partitions)
- Apply compression techniques optimized for time-series (delta-of-delta, Gorilla encoding) to reduce storage costs
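The pre-computed aggregation tier can be illustrated with a simple downsampling pass that rolls raw 10-second readings into 1-minute averages. This is a sketch of the idea, assuming readings for a single sensor arrive as `(epoch_seconds, celsius)` tuples; a real pipeline would run this as a windowed stream job or rely on the time-series database's built-in rollups:

```python
from collections import defaultdict


def downsample_to_minutes(readings):
    """Aggregate raw (epoch_seconds, celsius) readings into per-minute averages.

    Returns a sorted list of (minute_start_epoch, avg_celsius) tuples.
    """
    buckets = defaultdict(list)
    for ts, temp in readings:
        minute = ts - (ts % 60)  # truncate to the start of the minute
        buckets[minute].append(temp)
    return sorted(
        (minute, sum(vals) / len(vals)) for minute, vals in buckets.items()
    )
```

The same function shape applies one tier up: feed 1-minute averages in with a 3600-second bucket to produce the hourly series kept for a year.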
3. Real-Time Alerting with Debouncing
Temperature alerts are safety-critical but must avoid false alarms from transient sensor spikes. Interviewers look for a reliable yet pragmatic alerting design.
Hints to consider:
- Process the live Kafka stream with a stateful consumer that maintains per-sensor sliding windows
- Require sustained threshold breaches (e.g., 3 consecutive readings above limit) before triggering an alert to suppress noise
- Fan out alerts to multiple channels (dashboard banner, SMS, email) through a notification service with delivery confirmation
- Implement alert acknowledgment and escalation: if unacknowledged within 5 minutes, re-alert to a secondary contact
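The "N consecutive breaches" debouncing rule can be expressed as a small stateful check inside the Kafka consumer. This is a minimal sketch assuming a single high-temperature limit per sensor and that the consumer processes readings in order per sensor (which topic partitioning by sensor_id provides); fan-out, acknowledgment, and escalation would live downstream of the `True` result:

```python
class DebouncedAlerter:
    """Fires an alert only after N consecutive readings breach the limit."""

    def __init__(self, high_limit: float, consecutive: int = 3):
        self.high_limit = high_limit
        self.consecutive = consecutive
        self._streak: dict = {}  # sensor_id -> current breach streak

    def on_reading(self, sensor_id: str, temp: float) -> bool:
        """Return True exactly once, at the moment the streak is reached."""
        if temp > self.high_limit:
            self._streak[sensor_id] = self._streak.get(sensor_id, 0) + 1
        else:
            self._streak[sensor_id] = 0  # normal reading resets the streak
        return self._streak[sensor_id] == self.consecutive
```

A transient spike (one or two hot readings followed by a normal one) resets the streak and never alerts; a sustained breach alerts once and stays quiet until the temperature recovers and breaches again.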
4. Dashboard Architecture and Live Updates
Operators need dashboards that feel live. Interviewers want to see how you serve current and historical data efficiently.
Hints to consider:
- Maintain a current-state store (Redis) with the latest reading per sensor, updated from the Kafka stream
- Serve the live dashboard from Redis for instant loads, with WebSocket push for incremental updates
- For historical charts, query the time-series database with automatic resolution selection based on the requested time range
- Cache popular dashboard configurations (e.g., "all sensors at location X for last 24 hours") in Redis with short TTLs
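Automatic resolution selection can be a simple mapping from the requested window to a storage tier. This sketch assumes the retention tiers described earlier (raw 10-second data, 1-minute averages, hourly averages) and picks the coarsest tier that still keeps the chart's point count manageable; the tier names and cutoffs here are illustrative:

```python
def pick_resolution(range_seconds: int) -> str:
    """Choose a storage tier for a chart query based on the window size."""
    HOUR, DAY = 3600, 86400
    if range_seconds <= 6 * HOUR:
        return "raw_10s"   # at most ~2,160 points per sensor
    if range_seconds <= 7 * DAY:
        return "avg_1m"    # ~10,080 points for a full week
    return "avg_1h"        # keeps a 30-day chart under ~720 points
```

Keeping per-sensor point counts in the low thousands means the browser can render charts without server-side downsampling on the read path.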