Design a Temperature Monitoring System
Problem Statement
Design a temperature monitoring system that displays current and historical temperature data across multiple locations in real time. Sensors deployed at warehouses, data centers, cold storage facilities, or outdoor stations report temperature readings periodically. Operators need live dashboards showing current temperatures, historical trend charts, and immediate alerts when temperatures fall outside safe ranges.
At Amazon, this question maps to scenarios like warehouse climate control, data center cooling monitoring, or cold chain logistics. Interviewers use it to evaluate your ability to design IoT data ingestion pipelines, time-series storage, real-time alerting, and efficient dashboard serving. The challenge is handling high-frequency sensor data at scale while maintaining low-latency dashboards and reliable alert delivery.
Key Requirements
Functional
- Sensor data collection -- accept temperature readings from thousands of sensors across hundreds of locations, reporting every 10-30 seconds
- Real-time dashboard -- operators view current temperatures per location and sensor on an interactive dashboard with auto-refresh
- Historical charts -- operators view temperature trends over configurable time windows (1 hour, 24 hours, 7 days, 30 days) with zoom and drill-down
- Threshold alerts -- operators configure temperature ranges per location; the system notifies within seconds when readings breach thresholds
Non-Functional
- Scalability -- handle 100,000+ sensors with reporting intervals of 10-30 seconds, producing millions of readings per minute
- Reliability -- no data loss during network interruptions between sensors and the backend; alert pipeline maintains 99.9% uptime
- Latency -- dashboard updates within 5 seconds of sensor reading; alerts trigger within 10 seconds of threshold breach
- Consistency -- eventual consistency for dashboards; alerts must not miss genuine threshold breaches even if they occasionally fire on transient spikes
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Ingestion Pipeline for High-Frequency Sensor Data
Thousands of sensors generating readings every few seconds create substantial write volume. Interviewers want to see a robust ingestion path.
Hints to consider:
- Use MQTT or lightweight HTTP endpoints at the edge for sensor communication, bridging to Kafka for durability
- Partition Kafka topics by location_id or sensor_id for ordered processing and parallel consumption
- Handle sensor retries and duplicate readings with idempotency based on (sensor_id, timestamp) composite keys
- Implement local buffering on sensor gateways to handle intermittent connectivity without data loss
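The idempotency hint above can be sketched as a small deduplication step that sits between the edge endpoint and Kafka. This is a minimal in-memory sketch, assuming readings arrive as `(sensor_id, timestamp)` pairs and that a bounded window of recently seen keys is acceptable (a production system might use Kafka's own idempotent producer or a keyed state store instead):

```python
from collections import OrderedDict


class ReadingDeduplicator:
    """Drops duplicate sensor readings keyed on (sensor_id, timestamp).

    Keeps a bounded window of recently seen keys so memory stays flat
    even when thousands of gateways retry over flaky links.
    """

    def __init__(self, max_keys: int = 1_000_000):
        self._seen: OrderedDict = OrderedDict()  # insertion-ordered key set
        self._max_keys = max_keys

    def accept(self, sensor_id: str, timestamp: int) -> bool:
        """Return True if this reading is new and should be forwarded."""
        key = (sensor_id, timestamp)
        if key in self._seen:
            return False  # retry or duplicate: drop silently
        self._seen[key] = None
        if len(self._seen) > self._max_keys:
            self._seen.popitem(last=False)  # evict the oldest key
        return True
```

Because the window is bounded, a duplicate that arrives after its key has been evicted would slip through; that trade-off is usually fine since sensor retries happen within seconds, not hours.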
2. Time-Series Storage and Efficient Querying
Temperature data accumulates rapidly and must be queryable across various time ranges with different granularities. Interviewers probe your retention and aggregation strategy.
Hints to consider:
- Store raw readings in a time-series database with (location_id, sensor_id, timestamp) as the key structure
- Pre-compute aggregations: keep raw 10-second data for 7 days, 1-minute averages for 30 days, hourly averages for 1 year
- Use time-based partitioning to enable efficient range queries and automated data lifecycle management (drop old partitions)
- Apply compression techniques optimized for time-series (delta-of-delta, Gorilla encoding) to reduce storage costs
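The pre-computed aggregation tier can be illustrated with a simple downsampling pass that rolls raw 10-second readings into 1-minute averages. This is a sketch of the idea, assuming readings for a single sensor arrive as `(epoch_seconds, celsius)` tuples; a real pipeline would run this as a windowed stream job or rely on the time-series database's built-in rollups:

```python
from collections import defaultdict


def downsample_to_minutes(readings):
    """Aggregate raw (epoch_seconds, celsius) readings into per-minute averages.

    Returns a sorted list of (minute_start_epoch, avg_celsius) tuples.
    """
    buckets = defaultdict(list)
    for ts, temp in readings:
        minute = ts - (ts % 60)  # truncate to the start of the minute
        buckets[minute].append(temp)
    return sorted(
        (minute, sum(vals) / len(vals)) for minute, vals in buckets.items()
    )
```

The same function shape applies one tier up: feed 1-minute averages in with a 3600-second bucket to produce the hourly series kept for a year.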
3. Real-Time Alerting with Debouncing
Temperature alerts are safety-critical but must avoid false alarms from transient sensor spikes. Interviewers look for a reliable yet pragmatic alerting design.
Hints to consider:
- Process the live Kafka stream with a stateful consumer that maintains per-sensor sliding windows
- Require sustained threshold breaches (e.g., 3 consecutive readings above limit) before triggering an alert to suppress noise
- Fan out alerts to multiple channels (dashboard banner, SMS, email) through a notification service with delivery confirmation
- Implement alert acknowledgment and escalation: if unacknowledged within 5 minutes, re-alert to a secondary contact
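The "N consecutive breaches" debouncing rule can be expressed as a small stateful check inside the Kafka consumer. This is a minimal sketch assuming a single high-temperature limit per sensor and that the consumer processes readings in order per sensor (which topic partitioning by sensor_id provides); fan-out, acknowledgment, and escalation would live downstream of the `True` result:

```python
class DebouncedAlerter:
    """Fires an alert only after N consecutive readings breach the limit."""

    def __init__(self, high_limit: float, consecutive: int = 3):
        self.high_limit = high_limit
        self.consecutive = consecutive
        self._streak: dict = {}  # sensor_id -> current breach streak

    def on_reading(self, sensor_id: str, temp: float) -> bool:
        """Return True exactly once, at the moment the streak is reached."""
        if temp > self.high_limit:
            self._streak[sensor_id] = self._streak.get(sensor_id, 0) + 1
        else:
            self._streak[sensor_id] = 0  # normal reading resets the streak
        return self._streak[sensor_id] == self.consecutive
```

A transient spike (one or two hot readings followed by a normal one) resets the streak and never alerts; a sustained breach alerts once and stays quiet until the temperature recovers and breaches again.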
4. Dashboard Architecture and Live Updates
Operators need dashboards that feel live. Interviewers want to see how you serve current and historical data efficiently.
Hints to consider:
- Maintain a current-state store (Redis) with the latest reading per sensor, updated from the Kafka stream
- Serve the live dashboard from Redis for instant loads, with WebSocket push for incremental updates
- For historical charts, query the time-series database with automatic resolution selection based on the requested time range
- Cache popular dashboard configurations (e.g., "all sensors at location X for last 24 hours") in Redis with short TTLs
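Automatic resolution selection can be a simple mapping from the requested window to a storage tier. This sketch assumes the retention tiers described earlier (raw 10-second data, 1-minute averages, hourly averages) and picks the coarsest tier that still keeps the chart's point count manageable; the tier names and cutoffs here are illustrative:

```python
def pick_resolution(range_seconds: int) -> str:
    """Choose a storage tier for a chart query based on the window size."""
    HOUR, DAY = 3600, 86400
    if range_seconds <= 6 * HOUR:
        return "raw_10s"   # at most ~2,160 points per sensor
    if range_seconds <= 7 * DAY:
        return "avg_1m"    # ~10,080 points for a full week
    return "avg_1h"        # keeps a 30-day chart under ~720 points
```

Keeping per-sensor point counts in the low thousands means the browser can render charts without server-side downsampling on the read path.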