Design a Smart City Sensor System

Problem Statement

Design a smart city sensor system that can collect data from millions of IoT sensors deployed across a metropolitan area and compute hourly, daily, and weekly aggregate metrics. Sensors report measurements such as air quality, traffic flow, noise levels, and temperature every 10-30 seconds. The system must ingest this high-volume telemetry, store it efficiently for historical analysis, provide real-time dashboards for city operators, and trigger alerts when readings exceed safety thresholds.

At Amazon, interviewers ask this to evaluate your ability to design high-throughput ingestion pipelines, time-series storage with tiered retention, stream processing for real-time alerting, and efficient aggregation across multiple time windows. The challenge is handling millions of concurrent data streams while maintaining sub-minute freshness for dashboards and second-level responsiveness for safety alerts.

Key Requirements

Functional

Sensor data ingestion -- accept telemetry from millions of heterogeneous sensors via lightweight protocols (MQTT, HTTP) with at-least-once delivery
Real-time dashboards -- city operators view live sensor readings on geographic maps with drill-down into specific areas and sensor types
Time-windowed aggregations -- compute and serve hourly, daily, and weekly aggregate statistics (min, max, average, percentiles) per sensor, area, and metric type
Threshold-based alerts -- notify operators within seconds when sensor readings exceed configurable safety thresholds (e.g., air quality index above hazardous levels)

Non-Functional

Scalability -- handle 10 million sensors reporting every 15 seconds, producing about 40 million events per minute (roughly 667,000 per second) at peak
Reliability -- no data loss during ingestion spikes or component failures; support at-least-once delivery with deduplication
Latency -- dashboard data fresh within 30 seconds; alerts triggered within 5 seconds of threshold breach
Consistency -- eventual consistency for dashboards and aggregations; alerts tolerate brief false positives but must not miss genuine threshold breaches
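
A quick back-of-envelope check of the scalability numbers above can anchor the rest of the design. The sketch below is only an estimate: the 100-byte event size is an assumption made for illustration, not a figure from the problem statement.

```python
# Back-of-envelope load estimate for the stated scale.
SENSORS = 10_000_000
REPORT_INTERVAL_S = 15
BYTES_PER_EVENT = 100  # ASSUMPTION: illustrative payload size, not from the problem

events_per_second = SENSORS / REPORT_INTERVAL_S
events_per_minute = SENSORS * 60 // REPORT_INTERVAL_S
events_per_day = SENSORS * 86_400 // REPORT_INTERVAL_S
raw_tb_per_day = events_per_day * BYTES_PER_EVENT / 1e12

print(f"{events_per_second:,.0f} events/s")
print(f"{events_per_minute:,} events/min")
print(f"{events_per_day:,} events/day")
print(f"{raw_tb_per_day:.2f} TB/day of raw payload (before replication or compression)")
```

Under that payload assumption, raw data alone is several terabytes per day, which is why tiered retention and pre-aggregation come up so quickly in the discussion.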

What Interviewers Focus On

Based on real interview experiences, these are the areas interviewers probe most deeply:

1. High-Throughput Ingestion Pipeline

Millions of sensors reporting simultaneously create massive write pressure. Interviewers want to see how you absorb this traffic without dropping data or overwhelming downstream systems.

Hints to consider:

Use a message broker (Kafka) as a durable buffer between sensor gateways and processing layers, partitioned by sensor_id or geographic region
Deploy MQTT brokers at the edge for lightweight sensor communication, bridging to Kafka for durability and processing
Implement client-side batching on sensor gateways to reduce message overhead and network round trips
Handle duplicate messages with idempotency keys (sensor_id + timestamp) to support at-least-once delivery safely
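
The partitioning and idempotency hints above can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: `partition_for` and `Deduplicator` are hypothetical names, the CRC32 hash is a stand-in for whatever partitioner the broker actually uses, and a real deployment would likely keep seen keys in a shared store with a TTL rather than in process memory.

```python
import zlib
from collections import OrderedDict


def partition_for(sensor_id: str, num_partitions: int) -> int:
    """Map a sensor to a stable partition so its readings stay ordered.

    ASSUMPTION: CRC32 is used only because it is deterministic across
    processes; it is not the hash any particular broker uses.
    """
    return zlib.crc32(sensor_id.encode("utf-8")) % num_partitions


class Deduplicator:
    """Bounded record of recently seen (sensor_id, timestamp) keys."""

    def __init__(self, max_keys: int = 1_000_000) -> None:
        self._seen: OrderedDict = OrderedDict()
        self._max_keys = max_keys

    def is_new(self, sensor_id: str, timestamp_ms: int) -> bool:
        """Return True the first time a key is seen, False on a redelivery."""
        key = (sensor_id, timestamp_ms)
        if key in self._seen:
            return False
        self._seen[key] = None
        if len(self._seen) > self._max_keys:
            self._seen.popitem(last=False)  # evict the oldest key
        return True


dedup = Deduplicator(max_keys=2)
first = dedup.is_new("sensor-42", 1_700_000_000_000)      # first delivery
redelivered = dedup.is_new("sensor-42", 1_700_000_000_000)  # broker retry
print(first, redelivered)
```

The bounded size means a duplicate that arrives after its key has been evicted would be accepted again, so the window has to be sized against the longest expected redelivery delay.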

2. Time-Series Storage and Aggregation

With tens of billions of data points per day (about 57.6 billion at 40 million events per minute), storage design determines both cost and query performance. Interviewers probe your approach to tiered storage and pre-aggregation.

Hints to consider:

Store raw sensor data in a time-series database (TimescaleDB, InfluxDB) partitioned by time and sensor type
Pre-compute hourly, daily, and weekly aggregations using stream processing, writing results to a separate materialized table
Implement tiered retention: full-resolution data for 30 days, hourly rollups for 1 year, daily rollups indefinitely
Use columnar storage or Parquet files on S3 for long-term archival with query access via Athena or Spark
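
To make the pre-aggregation hint concrete, here is a small sketch of tumbling-window rollups. It is a plain-Python illustration with hypothetical names (`Rollup`, `rollup`, `coarsen`), not the API of any time-series database or stream processor. It assumes readings arrive as `(sensor_id, epoch_seconds, value)` tuples and shows that daily rollups can be derived from hourly ones, because count, sum, min, and max merge cleanly.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Rollup:
    """Mergeable summary of the readings that fell into one time bucket."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "Rollup") -> None:
        self.count += other.count
        self.total += other.total
        self.minimum = min(self.minimum, other.minimum)
        self.maximum = max(self.maximum, other.maximum)

    @property
    def average(self) -> float:
        return self.total / self.count


def bucket_start(ts_s: int, width_s: int) -> int:
    """Align a timestamp to the start of its tumbling window."""
    return ts_s - ts_s % width_s


def rollup(readings, width_s: int) -> dict:
    """Aggregate raw (sensor_id, ts_s, value) readings into fixed buckets."""
    buckets = defaultdict(Rollup)
    for sensor_id, ts_s, value in readings:
        buckets[(sensor_id, bucket_start(ts_s, width_s))].add(value)
    return dict(buckets)


def coarsen(rollups: dict, width_s: int) -> dict:
    """Build wider buckets (e.g. daily) by merging narrower ones (e.g. hourly)."""
    out = defaultdict(Rollup)
    for (sensor_id, start), summary in rollups.items():
        out[(sensor_id, bucket_start(start, width_s))].merge(summary)
    return dict(out)


readings = [("s1", 3600, 10.0), ("s1", 3700, 20.0), ("s1", 7200, 30.0)]
hourly = rollup(readings, 3600)
daily = coarsen(hourly, 86_400)
print(hourly[("s1", 3600)].average, daily[("s1", 0)].average)
```

Percentiles are the exception: they cannot be recovered exactly from per-bucket percentiles, so a rollup that must serve them would need to carry a mergeable sketch (for example a histogram or t-digest) alongside these fields.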

3. Real-Time Alerting Pipeline

Safety alerts must fire within seconds of a threshold breach. Interviewers look for a dedicated low-latency path separate from the heavier aggregation pipeline.

Hints to consider:

Branch the ingestion stream: one path feeds into aggregation workers, another feeds into a lightweight alerting processor
Use stateful stream processing (Flink) to maintain per-sensor sliding windows and detect threshold crossings
Apply hysteresis or debouncing (require N consecutive readings above threshold) to suppress false alarms from sensor noise
Fan out alerts through a notification service with multiple channels (dashboard push, SMS, email) and rate limiting per operator
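
The debouncing and hysteresis hint can be illustrated with a small per-sensor state machine. This is a sketch under stated assumptions, not Flink code: `DebouncedAlert` is a hypothetical class, and the thresholds (raise above 300, clear below 250, 3 consecutive readings) are example values chosen for the demonstration.

```python
from typing import Optional


class DebouncedAlert:
    """Per-sensor alert state with debouncing and hysteresis.

    Raises only after `consecutive` readings above `threshold`, and clears
    only once a reading drops below `clear_below`, so a value hovering
    around the threshold does not flap.
    """

    def __init__(self, threshold: float, clear_below: float, consecutive: int) -> None:
        self.threshold = threshold
        self.clear_below = clear_below
        self.consecutive = consecutive
        self.streak = 0
        self.active = False

    def observe(self, value: float) -> Optional[str]:
        """Return 'raise', 'clear', or None for a single reading."""
        if not self.active:
            if value > self.threshold:
                self.streak += 1
                if self.streak >= self.consecutive:
                    self.active = True
                    self.streak = 0
                    return "raise"
            else:
                self.streak = 0  # a normal reading resets the run
        elif value < self.clear_below:
            self.active = False
            return "clear"
        return None


# Example values are illustrative (e.g. an air quality index).
alert = DebouncedAlert(threshold=300, clear_below=250, consecutive=3)
events = [alert.observe(v) for v in [310, 290, 310, 320, 330, 280, 240]]
print(events)  # [None, None, None, None, 'raise', None, 'clear']
```

One trade-off worth raising in the interview: with a 15-second reporting interval, requiring 3 consecutive readings delays the alert by about 30 seconds after the first breach, so N has to be weighed against the 5-second alerting target (for instance, N = 1 for safety-critical metrics).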

4. Geographic Visualization and Query Performance

Dashboards showing sensor data on maps with drill-down require efficient geo-spatial queries. Interviewers want to see how you serve rich visualizations without overloading the database.

Hints to consider:

Maintain a current-state cache (Redis) with the latest reading per sensor for live map rendering
Use geohashing or spatial partitioning to group sensors into grid cells for area-level aggregation queries
Pre-compute area-level aggregates at multiple zoom levels to serve fast map tile rendering
Push dashboard updates via WebSocket to connected operators rather than polling
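
The grid-cell idea behind the spatial hints can be sketched as follows. Note the substitution: this uses a plain fixed-size latitude/longitude grid as a simpler stand-in for geohashing, and `cell_for` and `aggregate_by_cell` are hypothetical helper names. The input is assumed to be the latest reading per sensor as `(lat, lon, value)` tuples, such as might be read from the current-state cache.

```python
import math
from collections import defaultdict


def cell_for(lat: float, lon: float, cell_deg: float) -> tuple:
    """Quantize a coordinate into a (row, col) cell of `cell_deg` degrees."""
    return (
        math.floor((lat + 90.0) / cell_deg),
        math.floor((lon + 180.0) / cell_deg),
    )


def aggregate_by_cell(latest, cell_deg: float) -> dict:
    """Average the latest readings of all sensors that share a grid cell."""
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [running total, count]
    for lat, lon, value in latest:
        acc = sums[cell_for(lat, lon, cell_deg)]
        acc[0] += value
        acc[1] += 1
    return {cell: total / n for cell, (total, n) in sums.items()}


# Illustrative coordinates and values only.
latest = [
    (47.6062, -122.3321, 40.0),
    (47.6070, -122.3330, 60.0),
    (47.7200, -122.2100, 80.0),
]
fine = aggregate_by_cell(latest, 0.05)   # zoomed in: small cells
coarse = aggregate_by_cell(latest, 1.0)  # zoomed out: large cells
print(len(fine), len(coarse))
```

Running the same aggregation at several cell sizes gives one pre-computed layer per zoom level, so the map can fetch a handful of cell averages instead of scanning individual sensors.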
