Design a Smart City Sensor System
Problem Statement
Design a smart city sensor system that can collect data from millions of IoT sensors deployed across a metropolitan area and compute hourly, daily, and weekly aggregate metrics. Sensors report measurements such as air quality, traffic flow, noise levels, and temperature every 10-30 seconds. The system must ingest this high-volume telemetry, store it efficiently for historical analysis, provide real-time dashboards for city operators, and trigger alerts when readings exceed safety thresholds.
At Amazon, interviewers ask this to evaluate your ability to design high-throughput ingestion pipelines, time-series storage with tiered retention, stream processing for real-time alerting, and efficient aggregation across multiple time windows. The challenge is handling millions of concurrent data streams while maintaining sub-minute freshness for dashboards and second-level responsiveness for safety alerts.
Key Requirements
Functional
- Sensor data ingestion -- accept telemetry from millions of heterogeneous sensors via lightweight protocols (MQTT, HTTP) with at-least-once delivery (e.g., MQTT QoS 1) so no reading is lost in transit
- Real-time dashboards -- city operators view live sensor readings on geographic maps with drill-down into specific areas and sensor types
- Time-windowed aggregations -- compute and serve hourly, daily, and weekly aggregate statistics (min, max, average, percentiles) per sensor, area, and metric type
- Threshold-based alerts -- notify operators within seconds when sensor readings exceed configurable safety thresholds (e.g., air quality index above hazardous levels)
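A minimal sketch of a normalized reading record that gateways might emit after decoding MQTT or HTTP payloads. The field names, the JSON layout (`ts`, `lat`, `lon`), and the metric list are assumptions for illustration, not a defined wire format.

```python
import json
from dataclasses import dataclass

# Hypothetical metric names; a real deployment would load these from a sensor registry.
KNOWN_METRICS = {"aqi", "traffic_flow", "noise_db", "temperature_c"}


@dataclass(frozen=True)
class SensorReading:
    sensor_id: str
    metric: str
    value: float
    timestamp: int  # epoch seconds, assigned by the sensor
    lat: float
    lon: float


def parse_reading(payload: bytes) -> SensorReading:
    """Decode and validate one JSON telemetry message (MQTT or HTTP body)."""
    doc = json.loads(payload)
    reading = SensorReading(
        sensor_id=str(doc["sensor_id"]),
        metric=str(doc["metric"]),
        value=float(doc["value"]),
        timestamp=int(doc["ts"]),
        lat=float(doc["lat"]),
        lon=float(doc["lon"]),
    )
    # Reject readings the downstream aggregation and alerting paths cannot interpret.
    if reading.metric not in KNOWN_METRICS:
        raise ValueError(f"unknown metric: {reading.metric}")
    if not (-90 <= reading.lat <= 90 and -180 <= reading.lon <= 180):
        raise ValueError("coordinates out of range")
    return reading


r = parse_reading(
    b'{"sensor_id": "aq-001", "metric": "aqi", "value": 42.5,'
    b' "ts": 1700000000, "lat": 47.6, "lon": -122.3}'
)
```

Validating at the gateway keeps malformed telemetry out of Kafka, so every downstream consumer can assume a consistent schema.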
Non-Functional
- Scalability -- handle 10 million sensors reporting every 15 seconds: 40 million events per minute (about 667,000 per second) sustained, with headroom for ingestion spikes
- Reliability -- no data loss during ingestion spikes or component failures; support at-least-once delivery with deduplication
- Latency -- dashboard data fresh within 30 seconds; alerts triggered within 5 seconds of threshold breach
- Consistency -- eventual consistency for dashboards and aggregations; alerts tolerate brief false positives but must not miss genuine threshold breaches
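You can check the scalability numbers with a quick back-of-envelope calculation. The sketch below assumes a hypothetical 200-byte serialized reading, which the requirements do not state, so it can turn event rates into raw daily volume before compression.

```python
SECONDS_PER_DAY = 86_400


def capacity(sensors: int, interval_s: int, bytes_per_reading: int) -> dict:
    """Back-of-envelope ingest rate and raw storage volume (no compression)."""
    events_per_sec = sensors / interval_s
    # Integer arithmetic keeps the per-minute and per-day counts exact.
    events_per_min = sensors * 60 // interval_s
    events_per_day = sensors * SECONDS_PER_DAY // interval_s
    raw_tb_per_day = events_per_day * bytes_per_reading / 1e12
    return {
        "events_per_sec": events_per_sec,
        "events_per_min": events_per_min,
        "events_per_day": events_per_day,
        "raw_tb_per_day": raw_tb_per_day,
        "raw_tb_30_days": raw_tb_per_day * 30,
    }


# 10M sensors every 15 s; 200 bytes per reading is an assumed payload size.
est = capacity(sensors=10_000_000, interval_s=15, bytes_per_reading=200)
for name, value in est.items():
    print(f"{name:>16}: {value:,.2f}")
```

Under these assumptions, ingestion runs at about 667,000 events per second, which is 57.6 billion readings per day. That comes to roughly 11.5 TB of raw data per day, so the 30-day full-resolution tier holds on the order of 346 TB before compression.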
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. High-Throughput Ingestion Pipeline
Millions of sensors reporting simultaneously create massive write pressure. Interviewers want to see how you absorb this traffic without dropping data or overwhelming downstream systems.
Hints to consider:
- Use a message broker (Kafka) as a durable buffer between sensor gateways and processing layers, partitioned by sensor_id or geographic region
- Deploy MQTT brokers at the edge for lightweight sensor communication, bridging to Kafka for durability and processing
- Implement client-side batching on sensor gateways to reduce message overhead and network round trips
- Handle duplicate messages with idempotency keys (sensor_id + timestamp) to support at-least-once delivery safely
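A minimal sketch of two of these hints. The first is a stable sensor_id-to-partition mapping, which keeps each sensor's readings ordered within a single partition. The second is a consumer-side deduplicator keyed on (sensor_id, timestamp). The partition count and the bounded in-memory key window are illustrative assumptions. CRC32 stands in for a real partitioner; Kafka's default partitioner hashes the record key with murmur2.

```python
import zlib
from collections import OrderedDict
from typing import Tuple

NUM_PARTITIONS = 256  # hypothetical partition count for the telemetry topic


def partition_for(sensor_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable sensor_id -> partition mapping so one sensor's readings stay ordered.

    CRC32 is a stand-in; with Kafka a producer would simply use sensor_id as the
    record key and let the partitioner hash it.
    """
    return zlib.crc32(sensor_id.encode("utf-8")) % num_partitions


class Deduplicator:
    """Drops readings whose (sensor_id, timestamp) idempotency key was seen recently.

    Remembers at most max_keys keys and evicts the oldest first.
    """

    def __init__(self, max_keys: int = 1_000_000):
        self.max_keys = max_keys
        self._seen: "OrderedDict[Tuple[str, int], None]" = OrderedDict()

    def accept(self, sensor_id: str, timestamp: int) -> bool:
        key = (sensor_id, timestamp)
        if key in self._seen:
            return False  # redelivered duplicate under at-least-once semantics
        self._seen[key] = None
        if len(self._seen) > self.max_keys:
            self._seen.popitem(last=False)  # evict the oldest remembered key
        return True


dedup = Deduplicator(max_keys=100_000)
incoming = [("aq-001", 1700000000), ("aq-001", 1700000000), ("aq-001", 1700000015)]
fresh = [r for r in incoming if dedup.accept(*r)]
```

Because the key window is bounded, a duplicate that arrives after its key has been evicted will slip through. You can close that gap by sizing the window to cover the redelivery horizon, or by making the storage write an upsert keyed on (sensor_id, timestamp).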
2. Time-Series Storage and Aggregation
With tens of billions of data points per day (10 million sensors reporting every 15 seconds yields about 57.6 billion), storage design determines both cost and query performance. Interviewers probe your approach to tiered storage and pre-aggregation.
Hints to consider:
- Store raw sensor data in a time-series database (TimescaleDB, InfluxDB) partitioned by time and sensor type
- Pre-compute hourly, daily, and weekly aggregations using stream processing, writing results to a separate materialized table
- Implement tiered retention: full-resolution data for 30 days, hourly rollups for 1 year, daily rollups indefinitely
- Use a columnar format such as Parquet on S3 for long-term archival, queried via Athena or Spark
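The aggregation hints can be sketched as tumbling windows keyed by (sensor_id, metric, bucket start). Bucket boundaries here are in UTC and weeks start on Monday; both are assumptions. The `finest_resolution` helper (a hypothetical name) encodes the 30-day, 1-year, and indefinite retention tiers listed above. This is an in-memory illustration, not a stream-processor job.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

HOUR, DAY, WEEK = 3_600, 86_400, 604_800
MONDAY_OFFSET = 4 * DAY  # 1970-01-05 00:00 UTC, the first Monday after the epoch


def bucket_start(ts: int, window: str) -> int:
    """Start (epoch seconds, UTC) of the tumbling window containing ts."""
    if window == "hour":
        return ts - ts % HOUR
    if window == "day":
        return ts - ts % DAY
    if window == "week":
        return ts - (ts - MONDAY_OFFSET) % WEEK
    raise ValueError(f"unknown window: {window}")


@dataclass
class Agg:
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf
    values: List[float] = field(default_factory=list)  # kept only for exact percentiles

    def add(self, v: float) -> None:
        self.count += 1
        self.total += v
        self.minimum = min(self.minimum, v)
        self.maximum = max(self.maximum, v)
        self.values.append(v)

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile for 0 < p <= 100."""
        ordered = sorted(self.values)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    def summary(self) -> Dict[str, float]:
        return {"min": self.minimum, "max": self.maximum,
                "avg": self.total / self.count, "p95": self.percentile(95)}


Key = Tuple[str, str, int]  # (sensor_id, metric, bucket_start)


def aggregate(readings: Iterable[Tuple[str, str, int, float]], window: str) -> Dict[Key, Agg]:
    out: Dict[Key, Agg] = defaultdict(Agg)
    for sensor_id, metric, ts, value in readings:
        out[(sensor_id, metric, bucket_start(ts, window))].add(value)
    return out


def finest_resolution(age_days: float) -> str:
    """Tiered retention: raw for 30 days, hourly rollups for 1 year, daily forever."""
    if age_days <= 30:
        return "raw"
    if age_days <= 365:
        return "hourly"
    return "daily"
```

Count, sum, min, and max roll up cleanly from hourly to daily to weekly buckets, but exact percentiles do not. This sketch keeps raw values per bucket to compute them, which would not scale to weekly windows across millions of sensors. A mergeable quantile sketch such as t-digest is the usual substitute.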
3. Real-Time Alerting Pipeline
Safety alerts must fire within seconds of a threshold breach. Interviewers look for a dedicated low-latency path separate from batch processing.
Hints to consider:
- Branch the ingestion stream: one path feeds aggregation workers, another feeds a lightweight alerting processor (for example, as separate Kafka consumer groups reading the same topic)
- Use stateful stream processing (Flink) to maintain per-sensor sliding windows and detect threshold crossings
- Apply hysteresis or debouncing (require N consecutive readings above threshold) to suppress false alarms from sensor noise
- Fan out alerts through a notification service with multiple channels (dashboard push, SMS, email) and rate limiting per operator
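A sketch of the debounce-with-hysteresis hint plus per-operator rate limiting via a token bucket. The thresholds, the required streak length, the bucket parameters, and the `may_notify` helper are hypothetical. In a Flink deployment, the per-sensor state would live in keyed state rather than in a Python dict.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class AlertRule:
    trigger_above: float  # fire once readings exceed this (hypothetical value)
    clear_below: float    # clear only after readings drop below this lower level
    consecutive: int = 3  # N consecutive readings needed to change state


class DebouncedAlerter:
    """Per-sensor alert state machine with debouncing and hysteresis."""

    def __init__(self, rule: AlertRule):
        self.rule = rule
        # sensor_id -> (currently_alerting, length of current qualifying streak)
        self.state: Dict[str, Tuple[bool, int]] = {}

    def observe(self, sensor_id: str, value: float) -> Optional[str]:
        alerting, streak = self.state.get(sensor_id, (False, 0))
        if not alerting:
            # Count consecutive readings above the trigger level; any dip resets.
            streak = streak + 1 if value > self.rule.trigger_above else 0
            if streak >= self.rule.consecutive:
                self.state[sensor_id] = (True, 0)
                return "FIRE"
        else:
            # Require a sustained drop below the lower clear level.
            streak = streak + 1 if value < self.rule.clear_below else 0
            if streak >= self.rule.consecutive:
                self.state[sensor_id] = (False, 0)
                return "CLEAR"
        self.state[sensor_id] = (alerting, streak)
        return None


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_s` tokens per second."""

    def __init__(self, capacity: int, refill_per_s: float):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.tokens = float(capacity)
        self.last: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.last is not None:
            elapsed = now - self.last
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets: Dict[str, TokenBucket] = {}


def may_notify(operator_id: str, now: float) -> bool:
    """One bucket per operator: at most 5 alerts in a burst, then about 1 per minute."""
    bucket = buckets.setdefault(operator_id, TokenBucket(capacity=5, refill_per_s=1 / 60))
    return bucket.allow(now)
```

Each extra required reading delays the alert by one reporting interval. With 15-second reporting, N = 3 adds roughly 30 seconds after the first breaching reading. That delay has to be weighed against the 5-second alert target and the requirement not to miss genuine breaches, for example by using a smaller N for the most dangerous metrics.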
4. Geographic Visualization and Query Performance
Dashboards showing sensor data on maps with drill-down require efficient geospatial queries. Interviewers want to see how you serve rich visualizations without overloading the database.
Hints to consider:
- Maintain a current-state cache (Redis) with the latest reading per sensor for live map rendering
- Use geohashing or spatial partitioning to group sensors into grid cells for area-level aggregation queries
- Pre-compute area-level aggregates at multiple zoom levels to serve fast map tile rendering
- Push dashboard updates via WebSocket to connected operators rather than polling
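A sketch of the geohash and current-state hints. It pairs a standard geohash encoder with a hypothetical in-memory stand-in for the Redis current-state cache. The cache keeps the latest reading per sensor and maintains a running sum and count per geohash cell at several prefix lengths (zoom levels). The zoom precisions and the `CurrentState` class are assumptions; in Redis the same data might be held in hashes keyed by sensor and by cell.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence, Tuple

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (omits a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int) -> str:
    """Standard geohash: interleave longitude/latitude bisection bits, 5 bits per char."""
    lat_rng = [-90.0, 90.0]
    lon_rng = [-180.0, 180.0]
    chars = []
    bits = nbits = 0
    use_lon = True  # geohash starts with a longitude bit
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits <<= 1
            rng[1] = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:
            chars.append(_BASE32[bits])
            bits = nbits = 0
    return "".join(chars)


class CurrentState:
    """Latest reading per sensor plus per-cell (sum, count) at each zoom precision."""

    def __init__(self, zoom_precisions: Sequence[int] = (4, 5, 6)):
        self.zooms = tuple(zoom_precisions)
        self.latest: Dict[str, Tuple[int, float, str]] = {}  # sensor_id -> (ts, value, geohash)
        self.cells: Dict[str, Tuple[float, int]] = defaultdict(lambda: (0.0, 0))

    def update(self, sensor_id: str, ts: int, value: float, lat: float, lon: float) -> bool:
        prev = self.latest.get(sensor_id)
        if prev is not None and prev[0] >= ts:
            return False  # out-of-order or duplicate reading; keep the newer one
        gh = geohash_encode(lat, lon, max(self.zooms))
        if prev is not None:
            _, old_value, old_gh = prev
            for p in self.zooms:  # remove the sensor's previous contribution
                s, c = self.cells[old_gh[:p]]
                self.cells[old_gh[:p]] = (s - old_value, c - 1)
        for p in self.zooms:
            s, c = self.cells[gh[:p]]
            self.cells[gh[:p]] = (s + value, c + 1)
        self.latest[sensor_id] = (ts, value, gh)
        return True

    def cell_average(self, cell: str) -> Optional[float]:
        s, c = self.cells.get(cell, (0.0, 0))
        return s / c if c else None
```

Each update adjusts only the cells that contain the sensor's old and new positions. As a result, a map tile at any configured zoom level can be served from a single lookup rather than a scan over all sensors.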