Design a system that monitors GPS signals from a large fleet of vehicles and notifies customers the moment a vehicle enters or exits a predefined geographic boundary (geofence). Vehicles report their locations via LTE gateways, and customers define geofences as polygons or circles around sites such as warehouses, yards, or customer addresses. The platform must process continuous GPS streams in near real-time, evaluate each location update against potentially thousands of geofences per customer, and deliver reliable alerts through multiple channels including push notifications, SMS, email, and webhooks.
Interviewers use this problem to test your ability to build streaming data pipelines, perform geospatial matching at scale, and deliver reliable notifications under noisy, out-of-order data. Strong answers balance product simplicity with operational reliability, covering idempotent event processing, GPS jitter suppression, edge buffering during connectivity gaps, and clear tradeoffs around accuracy, cost, and latency.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Vehicles produce a continuous firehose of GPS updates that must be ingested, ordered, and processed with per-vehicle context. Interviewers want to see how you absorb high write throughput and maintain stateful processing.
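To keep per-vehicle updates ordered, the ingestion layer can key each Kafka message on vehicle_id so every update for one vehicle lands on the same partition. A minimal sketch of the stable key-to-partition mapping (partition count and hash choice are assumptions for illustration):

```python
import hashlib

NUM_PARTITIONS = 64  # assumed topic size; tune to fleet scale

def partition_for(vehicle_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash of vehicle_id -> partition number. Every update for a
    given vehicle maps to the same partition, preserving its ordering."""
    digest = hashlib.md5(vehicle_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Because the mapping is deterministic, a stateful consumer can hold all per-vehicle state (last known position, per-geofence status) locally for the partitions it owns.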
Each incoming GPS point must be tested against potentially thousands of geofences. A brute-force point-in-polygon check against every geofence in the database for each update will not scale.
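The standard fix is a two-phase lookup: a coarse spatial index narrows the candidates, then a precise geometric test runs only on those few. A self-contained sketch, using a fixed lat/lon grid as a simplified stand-in for a geohash index and a ray-casting point-in-polygon test (cell size and data layout are illustrative assumptions):

```python
from collections import defaultdict

CELL = 0.01  # ~1 km grid cell near the equator; stand-in for a geohash prefix

def cell_of(lat: float, lon: float) -> tuple:
    return (int(lat // CELL), int(lon // CELL))

def point_in_polygon(lat: float, lon: float, ring: list) -> bool:
    """Ray-casting test; ring is a list of (lat, lon) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        y1, x1 = ring[i]
        y2, x2 = ring[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

class SpatialIndex:
    def __init__(self):
        self.cells = defaultdict(list)  # cell -> [(fence_id, ring)]

    def add(self, fence_id: str, ring: list) -> None:
        # Register the fence in every grid cell its bounding box touches.
        lats = [p[0] for p in ring]
        lons = [p[1] for p in ring]
        for ci in range(int(min(lats) // CELL), int(max(lats) // CELL) + 1):
            for cj in range(int(min(lons) // CELL), int(max(lons) // CELL) + 1):
                self.cells[(ci, cj)].append((fence_id, ring))

    def matches(self, lat: float, lon: float) -> list:
        # Coarse cell lookup, then the precise test on candidates only.
        return [fid for fid, ring in self.cells[cell_of(lat, lon)]
                if point_in_polygon(lat, lon, ring)]
```

A production system would more likely use an R-tree or S2/geohash library, but the shape of the work is the same: cheap candidate pruning first, exact geometry second.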
GPS accuracy can fluctuate by tens of meters, causing a vehicle parked near a geofence boundary to appear to oscillate in and out. Without suppression, the system generates a storm of useless alerts.
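One way to suppress boundary jitter, complementary to counting consecutive samples, is a distance buffer band: for a circular fence, only transition to INSIDE when the fix is clearly inside the boundary, and only to OUTSIDE when clearly beyond it. A sketch under the assumption of a 30 m buffer (roughly the worst-case GPS error described above):

```python
import math

EARTH_R = 6_371_000.0  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_R * math.asin(math.sqrt(a))

def next_state(state, dist_m, radius_m, buffer_m=30.0):
    """Only flip state when a fix is clearly past the boundary:
    ENTER requires dist < radius - buffer, EXIT requires dist > radius + buffer.
    Fixes inside the +/- buffer band keep the previous state."""
    if state == "OUTSIDE" and dist_m < radius_m - buffer_m:
        return "INSIDE"
    if state == "INSIDE" and dist_m > radius_m + buffer_m:
        return "OUTSIDE"
    return state
```

A vehicle parked exactly on the boundary now oscillates within the dead band and never flips state, so no alert storm is produced.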
Notifications must reach users without duplicates, even when retries occur due to provider failures or consumer restarts. Interviewers expect deduplication at the notification layer.
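A common pattern is first-writer-wins deduplication on an idempotency key such as (event_id, channel, recipient), typically implemented with Redis `SET key 1 NX EX ttl`. A single-process sketch with an in-memory dict standing in for Redis (key shape and TTL are assumptions):

```python
import time

class Deduper:
    """First-writer-wins dedup with a TTL; a stand-in for Redis
    SET key 1 NX EX ttl within a single process."""

    def __init__(self, ttl_s: int = 86_400):
        self.ttl_s = ttl_s
        self.seen = {}  # idempotency key -> expiry timestamp

    def should_send(self, event_id, channel, recipient, now=None) -> bool:
        now = time.time() if now is None else now
        key = f"{event_id}:{channel}:{recipient}"
        expiry = self.seen.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate within the TTL window: suppress
        self.seen[key] = now + self.ttl_s
        return True
```

The TTL bounds memory while still covering realistic retry windows; a retried delivery after a provider failure or consumer restart hits the same key and is dropped.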
Confirm the fleet size, location reporting frequency, and number of geofences per customer. Ask whether geofences are mostly static or frequently updated, and whether customers need real-time updates when geofence definitions change. Clarify latency targets for alert delivery and whether the system should support complex rules like dwell-time thresholds or schedule-based suppression. Determine if the system must operate across multiple regions and whether historical event replay is a requirement for compliance.
Sketch the data flow: vehicle gateways push GPS updates to a load-balanced ingestion API, which publishes them to a Kafka topic partitioned by vehicle_id. A Flink job consumes the stream, loads geofence definitions from a spatial index, evaluates boundary crossings with hysteresis logic, and emits enter/exit events to a downstream Kafka topic. A notification service consumes these events, enriches them with alert rules and user preferences, deduplicates using Redis, and dispatches through channel-specific providers (FCM for push, Twilio for SMS, SES for email, HTTP for webhooks). A separate consumer writes all events to a time-series store for the historical audit trail. Include a geofence management API that writes to a PostgreSQL-backed store and publishes change events so stream processors refresh their spatial indexes.
Walk through the processing of a single GPS update in detail. The Flink job receives a location event keyed by vehicle_id. It loads the vehicle's assigned geofences from its local spatial index and performs a coarse geohash lookup to identify candidate geofences within range. For each candidate, it runs a precise point-in-polygon or point-in-circle test. It then compares the result against the vehicle's persisted state for that geofence. If the vehicle was previously outside and is now inside, the processor increments a consecutive-inside counter. Once the counter exceeds the configured hysteresis threshold, it emits an ENTER event, resets the counter, and updates the state to INSIDE. The inverse logic applies for EXIT events. If the test result matches the current state, the counters reset. All state is checkpointed to Flink's state backend for fault tolerance.
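The per-geofence state machine above can be sketched directly; this is a minimal version of the described logic, with the threshold value and `>=` trigger chosen as illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class FenceState:
    state: str = "OUTSIDE"   # persisted per (vehicle, geofence) pair
    inside_count: int = 0
    outside_count: int = 0

def evaluate(fs: FenceState, point_inside: bool, threshold: int = 3):
    """Consecutive-sample hysteresis: only emit ENTER/EXIT after
    `threshold` consecutive contradicting samples. Returns the event
    to emit ("ENTER", "EXIT") or None."""
    if point_inside and fs.state == "OUTSIDE":
        fs.inside_count += 1
        if fs.inside_count >= threshold:
            fs.state, fs.inside_count = "INSIDE", 0
            return "ENTER"
    elif not point_inside and fs.state == "INSIDE":
        fs.outside_count += 1
        if fs.outside_count >= threshold:
            fs.state, fs.outside_count = "OUTSIDE", 0
            return "EXIT"
    else:
        # Sample agrees with the current state: any pending flap resets.
        fs.inside_count = fs.outside_count = 0
    return None
```

In Flink this state would live in keyed state scoped to (vehicle_id, geofence_id) and be checkpointed with the job, so a restart resumes with counters intact.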
Cover connectivity gaps: vehicle gateways buffer GPS points locally during LTE outages, stamped with device time, and transmit them in order when connectivity resumes. The stream processor handles these late arrivals via event-time watermarks. Discuss notification fairness: implement per-customer rate limits to prevent a single large fleet from monopolizing notification provider capacity. Address geofence updates: publish change events to a Kafka compacted topic that stream processors consume to incrementally update their spatial indexes without a full restart. Mention monitoring: track ingestion lag, geofence evaluation latency per event, alert delivery latency by channel, suppressed-flap counts, and dead-letter queue depth. Briefly discuss scaling: add Flink task slots and Kafka partitions as the fleet grows, and shard the spatial index by region if a single instance's memory becomes a bottleneck.
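The per-customer rate limit mentioned above is commonly a token bucket: a sustained rate with a bounded burst, one bucket per customer. A minimal sketch (rate and capacity values are assumptions; a shared store would back this in a multi-instance deployment):

```python
import time

class TokenBucket:
    """Per-customer limiter: `rate` notifications/second sustained,
    bursts of up to `capacity` notifications."""

    def __init__(self, rate: float, capacity: float, now=None):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Requests rejected by the bucket can be queued or coalesced per customer rather than dropped, so a large fleet's alert burst is smoothed instead of starving smaller customers of provider capacity.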