Design a system that monitors GPS signals from a large fleet of vehicles and notifies customers the moment a vehicle enters or exits a predefined geographic boundary (geofence). Vehicles report their locations via LTE gateways, and customers define geofences as polygons or circles around sites such as warehouses, yards, or customer addresses. The platform must process continuous GPS streams in near real-time, evaluate each location update against potentially thousands of geofences per customer, and deliver reliable alerts through multiple channels including push notifications, SMS, email, and webhooks.
Interviewers use this problem to test your ability to build streaming data pipelines, perform geospatial matching at scale, and deliver reliable notifications under noisy, out-of-order data. Strong answers balance product simplicity with operational reliability, covering idempotent event processing, GPS jitter suppression, edge buffering during connectivity gaps, and clear tradeoffs around accuracy, cost, and latency.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Vehicles produce a continuous firehose of GPS updates that must be ingested, ordered, and processed with per-vehicle context. Interviewers want to see how you absorb high write throughput and maintain stateful processing.
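To keep per-vehicle updates ordered, the ingestion layer can key each Kafka message on vehicle_id so every update for one vehicle lands on the same partition. A minimal sketch of the stable key-to-partition mapping (partition count and hash choice are assumptions for illustration):

```python
import hashlib

NUM_PARTITIONS = 64  # assumed topic size; tune to fleet scale

def partition_for(vehicle_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash of vehicle_id -> partition number. Every update for a
    given vehicle maps to the same partition, preserving its ordering."""
    digest = hashlib.md5(vehicle_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Because the mapping is deterministic, a stateful consumer can hold all per-vehicle state (last known position, per-geofence status) locally for the partitions it owns.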
Each incoming GPS point must be tested against potentially thousands of geofences. A brute-force point-in-polygon check against every geofence in the database for each update will not scale.
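The standard fix is a two-phase lookup: a coarse spatial index narrows the candidates, then a precise geometric test runs only on those few. A self-contained sketch, using a fixed lat/lon grid as a simplified stand-in for a geohash index and a ray-casting point-in-polygon test (cell size and data layout are illustrative assumptions):

```python
from collections import defaultdict

CELL = 0.01  # ~1 km grid cell near the equator; stand-in for a geohash prefix

def cell_of(lat: float, lon: float) -> tuple:
    return (int(lat // CELL), int(lon // CELL))

def point_in_polygon(lat: float, lon: float, ring: list) -> bool:
    """Ray-casting test; ring is a list of (lat, lon) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        y1, x1 = ring[i]
        y2, x2 = ring[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

class SpatialIndex:
    def __init__(self):
        self.cells = defaultdict(list)  # cell -> [(fence_id, ring)]

    def add(self, fence_id: str, ring: list) -> None:
        # Register the fence in every grid cell its bounding box touches.
        lats = [p[0] for p in ring]
        lons = [p[1] for p in ring]
        for ci in range(int(min(lats) // CELL), int(max(lats) // CELL) + 1):
            for cj in range(int(min(lons) // CELL), int(max(lons) // CELL) + 1):
                self.cells[(ci, cj)].append((fence_id, ring))

    def matches(self, lat: float, lon: float) -> list:
        # Coarse cell lookup, then the precise test on candidates only.
        return [fid for fid, ring in self.cells[cell_of(lat, lon)]
                if point_in_polygon(lat, lon, ring)]
```

A production system would more likely use an R-tree or S2/geohash library, but the shape of the work is the same: cheap candidate pruning first, exact geometry second.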
GPS accuracy can fluctuate by tens of meters, causing a vehicle parked near a geofence boundary to appear to oscillate in and out. Without suppression, the system generates a storm of useless alerts.
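One way to suppress boundary jitter, complementary to counting consecutive samples, is a distance buffer band: for a circular fence, only transition to INSIDE when the fix is clearly inside the boundary, and only to OUTSIDE when clearly beyond it. A sketch under the assumption of a 30 m buffer (roughly the worst-case GPS error described above):

```python
import math

EARTH_R = 6_371_000.0  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_R * math.asin(math.sqrt(a))

def next_state(state, dist_m, radius_m, buffer_m=30.0):
    """Only flip state when a fix is clearly past the boundary:
    ENTER requires dist < radius - buffer, EXIT requires dist > radius + buffer.
    Fixes inside the +/- buffer band keep the previous state."""
    if state == "OUTSIDE" and dist_m < radius_m - buffer_m:
        return "INSIDE"
    if state == "INSIDE" and dist_m > radius_m + buffer_m:
        return "OUTSIDE"
    return state
```

A vehicle parked exactly on the boundary now oscillates within the dead band and never flips state, so no alert storm is produced.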
Notifications must reach users without duplicates, even when retries occur due to provider failures or consumer restarts. Interviewers expect deduplication at the notification layer.
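A common pattern is first-writer-wins deduplication on an idempotency key such as (event_id, channel, recipient), typically implemented with Redis `SET key 1 NX EX ttl`. A single-process sketch with an in-memory dict standing in for Redis (key shape and TTL are assumptions):

```python
import time

class Deduper:
    """First-writer-wins dedup with a TTL; a stand-in for Redis
    SET key 1 NX EX ttl within a single process."""

    def __init__(self, ttl_s: int = 86_400):
        self.ttl_s = ttl_s
        self.seen = {}  # idempotency key -> expiry timestamp

    def should_send(self, event_id, channel, recipient, now=None) -> bool:
        now = time.time() if now is None else now
        key = f"{event_id}:{channel}:{recipient}"
        expiry = self.seen.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate within the TTL window: suppress
        self.seen[key] = now + self.ttl_s
        return True
```

The TTL bounds memory while still covering realistic retry windows; a retried delivery after a provider failure or consumer restart hits the same key and is dropped.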
Confirm the fleet size, location reporting frequency, and number of geofences per customer. Ask whether geofences are mostly static or frequently updated, and whether customers need real-time updates when geofence definitions change. Clarify latency targets for alert delivery and whether the system should support complex rules like dwell-time thresholds or schedule-based suppression. Determine if the system must operate across multiple regions and whether historical event replay is a requirement for compliance.
Sketch the data flow: vehicle gateways push GPS updates to a load-balanced ingestion API, which publishes them to a Kafka topic partitioned by vehicle_id. A Flink job consumes the stream, loads geofence definitions from a spatial index, evaluates boundary crossings with hysteresis logic, and emits enter/exit events to a downstream Kafka topic. A notification service consumes these events, enriches them with alert rules and user preferences, deduplicates using Redis, and dispatches through channel-specific providers (FCM for push, Twilio for SMS, SES for email, HTTP for webhooks). A separate consumer writes all events to a time-series store for the historical audit trail. Include a geofence management API that writes to a PostgreSQL-backed store and publishes change events so stream processors refresh their spatial indexes.
Walk through the processing of a single GPS update in detail. The Flink job receives a location event keyed by vehicle_id. It loads the vehicle's assigned geofences from its local spatial index and performs a coarse geohash lookup to identify candidate geofences within range. For each candidate, it runs a precise point-in-polygon or point-in-circle test. It then compares the result against the vehicle's persisted state for that geofence. If the vehicle was previously outside and is now inside, the processor increments a consecutive-inside counter. Once the counter exceeds the configured hysteresis threshold, it emits an ENTER event, resets the counter, and updates the state to INSIDE. The inverse logic applies for EXIT events. If the test result matches the current state, the counters reset. All state is checkpointed to Flink's state backend for fault tolerance.
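The per-geofence state machine above can be sketched directly; this is a minimal version of the described logic, with the threshold value and `>=` trigger chosen as illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class FenceState:
    state: str = "OUTSIDE"   # persisted per (vehicle, geofence) pair
    inside_count: int = 0
    outside_count: int = 0

def evaluate(fs: FenceState, point_inside: bool, threshold: int = 3):
    """Consecutive-sample hysteresis: only emit ENTER/EXIT after
    `threshold` consecutive contradicting samples. Returns the event
    to emit ("ENTER", "EXIT") or None."""
    if point_inside and fs.state == "OUTSIDE":
        fs.inside_count += 1
        if fs.inside_count >= threshold:
            fs.state, fs.inside_count = "INSIDE", 0
            return "ENTER"
    elif not point_inside and fs.state == "INSIDE":
        fs.outside_count += 1
        if fs.outside_count >= threshold:
            fs.state, fs.outside_count = "OUTSIDE", 0
            return "EXIT"
    else:
        # Sample agrees with the current state: any pending flap resets.
        fs.inside_count = fs.outside_count = 0
    return None
```

In Flink this state would live in keyed state scoped to (vehicle_id, geofence_id) and be checkpointed with the job, so a restart resumes with counters intact.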
Cover connectivity gaps: vehicle gateways buffer GPS points locally during LTE outages, stamped with device time, and transmit them in order when connectivity resumes. The stream processor handles these late arrivals via event-time watermarks. Discuss notification fairness: implement per-customer rate limits to prevent a single large fleet from monopolizing notification provider capacity. Address geofence updates: publish change events to a Kafka compacted topic that stream processors consume to incrementally update their spatial indexes without a full restart. Mention monitoring: track ingestion lag, geofence evaluation latency per event, alert delivery latency by channel, suppressed-flap counts, and dead-letter queue depth. Briefly discuss scaling: add Flink task slots and Kafka partitions as the fleet grows, and shard the spatial index by region if a single instance's memory becomes a bottleneck.
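The per-customer rate limit mentioned above is commonly a token bucket: a sustained rate with a bounded burst, one bucket per customer. A minimal sketch (rate and capacity values are assumptions; a shared store would back this in a multi-instance deployment):

```python
import time

class TokenBucket:
    """Per-customer limiter: `rate` notifications/second sustained,
    bursts of up to `capacity` notifications."""

    def __init__(self, rate: float, capacity: float, now=None):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Requests rejected by the bucket can be queued or coalesced per customer rather than dropped, so a large fleet's alert burst is smoothed instead of starving smaller customers of provider capacity.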