Design a building management system that tracks employee locations and activities in real time across a corporate campus. The system ingests events from badge readers at entry points, motion sensors in restrooms and common areas, and parking gate systems. It must provide live occupancy counts per zone, enable individual employee location lookups, and report daily parking utilization metrics.
The system handles 50,000 employees across multiple buildings, with peak traffic generating thousands of badge swipes per minute during morning arrival. Facilities dashboards must update within seconds of any zone change. Historical data powers analytics for space planning and energy management. The core technical challenge is maintaining accurate counters under high concurrent updates while serving low-latency reads and handling edge cases like missed badge-outs, sensor failures, and duplicate events.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers want to see how you structure the domain entities and their relationships. The model must support both real-time state queries and historical analytics without forcing expensive joins or scans.
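As a starting point, the domain can be captured in a handful of entities. A minimal sketch, assuming names like BadgeEvent and ZoneState (illustrative, not from the prompt), with a plain dict standing in for the Redis key-value layout:

```python
from dataclasses import dataclass
from enum import Enum

class Direction(Enum):
    IN = "in"
    OUT = "out"

@dataclass(frozen=True)
class BadgeEvent:
    event_id: str      # unique per physical swipe; reused verbatim on network retries
    employee_id: str
    zone_id: str
    direction: Direction
    timestamp_ms: int

@dataclass
class ZoneState:
    zone_id: str
    occupancy: int = 0  # live counter, maintained by the stream processor

# Current location is one key-value pair per employee, so a location
# lookup never requires a join or a scan over the event history.
current_location: dict[str, str] = {}  # employee_id -> zone_id
```

Keeping live state (location, occupancy) separate from the append-only event timeline is what lets real-time queries and historical analytics scale independently.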
Main lobby doors and popular restrooms become hot spots where hundreds of events per minute update the same counter. Naive locking will serialize updates and create bottlenecks.
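One common way to relieve a hot counter is sharding it: spread increments across N keys and sum them on read. A minimal in-memory sketch, with a dict standing in for the Redis keys that INCRBY and MGET would target:

```python
import random

class ShardedCounter:
    """Spread increments for one hot zone across N shards so
    concurrent writers rarely contend on the same key."""

    def __init__(self, zone_id: str, shards: int = 16):
        self.keys = [f"occupancy:{zone_id}:shard:{i}" for i in range(shards)]
        self.store: dict[str, int] = {k: 0 for k in self.keys}  # stand-in for Redis

    def add(self, delta: int) -> None:
        # Each writer picks a shard at random; in Redis this is a single INCRBY.
        key = random.choice(self.keys)
        self.store[key] += delta

    def value(self) -> int:
        # A read sums all shards (MGET + sum in Redis).
        return sum(self.store.values())
```

The trade-off is a slightly more expensive read, which is acceptable here because dashboard reads tolerate a cached value refreshed every second or so.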
Badge readers may retry network requests, producing duplicate events. Out-of-order delivery happens when events route through different network paths. Without proper handling, counters drift and ghost occupants accumulate.
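Idempotent ingestion usually hinges on a rolling set of recently seen event IDs. A sketch of that window, using an in-memory dict where production code would use a Redis SET with NX and a TTL (the class name and the 10-minute default are assumptions):

```python
import time
from typing import Optional

class DedupWindow:
    """Rolling deduplication window keyed by event ID.
    In Redis this would be SET event_id 1 NX EX 600."""

    def __init__(self, ttl_seconds: float = 600.0):
        self.ttl = ttl_seconds
        self.seen: dict[str, float] = {}  # event_id -> first-seen time

    def is_duplicate(self, event_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Evict entries older than the window (Redis does this via key expiry).
        for k in [k for k, t in self.seen.items() if now - t > self.ttl]:
            del self.seen[k]
        if event_id in self.seen:
            return True
        self.seen[event_id] = now
        return False
```

A window only works because reader retries reuse the original event ID; the window length just needs to exceed the maximum plausible retry delay.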
Employees occasionally forget to badge out, and sensors malfunction. Without correction mechanisms, occupancy counts drift upward over time as ghost occupants accumulate.
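A periodic reconciliation sweep can evict ghost occupants by expiring anyone with no event activity for a configurable idle period. A sketch over plain dicts, where the 12-hour threshold is an illustrative assumption:

```python
def reconcile(current_location: dict, last_seen_ms: dict, zone_counts: dict,
              now_ms: int, max_idle_ms: int = 12 * 3600 * 1000) -> list:
    """Evict presumed-departed employees (no badge or sensor event for
    max_idle_ms) and correct the zone counters they were inflating."""
    ghosts = [emp for emp, ts in last_seen_ms.items() if now_ms - ts > max_idle_ms]
    for emp in ghosts:
        zone = current_location.pop(emp, None)
        last_seen_ms.pop(emp, None)
        # Clamp at zero so a correction can never push a counter negative.
        if zone is not None and zone_counts.get(zone, 0) > 0:
            zone_counts[zone] -= 1
    return ghosts
```

Motion sensors help tighten this: a zone showing zero motion for an extended stretch is stronger evidence of ghosts than badge silence alone.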
Confirm how many buildings and zones exist, typical and peak event rates, and how many concurrent users view dashboards. Clarify acceptable latency for live updates versus historical queries. Understand privacy constraints on individual tracking and whether second-by-second precision is needed or minute-level accuracy suffices. Confirm whether the system must handle retrofit buildings with incomplete sensor coverage.
Sketch three layers: ingestion, processing, and serving. Sensors publish events to Kafka partitioned by employee ID for ordering guarantees. Stream processors consume events, deduplicate using recent event IDs cached in Redis, and update materialized views in both a fast cache for current state and a durable store for history. The API layer reads from Redis for live occupancy and from a durable database for analytics queries. A background reconciliation job publishes corrected counts. WebSockets or server-sent events push occupancy changes to connected dashboards.
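The ordering guarantee in the ingestion layer comes from keying the Kafka partition on employee ID, so every event for one person lands in the same partition and is consumed in order. A sketch of that key-to-partition mapping (Kafka's default partitioner actually uses murmur2; MD5 here is just for illustration):

```python
import hashlib

def partition_for(employee_id: str, num_partitions: int = 32) -> int:
    """Deterministically map an employee ID to a partition, so all of
    that employee's events are consumed by a single consumer in order."""
    digest = hashlib.md5(employee_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Events for different employees can be processed in parallel across partitions, which is what keeps the pipeline horizontally scalable.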
Walk through a badge swipe event's lifecycle. The reader generates an event with a timestamp, employee ID, zone ID, direction, and unique event ID, and it lands in Kafka partitioned by employee ID. A stream consumer checks Redis for the event ID in a rolling 10-minute deduplication window; if found, it discards the duplicate. For new events, it atomically updates the employee's current location in Redis, increments the destination zone's counter, and decrements the source zone's counter. The processor then appends the event to Cassandra for the employee's movement timeline. Partitioning by employee ID ensures concurrent events for different people never conflict.
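The consumer steps above can be sketched as one function over in-memory state; in production, the location write and the two counter updates would run in a single Redis MULTI/EXEC transaction or Lua script so readers never observe a half-applied move:

```python
def process_badge_event(event: dict, seen_ids: set,
                        current_location: dict, zone_counts: dict) -> bool:
    """One pass of the stream consumer: dedupe, move the employee,
    and adjust source/destination counters. Returns False for drops."""
    if event["event_id"] in seen_ids:
        return False  # network retry; already applied
    seen_ids.add(event["event_id"])
    prev_zone = current_location.get(event["employee_id"])
    if prev_zone is not None:
        # Decrement the zone the employee is leaving, clamped at zero.
        zone_counts[prev_zone] = max(0, zone_counts.get(prev_zone, 0) - 1)
    current_location[event["employee_id"]] = event["zone_id"]
    zone_counts[event["zone_id"]] = zone_counts.get(event["zone_id"], 0) + 1
    return True
```

Deriving the decrement from the employee's stored previous location, rather than trusting the event's direction field alone, is what keeps counters consistent when a badge-out was missed.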
Cover how to serve various query patterns: live occupancy queries hit Redis for O(1) lookups, employee location lookups read a single Redis key, movement timelines query Cassandra with employee ID and time range, and parking analytics aggregate daily counters stored in a time-series table. Address caching for historical reports and precomputing popular aggregations overnight. Discuss monitoring by tracking event lag in Kafka and comparing expected versus actual zone totals. Explain disaster recovery through Kafka retention enabling full replay to rebuild all materialized views.
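The expected-versus-actual comparison can be a small drift check run by the reconciliation job: recompute zone totals from the durable event log and diff them against the live counters (the function name is illustrative):

```python
def zone_drift(live_counts: dict, recomputed: dict) -> dict:
    """Return per-zone differences between live Redis counters and
    totals recomputed from the event log; any nonzero entry signals
    drift worth alerting on and correcting."""
    zones = set(live_counts) | set(recomputed)
    return {z: live_counts.get(z, 0) - recomputed.get(z, 0)
            for z in zones
            if live_counts.get(z, 0) != recomputed.get(z, 0)}
```

Because Kafka retains the raw events, the recomputed side of this diff is always available, and the same replay path doubles as the disaster-recovery mechanism.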