Design Proximity Service
Problem Statement
Build a system that enables users to discover other people, friends, or devices within their physical vicinity in real time. Users should be able to see who is nearby, receive push notifications when interesting entities enter or leave a specific radius, and optionally interact with them through lightweight actions like pings or connection requests. The service must handle millions of concurrent users, each potentially updating their location every few seconds, while maintaining sub-second query latency for proximity searches.
The core challenge lies in efficiently indexing and querying high-velocity geospatial data, pushing real-time presence updates without overwhelming clients or servers, and handling extreme density hotspots like concerts or university campuses where thousands of users might occupy a small geographic area. Privacy controls, battery efficiency, and graceful degradation under load are critical product constraints that shape every architectural decision.
Key Requirements
Functional
- Proximity search -- users can query for nearby entities within a configurable radius (100m to 5km) with optional filters like mutual connections or shared interests
- Real-time presence updates -- the system pushes immediate notifications when specified people or devices enter or leave a user's defined proximity zone
- Privacy controls -- users can toggle visibility modes (visible to all, friends only, invisible) and choose location precision (exact vs approximate neighborhood-level)
- Interaction primitives -- support lightweight proximity-triggered actions such as sending a quick ping, initiating a chat, or requesting to connect, with appropriate rate limiting
Non-Functional
- Scalability -- handle 100 million daily active users with peak concurrent online users of 20 million, ingesting 200,000 location updates per second globally
- Reliability -- maintain 99.9% uptime for core discovery features with graceful degradation during regional outages or hotspot overload scenarios
- Latency -- proximity queries return results within 500ms at p99, with real-time notifications delivered within 2 seconds of a qualifying location change
- Consistency -- eventual consistency is acceptable for presence data; users may see stale locations up to 30 seconds old during network partitions, but notification delivery must be at-least-once
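A quick back-of-envelope check on the scalability targets can help frame the rest of the design. The sketch below uses only the peak-concurrency and ingest figures stated above; the per-record size is an assumption made for illustration, not a number from the requirements.

```python
# Figures taken from the non-functional requirements above.
PEAK_CONCURRENT_USERS = 20_000_000
UPDATES_PER_SECOND = 200_000

# Assumption for illustration only: user id, lat, lon, timestamp, and overhead.
BYTES_PER_POSITION_RECORD = 100

# Average gap between updates from one online user at peak.
avg_update_interval_s = PEAK_CONCURRENT_USERS / UPDATES_PER_SECOND

# Upper bound on daily writes if the peak rate were sustained all day.
updates_per_day_at_peak = UPDATES_PER_SECOND * 86_400

# Memory needed to hold one latest position per concurrent user.
hot_set_gb = PEAK_CONCURRENT_USERS * BYTES_PER_POSITION_RECORD / 1e9

print(avg_update_interval_s, updates_per_day_at_peak, hot_set_gb)
```

Under these numbers each online user averages one accepted update roughly every 100 seconds, so a client that samples its location every few seconds has to suppress most samples before they reach the backend. The latest-position working set is also small enough, under the assumed record size, to fit in memory on a modest cluster.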
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Geospatial Indexing and Query Strategy
Interviewers want to see how you partition the world into manageable chunks and avoid scanning millions of coordinates on every query. Naive distance calculations won't scale, so you need to demonstrate knowledge of spatial indexing techniques and explain tradeoffs between different approaches.
Hints to consider:
- Explore geohash or S2 cell hierarchies to map latitude/longitude pairs into discrete regions that support efficient prefix-based lookups
- Discuss cell size selection tradeoffs: smaller cells mean more shards and more cross-boundary queries, while larger cells mean more false positives to filter
- Consider hybrid approaches where you index into coarse cells for fast filtering, then compute exact Haversine distance for the candidate set
- Plan for edge cases like queries near cell boundaries requiring searches in multiple adjacent cells
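The coarse-filter-then-exact-distance idea from the hints above can be sketched as follows. This is a minimal in-memory illustration, not a production index: it uses a fixed latitude/longitude grid as a simplified stand-in for geohash or S2 cells, and `GridIndex`, `CELL_DEG`, and the method names are hypothetical. It ignores the poles and the antimeridian.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0
CELL_DEG = 0.01  # hypothetical cell size, about 1.1 km north-south

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def cell_of(lat, lon):
    """Map a coordinate to its (row, col) grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))

class GridIndex:
    def __init__(self):
        self.cells = defaultdict(dict)  # cell -> {user_id: (lat, lon)}
        self.user_cell = {}             # user_id -> current cell

    def upsert(self, user_id, lat, lon):
        """Store the latest position, moving the user between cells if needed."""
        new_cell = cell_of(lat, lon)
        old_cell = self.user_cell.get(user_id)
        if old_cell is not None and old_cell != new_cell:
            del self.cells[old_cell][user_id]
        self.cells[new_cell][user_id] = (lat, lon)
        self.user_cell[user_id] = new_cell

    def nearby(self, lat, lon, radius_m):
        """Return (user_id, distance_m) pairs within radius_m, nearest first."""
        # Bounding box of the search circle; a small-radius approximation.
        dlat = math.degrees(radius_m / EARTH_RADIUS_M)
        dlon = dlat / max(math.cos(math.radians(lat)), 1e-6)
        row_lo, col_lo = cell_of(lat - dlat, lon - dlon)
        row_hi, col_hi = cell_of(lat + dlat, lon + dlon)
        hits = []
        # Coarse filter: visit every cell that overlaps the bounding box,
        # which covers queries that sit near a cell boundary.
        for row in range(row_lo, row_hi + 1):
            for col in range(col_lo, col_hi + 1):
                for uid, (ulat, ulon) in self.cells.get((row, col), {}).items():
                    # Exact filter: Haversine distance on the candidate set.
                    d = haversine_m(lat, lon, ulat, ulon)
                    if d <= radius_m:
                        hits.append((uid, d))
        return sorted(hits, key=lambda h: h[1])
```

Scanning every cell that overlaps the bounding box, rather than only the query point's own cell, is what handles the boundary case. A real geohash or S2 index would apply the same pattern with its own neighbor or covering computation.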
2. High-Volume Write Ingestion and Throttling
Mobile clients can flood the system with location updates, especially during commutes or events. Interviewers expect you to design a buffered, batched ingest pipeline that prevents write amplification and respects battery constraints while keeping presence data fresh enough for real-time use cases.
Hints to consider:
- Implement client-side throttling that sends an update only when the location changes meaningfully (50+ meters) or after a timeout, reducing unnecessary traffic
- Use a message queue like Kafka to decouple high-variance write spikes from downstream processing, allowing consumers to smooth and deduplicate updates
- Apply server-side rate limiting per user to cap update frequency and reject excessive traffic gracefully with backpressure signals
- Store only the most recent position in hot storage with a TTL, archiving historical tracks to cold storage asynchronously if needed for analytics
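The client-side rule from the first hint (send only after meaningful movement or a timeout) might look like the sketch below. The 50-meter threshold comes from the hint; the 60-second timeout, the `UpdateGate` class, and its method name are assumptions chosen for illustration.

```python
import math

MIN_MOVE_M = 50.0      # movement threshold from the hint above
MAX_SILENCE_S = 60.0   # hypothetical heartbeat timeout

def distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation; adequate for short distances."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return 6_371_000.0 * math.hypot(dx, dy)

class UpdateGate:
    """Decides on the device whether a location sample is worth sending."""

    def __init__(self):
        self.last_sent = None  # (lat, lon, timestamp_s) of the last sent update

    def should_send(self, lat, lon, now_s):
        if self.last_sent is not None:
            last_lat, last_lon, last_t = self.last_sent
            moved = distance_m(last_lat, last_lon, lat, lon) >= MIN_MOVE_M
            timed_out = now_s - last_t >= MAX_SILENCE_S
            if not (moved or timed_out):
                return False  # suppress: nothing new for the server
        # First sample, meaningful movement, or heartbeat due.
        self.last_sent = (lat, lon, now_s)
        return True
```

Distance is measured from the last position actually sent, not from the previous sample, so slow drift still triggers an update once it accumulates past the threshold. The timeout doubles as a heartbeat that keeps a stationary user's hot-storage TTL from expiring.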
3. Real-Time Notification Fanout Without Cascades
When a user moves, you must determine who is watching for their presence and notify those subscribers efficiently. Interviewers probe how you avoid notification storms during dense events and prevent flapping when users hover near zone boundaries.
Hints to consider:
- Maintain an inverted index mapping each geohash cell to the list of users subscribed to events in that area, enabling targeted fanout instead of broadcast
- Implement hysteresis or grace windows so entering/exiting a zone requires staying across the boundary for 10-20 seconds before firing a notification
- Use WebSocket or long-polling connections to push notifications, pooling connection servers by geographic region and sharding them by user ID hash within each region for even load distribution
- Introduce per-user and global rate limits on notification delivery to prevent cascading failures when thousands of users converge simultaneously
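The hysteresis hint can be illustrated with a small state machine that fires an enter or exit event only after the raw inside/outside signal has disagreed with the confirmed state for a full grace window. The `ZoneWatcher` name and the 15-second default (picked from the 10-20 second range above) are assumptions for this sketch.

```python
class ZoneWatcher:
    """Debounces zone transitions for one (subscriber, target) pair."""

    def __init__(self, grace_s=15.0):
        self.grace_s = grace_s
        self.confirmed_inside = False  # last state we notified about
        self.pending_since = None      # when the raw state first disagreed

    def observe(self, inside, now_s):
        """Feed one raw observation; return 'enter', 'exit', or None."""
        if inside == self.confirmed_inside:
            # Raw signal agrees with the confirmed state: cancel any pending flip.
            self.pending_since = None
            return None
        if self.pending_since is None:
            # First disagreeing sample: start the grace window.
            self.pending_since = now_s
            return None
        if now_s - self.pending_since >= self.grace_s:
            # Disagreement persisted for the whole window: commit and notify.
            self.confirmed_inside = inside
            self.pending_since = None
            return "enter" if inside else "exit"
        return None
```

A user who hovers on the boundary keeps resetting the window and never triggers a notification, which is the flapping behavior the hint is meant to suppress. The cost is that every legitimate notification is delayed by at least the grace period, which has to fit inside the 2-second delivery target or be treated as a separate product decision.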
4. Handling Hotspots and Skewed Load
Dense urban areas, stadiums, and event venues create extreme skew where a single geohash cell might contain tens of thousands of active users. Interviewers want to see strategies for detecting and mitigating hotspots before they cause outages or unacceptable latency spikes.
Hints to consider:
- Monitor cell-level write and query rates, automatically splitting hot cells into finer subcells and redistributing the load across additional shards
- Use consistent hashing with virtual nodes when partitioning cells across servers, making it easy to rebalance without massive data movement
- Apply per-cell rate limiting or sampling in extreme hotspots, degrading gracefully by showing approximate counts or top-k results instead of exhaustive lists
- Consider read replicas or in-memory caches dedicated to the hottest cells, isolating their load from the rest of the system
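To make the consistent-hashing hint concrete, here is a minimal ring with virtual nodes that maps cell IDs to servers. It is a sketch under stated assumptions: the `HashRing` class, the server and cell naming, and the choice of SHA-256 with 100 virtual nodes per server are all illustrative rather than prescribed by the design.

```python
import bisect
import hashlib

def _hash(key):
    """Stable 64-bit hash of a string key."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) virtual-node entries
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place vnodes points for this server on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        """Drop all of a server's points; its cells fall to the next points."""
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def owner(self, cell_id):
        """Return the server owning the first ring point at or after the cell's hash."""
        i = bisect.bisect_left(self._ring, (_hash(cell_id),))
        return self._ring[i % len(self._ring)][1]  # wrap around the ring
```

Adding a server moves only the cells that now hash to one of its virtual nodes, and every moved cell lands on the new server. Usage would look like `ring = HashRing(["s1", "s2", "s3"])` followed by `ring.owner("9q8yy")`. A hot cell that has been split into subcells gets distinct IDs, so the subcells spread across the ring independently.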
5. Privacy, Security, and Abuse Prevention
Exposing real-time location is sensitive and opens vectors for stalking or harassment. Interviewers expect you to discuss privacy controls, data retention policies, and mechanisms to prevent malicious actors from abusing the service.
Hints to consider:
- Default to coarse location sharing (neighborhood-level) with explicit opt-in for precise coordinates, and allow users to maintain block lists
- Use anomaly detection and escalating throttles (for example, exponentially increasing cooldowns) to slow users making excessive queries, which may indicate scraping or stalking behavior
- Store location data with short TTLs (minutes to hours) and encrypt it at rest and in transit, logging access for audit trails
- Provide transparency controls showing users who has queried their location recently, and offer temporary "go invisible" modes during sensitive situations
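The visibility and precision rules above can be sketched as one server-side check applied before any location is returned to a viewer. The `Profile` fields, the mode strings, and the roughly 1 km coarsening cell are assumptions for illustration; the visibility modes and block lists themselves come from the requirements and hints.

```python
import math
from dataclasses import dataclass, field

COARSE_CELL_DEG = 0.01  # hypothetical "neighborhood" cell, about 1.1 km north-south

def coarsen(lat, lon, cell_deg=COARSE_CELL_DEG):
    """Snap a coordinate to the center of its cell, hiding the exact position."""
    return ((math.floor(lat / cell_deg) + 0.5) * cell_deg,
            (math.floor(lon / cell_deg) + 0.5) * cell_deg)

@dataclass
class Profile:
    user_id: str
    visibility: str = "friends"   # "all", "friends", or "invisible"
    precise: bool = False         # coarse by default; precise is opt-in
    friends: set = field(default_factory=set)
    blocked: set = field(default_factory=set)

def visible_location(target, viewer_id, lat, lon):
    """Return the location the viewer may see, or None if hidden."""
    # Block lists and invisible mode override everything else.
    if viewer_id in target.blocked or target.visibility == "invisible":
        return None
    if target.visibility == "friends" and viewer_id not in target.friends:
        return None
    return (lat, lon) if target.precise else coarsen(lat, lon)
```

Snapping to a cell center, rather than adding random noise, means repeated queries return the same coarse point and cannot be averaged to recover the exact position. Applying this check on the server, just before the response is built, keeps precise coordinates from reaching clients that are not entitled to them.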