Design a Unique Identifier System for Customers
Tags: System Design, Must
Problem Statement
Design a session management system for a global e-commerce platform like Amazon that handles 500 million active users across multiple regions. The system must store and retrieve user session data (shopping cart contents, browsing history, authentication tokens, and personalization preferences) with strict latency requirements while ensuring consistency and availability.
Your design should handle scenarios where users hop between devices mid-session, survive regional outages without data loss, and scale horizontally during peak shopping events like Black Friday when traffic can spike 10x normal load. The architecture must balance the CAP theorem tradeoffs appropriately for different types of session data, support session expiration and cleanup, and provide fast lookups even when sessions grow to several KB in size.
Key Requirements
Functional
- Session creation and retrieval -- Users must be able to create a new session on login and retrieve existing session data on subsequent requests with their session token
- Cross-device continuity -- Session data must be accessible when a user switches from mobile app to web browser or across different geographic locations
- Selective updates -- Support partial session updates (e.g., adding one item to cart) without overwriting the entire session object
- Automatic expiration -- Sessions should automatically expire after 30 days of inactivity with efficient cleanup of stale data
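The four functional requirements can be illustrated with a small in-memory sketch. It assumes a hypothetical `SessionStore` class, a session stored as independent named sub-objects (`cart`, `prefs`), and a sliding 30-day inactivity window that resets on every access. The clock is injectable so that expiry can be exercised without waiting. A production system would normally push expiry down to the storage layer (for example, per-key TTLs) instead of scanning in application code.

```python
import secrets
import time
from typing import Any, Callable, Optional

THIRTY_DAYS = 30 * 24 * 3600  # inactivity window from the requirements, in seconds


class SessionStore:
    """Hypothetical in-memory session store; illustration only."""

    def __init__(self, ttl: float = THIRTY_DAYS, clock: Callable[[], float] = time.time):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._data: dict[str, dict[str, Any]] = {}
        self._last_seen: dict[str, float] = {}

    def create(self, user_id: str) -> str:
        token = secrets.token_urlsafe(32)  # 256 bits of randomness
        self._data[token] = {"user_id": user_id, "cart": {}, "prefs": {}}
        self._last_seen[token] = self._clock()
        return token

    def _evict(self, token: str) -> None:
        self._data.pop(token, None)
        self._last_seen.pop(token, None)

    def _touch(self, token: str) -> bool:
        """False if the session is unknown or idle past the TTL; otherwise slide the window."""
        last = self._last_seen.get(token)
        if last is None:
            return False
        now = self._clock()
        if now - last > self._ttl:
            self._evict(token)
            return False
        self._last_seen[token] = now
        return True

    def get(self, token: str) -> Optional[dict[str, Any]]:
        # Returns a shallow copy so callers cannot replace top-level fields by accident.
        return dict(self._data[token]) if self._touch(token) else None

    def update_field(self, token: str, field: str, key: str, value: Any) -> bool:
        """Partial update: change one entry of one sub-object, leaving the rest untouched."""
        if not self._touch(token):
            return False
        self._data[token].setdefault(field, {})[key] = value
        return True

    def sweep(self) -> int:
        """Background cleanup of idle sessions; returns how many were removed."""
        now = self._clock()
        stale = [t for t, seen in self._last_seen.items() if now - seen > self._ttl]
        for t in stale:
            self._evict(t)
        return len(stale)
```

Reads and partial writes both refresh the window, so a user who adds one item to the cart every few weeks never loses the session, while `sweep` reclaims sessions nobody has touched.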
Non-Functional
- Scalability -- Handle 500M active sessions with 100K session reads/writes per second globally, scaling to 1M req/s during peak events
- Reliability -- Ensure 99.99% availability with data durability guarantees; tolerate regional datacenter failures without losing active sessions
- Latency -- p99 read latency under 50ms and write latency under 100ms from any region
- Consistency -- Strong consistency for authentication tokens; eventual consistency acceptable for shopping cart and preferences
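A back-of-envelope estimate turns the non-functional numbers into a rough cluster size. Only the session count and request rates come from the requirements; the average session size, replication factor, per-node memory and throughput, target utilization, and the choice to keep every session in RAM are all assumptions made for this sketch.

```python
import math

# Figures taken from the requirements above
ACTIVE_SESSIONS = 500_000_000
BASELINE_RPS = 100_000
PEAK_RPS = 1_000_000

# Assumptions, not part of the problem statement
AVG_SESSION_BYTES = 5_000        # "several KB" read as 5 KB
REPLICATION_FACTOR = 3           # e.g. one primary plus two replicas
NODE_MEMORY_BYTES = 64 * 10**9   # usable RAM per storage/cache node
NODE_OPS_PER_SEC = 50_000        # sustainable reads + writes per node
TARGET_UTILIZATION = 0.7         # headroom for spikes and failover


def estimate() -> dict:
    raw_bytes = ACTIVE_SESSIONS * AVG_SESSION_BYTES
    replicated_bytes = raw_bytes * REPLICATION_FACTOR
    usable_mem = NODE_MEMORY_BYTES * TARGET_UTILIZATION
    usable_ops = NODE_OPS_PER_SEC * TARGET_UTILIZATION
    return {
        "raw_TB": raw_bytes / 10**12,
        "replicated_TB": replicated_bytes / 10**12,
        "nodes_for_memory": math.ceil(replicated_bytes / usable_mem),
        "nodes_for_baseline_rps": math.ceil(BASELINE_RPS / usable_ops),
        "nodes_for_peak_rps": math.ceil(PEAK_RPS / usable_ops),
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value}")
```

Under these assumptions the replicated data set (7.5 TB) needs about 168 nodes while peak traffic needs about 29, so the fleet is sized by memory rather than by request rate. That suggests spending design effort on compact session encoding or on tiering cold sessions to disk.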
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Partitioning and Data Locality
Interviewers want to see how you distribute session data across nodes and regions to minimize latency while avoiding hotspots. The challenge is balancing data locality (keeping sessions close to users) with even load distribution.
Hints to consider:
- Use consistent hashing with session ID as the partition key to distribute load evenly across storage nodes
- Consider geo-aware routing that directs requests to the nearest regional cluster while maintaining a global registry for cross-region lookups
- Think about sticky routing at the load balancer level to reduce cache misses for the same user's repeated requests
- Discuss replication strategy: should replicas be in the same region (low latency) or across regions (disaster recovery)?
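The partitioning hints can be sketched as a consistent-hash ring keyed by session ID. The `HashRing` class, the 100 virtual nodes per physical node, and the "walk clockwise for N distinct nodes" replica rule are illustrative assumptions; whether those replicas sit in one region or span several is a placement policy layered on top of the ring.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is salted per process, so avoid it here.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self._vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (hash point, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Many points per node smooth out load imbalance between physical nodes.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [point for point in self._ring if point[1] != node]

    def nodes_for(self, session_id: str, replicas: int = 1) -> list[str]:
        """Walk clockwise from the key's hash, collecting `replicas` distinct nodes."""
        if not self._ring:
            raise ValueError("ring is empty")
        replicas = min(replicas, len({n for _, n in self._ring}))
        idx = bisect.bisect(self._ring, (_hash(session_id), ""))
        owners: list[str] = []
        while len(owners) < replicas:
            _, node = self._ring[idx % len(self._ring)]  # wrap around the ring
            if node not in owners:
                owners.append(node)
            idx += 1
        return owners
```

The property that matters for scaling out before an event like Black Friday: adding a node moves only the keys whose nearest clockwise point now belongs to the new node, roughly 1/N of all sessions, instead of reshuffling everything as `hash(id) % N` would.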
2. Handling Concurrent Updates
Multiple requests from the same user can arrive simultaneously (e.g., adding items to cart from different browser tabs). Your design must prevent lost updates without introducing prohibitive locking overhead.
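One way to prevent lost updates without holding locks across a request is optimistic concurrency control: read a value with its version, modify it, and write it back only if the version is unchanged. In this sketch `VersionedStore` and `update_with_retry` are hypothetical names, and the internal lock only models the storage engine's atomic conditional write; it is not an application-level lock held across the read-modify-write. Keys are per sub-object (for example `"s1:cart"`), so a cart update never conflicts with a preferences update.

```python
import threading
from dataclasses import dataclass
from typing import Callable


class VersionConflict(Exception):
    pass


@dataclass
class Versioned:
    value: dict
    version: int = 0


class VersionedStore:
    """Hypothetical store exposing compare-and-swap on a per-key version number."""

    def __init__(self):
        self._items: dict[str, Versioned] = {}
        self._lock = threading.Lock()  # stands in for the database's atomic conditional write

    def read(self, key: str) -> Versioned:
        with self._lock:
            item = self._items.get(key, Versioned({}))
            return Versioned(dict(item.value), item.version)  # caller gets a private copy

    def compare_and_swap(self, key: str, expected_version: int, new_value: dict) -> bool:
        with self._lock:
            current = self._items.get(key, Versioned({}))
            if current.version != expected_version:
                return False  # someone else wrote first; caller must re-read
            self._items[key] = Versioned(dict(new_value), expected_version + 1)
            return True


def update_with_retry(store: VersionedStore, key: str,
                      mutate: Callable[[dict], None], max_attempts: int = 5) -> int:
    """Read-modify-CAS loop; returns the number of attempts it took."""
    for attempt in range(1, max_attempts + 1):
        snapshot = store.read(key)
        mutate(snapshot.value)
        if store.compare_and_swap(key, snapshot.version, snapshot.value):
            return attempt
    raise VersionConflict(f"gave up on {key!r} after {max_attempts} attempts")
```

Contention stays cheap when conflicts are rare, which is the common case for a single user's session; a bounded retry count keeps a pathological hot key from looping forever.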
Hints to consider:
- Evaluate optimistic concurrency control using version numbers or timestamps with compare-and-swap operations
- Consider breaking sessions into smaller sub-objects (cart, preferences, auth) that can be updated independently to reduce contention
- Discuss conflict-free replicated data types (CRDTs) for shopping carts where eventual consistency is acceptable
- Address the tradeoff between strong consistency (slower, more coordination) and eventual consistency (faster, potential conflicts)
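The CRDT hint can be made concrete with a cart modeled as a map from SKU to PN-Counter, one simple state-based CRDT. The class name `PNCounterCart`, the replica IDs, and the choice of PN-Counters rather than, say, an OR-Set of line items are assumptions of this sketch. Each replica only increments its own slots, and merging takes the element-wise maximum, so merges are commutative, associative, and idempotent and replicas converge regardless of delivery order.

```python
from collections import defaultdict


class PNCounterCart:
    """Hypothetical cart CRDT: sku -> (per-replica adds, per-replica removes)."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.adds: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
        self.removes: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def add(self, sku: str, qty: int = 1) -> None:
        self.adds[sku][self.replica_id] += qty  # only ever touch our own slot

    def remove(self, sku: str, qty: int = 1) -> None:
        qty = min(qty, self.quantity(sku))  # do not go below zero based on local view
        if qty > 0:
            self.removes[sku][self.replica_id] += qty

    def quantity(self, sku: str) -> int:
        added = sum(self.adds.get(sku, {}).values())
        removed = sum(self.removes.get(sku, {}).values())
        return added - removed

    def merge(self, other: "PNCounterCart") -> None:
        # Element-wise max per (sku, replica) slot: safe to apply repeatedly, in any order.
        for mine, theirs in ((self.adds, other.adds), (self.removes, other.removes)):
            for sku, per_replica in theirs.items():
                for rid, n in per_replica.items():
                    mine[sku][rid] = max(mine[sku][rid], n)

    def items(self) -> dict[str, int]:
        skus = set(self.adds) | set(self.removes)
        return {s: q for s in sorted(skus) if (q := self.quantity(s)) > 0}
```

One known limitation: concurrent removals of the same item on two replicas can push the merged count below zero. `items()` hides non-positive quantities, but a real design would need an explicit policy for that case.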
3. Cache Strategy and Invalidation
Sessions are read-heavy, making caching essential, but stale cached data can break the user experience. Interviewers probe your understanding of multi-tier caching and invalidation patterns.
Hints to consider:
- Design a two-tier cache: application-level (in-process) and distributed (Redis/Memcached) with different TTLs
- Use write-through or write-behind caching strategies based on consistency requirements for different session attributes
- Implement cache invalidation via pub/sub or change data capture to propagate updates across cache layers
- Consider cache warming strategies for newly active users and cache eviction policies (LRU vs TTL-based)
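The two-tier design can be sketched with plain dictionaries standing in for each layer. Everything here is an assumption for illustration: `TTLMap` plays both the in-process L1 and the shared L2 (which would be Redis or Memcached in practice), the TTLs of 5 s and 300 s are arbitrary, and `InvalidationBus` delivers synchronously, whereas a real pub/sub channel is asynchronous, so an L1 may briefly serve a stale value. Writes are write-through: database first, then L2, then an invalidation broadcast to every node's L1.

```python
import time
from typing import Any, Callable, Optional


class InvalidationBus:
    """Stand-in for a pub/sub channel; delivers synchronously in this sketch."""

    def __init__(self):
        self._subscribers: list[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, key: str) -> None:
        for callback in self._subscribers:
            callback(key)


class TTLMap:
    """Dictionary whose entries expire a fixed time after being set."""

    def __init__(self, ttl: float, clock: Callable[[], float]):
        self._ttl, self._clock = ttl, clock
        self._items: dict[str, tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        hit = self._items.get(key)
        if hit is None or hit[1] <= self._clock():
            self._items.pop(key, None)  # lazily drop expired entries
            return None
        return hit[0]

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (value, self._clock() + self._ttl)

    def delete(self, key: str) -> None:
        self._items.pop(key, None)


class AppNode:
    """One application server: private L1, shared L2 and database."""

    def __init__(self, l2: TTLMap, db: dict, bus: InvalidationBus,
                 clock: Callable[[], float] = time.monotonic, l1_ttl: float = 5.0):
        self.l1 = TTLMap(l1_ttl, clock)
        self.l2, self.db, self.bus = l2, db, bus
        bus.subscribe(self.l1.delete)  # drop our L1 copy whenever any node writes
        self.db_reads = 0

    def read(self, key: str) -> Optional[Any]:
        value = self.l1.get(key)
        if value is not None:
            return value
        value = self.l2.get(key)
        if value is None:
            self.db_reads += 1
            value = self.db.get(key)
            if value is None:
                return None
            self.l2.set(key, value)  # populate shared tier on the way up
        self.l1.set(key, value)
        return value

    def write(self, key: str, value: Any) -> None:
        self.db[key] = value       # durable store first
        self.l2.set(key, value)    # then the shared cache
        self.bus.publish(key)      # then invalidate every node's L1
```

Keeping the L1 TTL short bounds how long a missed or delayed invalidation can be visible, while the longer L2 TTL keeps most reads off the database.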
4. Security and Authentication Token Management
Session tokens are security-critical. Interviewers expect you to address token generation, validation, rotation, and the risks of token theft or replay attacks.
Hints to consider:
- Generate cryptographically secure random tokens with sufficient entropy (at least 128 bits) and store only hashed versions in the database
- Implement token rotation on sensitive actions and sliding expiration windows to balance security and user experience
- Discuss storing authentication state separately from session data with stricter consistency and replication requirements
- Address cross-site request forgery (CSRF) protection, and explain how opaque server-side session tokens differ from JWTs, which carry signed claims and enable stateless authentication
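The token hints can be sketched with Python's standard `secrets` and `hashlib` modules. The `AuthTokenStore` class and its methods are hypothetical, and an in-memory dictionary stands in for the strongly consistent auth store. The raw token is returned to the client once; only its SHA-256 digest is persisted, so a leaked table does not yield usable tokens. Rotation deletes the old digest before issuing a new token, so a stolen pre-rotation token stops working.

```python
import hashlib
import secrets
from typing import Optional


def new_token() -> str:
    # 32 random bytes = 256 bits of entropy, comfortably above the 128-bit floor.
    return secrets.token_urlsafe(32)


def token_digest(token: str) -> str:
    # Only this digest is stored server-side.
    return hashlib.sha256(token.encode()).hexdigest()


class AuthTokenStore:
    """Hypothetical auth store keyed by token digest."""

    def __init__(self):
        self._by_digest: dict[str, str] = {}  # digest -> user_id

    def issue(self, user_id: str) -> str:
        token = new_token()
        self._by_digest[token_digest(token)] = user_id
        return token  # raw token goes to the client only, never to storage

    def validate(self, token: str) -> Optional[str]:
        return self._by_digest.get(token_digest(token))

    def rotate(self, old_token: str) -> Optional[str]:
        """On a sensitive action, revoke the old token and issue a fresh one."""
        user_id = self._by_digest.pop(token_digest(old_token), None)
        if user_id is None:
            return None
        return self.issue(user_id)
```

A fast hash such as SHA-256 is adequate here because the tokens are long random values rather than user-chosen passwords, so there is nothing for a slow password hash to protect against brute force.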