Design Instagram for a new mobile device OS
Product Design · Must
Problem Statement
Design a distributed system for a real-time sports betting platform that allows millions of users to place bets on live sporting events. The platform must handle rapid odds updates, process bet placements during high-traffic moments like final minutes of games, and ensure financial accuracy across all transactions. The system should support multiple concurrent sporting events, various bet types (single bets, parlays, live in-game betting), and provide users with real-time updates on their active bets and potential winnings.
Consider that during major events like championship games or tournaments, the system may experience traffic spikes of 100x normal load within seconds. The platform must maintain strict financial consistency while providing sub-second latency for odds updates and bet confirmations. Users expect to see odds changes immediately and need assurance that their bets are recorded correctly at the exact odds displayed when they clicked "Place Bet."
Key Requirements
Functional
- Live odds engine -- continuously calculate and broadcast odds changes based on betting patterns and game events
- Bet placement and validation -- accept user bets, verify account balances, lock odds at submission time
- Real-time event tracking -- integrate with sports data feeds to track game state and trigger bet settlements
- User portfolio management -- display active bets, betting history, account balance, and potential payouts
- Settlement processing -- automatically resolve bets when events conclude and credit winning accounts
- Multi-event support -- handle thousands of concurrent sporting events across different sports and leagues
Non-Functional
- Scalability -- support 10 million concurrent users during peak events, process 50,000+ bets per second
- Reliability -- 99.99% uptime with zero financial data loss, graceful degradation during outages
- Latency -- odds updates pushed to clients within 200ms, bet confirmation within 500ms
- Consistency -- strong consistency for financial transactions, eventual consistency acceptable for non-financial data
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Race Conditions and Odds Locking
The system must prevent users from betting on stale odds while handling thousands of concurrent bets on the same event. This is critical because odds change frequently based on betting volume and game events.
Hints to consider:
- Consider optimistic locking with version numbers on odds snapshots
- Discuss how to handle the scenario where odds change between user viewing and bet submission
- Explore using a two-phase commit for bet placement that validates that the odds haven't changed before the bet is committed
- Think about time-bound odds guarantees (e.g., odds valid for 3 seconds)
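A minimal sketch of the version-check and time-bound ideas in the hints above, assuming an in-memory odds store. The names `OddsBook`, `Quote`, and `QUOTE_TTL_S` are hypothetical, and reading "valid for 3 seconds" as a quote expiry is only one possible interpretation. In a real system the version comparison and the bet write would need to happen atomically (for example, as a compare-and-set in the datastore).

```python
from dataclasses import dataclass

QUOTE_TTL_S = 3.0  # assumption: taken from the "odds valid for 3 seconds" hint


@dataclass(frozen=True)
class Quote:
    selection_id: str
    version: int        # odds version the user saw
    decimal_odds: float
    expires_at: float   # the quote is honoured only until this time


class OddsBook:
    """In-memory stand-in for the odds store (hypothetical)."""

    def __init__(self):
        self._odds = {}  # selection_id -> (version, decimal_odds)

    def publish(self, selection_id, decimal_odds):
        # Every odds change bumps the version number.
        version, _ = self._odds.get(selection_id, (0, None))
        self._odds[selection_id] = (version + 1, decimal_odds)

    def quote(self, selection_id, now):
        # Snapshot handed to the client together with the displayed odds.
        version, odds = self._odds[selection_id]
        return Quote(selection_id, version, odds, now + QUOTE_TTL_S)

    def check(self, quote, now):
        # Optimistic check: reject if the quote is too old or the odds moved.
        current_version, _ = self._odds[quote.selection_id]
        if now > quote.expires_at:
            return "QUOTE_EXPIRED"
        if quote.version != current_version:
            return "ODDS_CHANGED"
        return "ACCEPTED"


book = OddsBook()
book.publish("team_a_win", 2.10)
seen = book.quote("team_a_win", now=100.0)
book.publish("team_a_win", 1.95)       # odds move before the user submits
print(book.check(seen, now=101.0))     # the stale quote is rejected
```

On rejection, the client would typically be shown the new odds and asked to confirm again, which addresses the "odds change between viewing and submission" scenario.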
2. Financial Transaction Consistency
Every bet placement deducts from the user's balance, and every settlement credits the winners. These operations must be atomic and auditable, even during system failures.
Hints to consider:
- Design an event-sourced transaction ledger where every financial operation is an immutable event
- Discuss idempotency patterns to prevent duplicate debits or credits on retries
- Consider separating the bet placement system from settlement to allow independent scaling
- Explore how to implement distributed transactions across user accounts and bet records
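The ledger and idempotency hints above can be sketched as an append-only log keyed by an idempotency key. This is an illustration under simplifying assumptions: a single process, in-memory storage, and amounts in integer cents. `Ledger`, `LedgerEvent`, and `InsufficientFunds` are hypothetical names, not part of any specific library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEvent:
    idempotency_key: str
    account_id: str
    amount_cents: int   # negative = debit (stake), positive = credit (payout)
    reason: str


class InsufficientFunds(Exception):
    pass


class Ledger:
    def __init__(self):
        self._events = []   # append-only; events are never updated or deleted
        self._by_key = {}   # idempotency_key -> event already recorded

    def balance(self, account_id):
        # The balance is derived by replaying the account's events.
        return sum(e.amount_cents for e in self._events
                   if e.account_id == account_id)

    def append(self, key, account_id, amount_cents, reason):
        existing = self._by_key.get(key)
        if existing is not None:
            return existing  # retry: return the original event, write nothing
        if amount_cents < 0 and self.balance(account_id) + amount_cents < 0:
            raise InsufficientFunds(account_id)
        event = LedgerEvent(key, account_id, amount_cents, reason)
        self._events.append(event)
        self._by_key[key] = event
        return event


ledger = Ledger()
ledger.append("deposit-1", "alice", 10_000, "deposit")
ledger.append("bet-42", "alice", -2_500, "stake")
ledger.append("bet-42", "alice", -2_500, "stake")  # client retry, no double debit
print(ledger.balance("alice"))
```

Replaying the full log on every balance read would not scale; a production design would likely keep a materialized balance updated in the same transaction as the append.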
3. Real-Time Data Distribution
Millions of connected clients need to receive odds updates with minimal latency and bandwidth consumption, especially during traffic spikes.
Hints to consider:
- Use WebSocket connections with intelligent throttling based on user engagement
- Consider a hierarchical pub/sub architecture where edge servers aggregate updates
- Discuss differential updates (only send changed odds) rather than full state
- Explore geographic distribution strategies to reduce latency for global users
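As an illustration of the differential-update hint, the sketch below computes the delta between two odds snapshots and applies it on the client side. The message shape (`changed` / `removed`) and the function names are assumptions made for this example, not a defined protocol.

```python
def diff_odds(previous, current):
    """Return only what a client needs to move from `previous` to `current`."""
    changed = {k: v for k, v in current.items() if previous.get(k) != v}
    removed = [k for k in previous if k not in current]
    return {"changed": changed, "removed": removed}


def apply_diff(state, delta):
    """Client side: rebuild the latest snapshot from local state plus a delta."""
    new_state = dict(state)
    new_state.update(delta["changed"])
    for key in delta["removed"]:
        new_state.pop(key, None)
    return new_state


previous = {"home_win": 2.10, "draw": 3.40, "away_win": 5.00}
current = {"home_win": 2.10, "draw": 3.25, "next_goal_home": 7.50}

delta = diff_odds(previous, current)
print(delta)  # only "draw" and "next_goal_home" are sent, plus one removal
```

A client that misses a delta ends up with inconsistent state, so designs of this kind usually attach a sequence number to each delta and fall back to a full snapshot when a gap is detected.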
4. Traffic Spike Handling
Traffic can increase 100x within seconds when major plays happen (last-minute goals, game-winning opportunities), requiring elastic scaling without service degradation.
Hints to consider:
- Design read replicas and caching layers that can absorb read traffic spikes
- Implement request queuing with priority for bet placement over odds queries
- Discuss auto-scaling triggers based on both CPU and custom metrics (bets per second)
- Consider circuit breakers that gracefully disable non-critical features during overload
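One way to picture the request-queuing hint is a bounded priority queue that sheds odds queries before it sheds bet placements. This is a single-process sketch; `AdmissionQueue`, the two request kinds, and the soft/hard limits are hypothetical choices for illustration.

```python
import heapq
import itertools

# Lower number = served first. The two request classes are assumptions.
PRIORITY = {"place_bet": 0, "odds_query": 1}


class AdmissionQueue:
    def __init__(self, soft_limit, hard_limit):
        self.soft_limit = soft_limit   # beyond this depth, odds queries are shed
        self.hard_limit = hard_limit   # beyond this depth, everything is shed
        self._heap = []
        self._seq = itertools.count()  # keeps FIFO order within a priority

    def offer(self, kind, payload):
        """Enqueue a request; return False if it was shed due to overload."""
        limit = self.hard_limit if kind == "place_bet" else self.soft_limit
        if len(self._heap) >= limit:
            return False
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), payload))
        return True

    def poll(self):
        """Return the highest-priority pending request, or None if empty."""
        if not self._heap:
            return None
        _, _, payload = heapq.heappop(self._heap)
        return payload


queue = AdmissionQueue(soft_limit=2, hard_limit=4)
queue.offer("odds_query", "odds:game-1")
queue.offer("place_bet", "bet:alice")
print(queue.poll())  # the bet is served before the earlier odds query
```

A shed odds query could be answered from a cache or with a retry hint, which is the same kind of graceful degradation the circuit-breaker hint describes.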