Practice/Meta/Design Redis

Design Redis

System DesignMust

Problem Statement

Design a global leaderboard system that tracks and ranks millions of players in real-time across competitive online games. The system must handle frequent score updates from active players, support instant retrieval of top rankings, and allow users to query their current rank and nearby competitors with minimal latency.

Consider a multiplayer gaming platform where millions of concurrent users compete across different game modes and tournaments. Each match completion triggers score updates, and players expect to see their new rank within milliseconds. The system must balance write-heavy workloads during peak hours with low-latency read requirements while maintaining consistent rankings across all queries. Your design should handle both the hot path for active tournaments and the long tail of historical leaderboards that still need occasional queries.

Key Requirements

Functional

Score updates -- Accept score changes for millions of players with automatic rank recalculation
Top-N queries -- Retrieve the top 100 players with sub-50ms latency
Rank lookup -- Return any player's current rank and score in under 100ms
Nearby players -- Show 10 players above and below a given player's position
Multiple leaderboards -- Support separate rankings per game mode, region, and time period

Non-Functional

Scalability -- Handle 100 million active players with 50,000 score updates per second during peak hours
Reliability -- Maintain 99.95% availability with no data loss on infrastructure failures
Latency -- P95 read latency under 50ms and write acknowledgment within 200ms
Consistency -- Eventual consistency is acceptable with rank convergence within 5 seconds globally

What Interviewers Focus On

Based on real interview experiences, these are the areas interviewers probe most deeply:

1. Data Structure Selection for Ranking

The core challenge is choosing a data structure that efficiently maintains sorted order while supporting frequent updates and range queries. Naive approaches like sorting on every read or maintaining a single global sorted list create unacceptable bottlenecks.

Hints to consider:

Sorted sets with skip lists provide O(log N) updates and range queries, similar to Redis ZADD/ZRANGE operations
Consider sharding strategies that preserve ranking semantics, such as range-based partitioning by score
Trade off exact ranks versus approximate ranks using probabilistic data structures for massive scale
Evaluate whether to materialize ranks eagerly on writes or compute them lazily on reads

2. Handling Write Contention and Hot Keys

During major tournaments, popular games create extreme write hotspots where thousands of players update scores simultaneously. A single-threaded bottleneck or coarse-grained locking will create tail latency disasters.

Hints to consider:

Partition leaderboards by game mode and tournament ID to distribute write load across shards
Use buffering and micro-batching to aggregate rapid-fire updates from the same player
Implement optimistic concurrency control or compare-and-swap operations to handle concurrent updates
Consider write-behind caching patterns where updates are acknowledged quickly and applied asynchronously

3. Efficient Rank Computation

Computing a player's exact rank (how many players have higher scores) requires counting all better-performing players, which becomes prohibitively expensive at scale. The system needs clever indexing or approximation techniques.

Hints to consider:

Precompute and cache ranks periodically, then interpolate for players whose scores haven't changed
Use augmented trees or B-trees with subtree counts to enable O(log N) rank queries
For top-heavy distributions, maintain exact counts for top tiers and approximate counts for lower tiers
Employ consistent hashing to ensure players always query the same shard for cache locality

4. Scaling Across Geographic Regions

Global games need leaderboards accessible from multiple continents with minimal latency, but maintaining perfect consistency across regions creates coordination overhead that kills performance.

Hints to consider:

Deploy regional read replicas with eventual consistency for queries, routing users to their nearest datacenter
Funnel all writes to authoritative shards with async replication to read replicas
Use multi-level leaderboards where regional boards merge periodically into global rankings
Accept stale reads with bounded staleness guarantees, displaying "as of N seconds ago" timestamps

Practice/Meta/Design Redis

Design Redis

System DesignMust

Problem Statement

Key Requirements

Functional

Score updates -- Accept score changes for millions of players with automatic rank recalculation
Top-N queries -- Retrieve the top 100 players with sub-50ms latency
Rank lookup -- Return any player's current rank and score in under 100ms
Nearby players -- Show 10 players above and below a given player's position
Multiple leaderboards -- Support separate rankings per game mode, region, and time period

Non-Functional

Scalability -- Handle 100 million active players with 50,000 score updates per second during peak hours
Reliability -- Maintain 99.95% availability with no data loss on infrastructure failures
Latency -- P95 read latency under 50ms and write acknowledgment within 200ms
Consistency -- Eventual consistency is acceptable with rank convergence within 5 seconds globally

What Interviewers Focus On

Based on real interview experiences, these are the areas interviewers probe most deeply:

1. Data Structure Selection for Ranking

Hints to consider:

Sorted sets with skip lists provide O(log N) updates and range queries, similar to Redis ZADD/ZRANGE operations
Consider sharding strategies that preserve ranking semantics, such as range-based partitioning by score
Trade off exact ranks versus approximate ranks using probabilistic data structures for massive scale
Evaluate whether to materialize ranks eagerly on writes or compute them lazily on reads

2. Handling Write Contention and Hot Keys

Hints to consider:

Partition leaderboards by game mode and tournament ID to distribute write load across shards
Use buffering and micro-batching to aggregate rapid-fire updates from the same player
Implement optimistic concurrency control or compare-and-swap operations to handle concurrent updates
Consider write-behind caching patterns where updates are acknowledged quickly and applied asynchronously

3. Efficient Rank Computation

Hints to consider:

Precompute and cache ranks periodically, then interpolate for players whose scores haven't changed
Use augmented trees or B-trees with subtree counts to enable O(log N) rank queries
For top-heavy distributions, maintain exact counts for top tiers and approximate counts for lower tiers
Employ consistent hashing to ensure players always query the same shard for cache locality

4. Scaling Across Geographic Regions

Global games need leaderboards accessible from multiple continents with minimal latency, but maintaining perfect consistency across regions creates coordination overhead that kills performance.

Hints to consider:

Deploy regional read replicas with eventual consistency for queries, routing users to their nearest datacenter
Funnel all writes to authoritative shards with async replication to read replicas
Use multi-level leaderboards where regional boards merge periodically into global rankings
Accept stale reads with bounded staleness guarantees, displaying "as of N seconds ago" timestamps