Design a real-time gaming leaderboard system for a competitive online game with millions of active players. The system must display global top rankings, each player's current rank along with nearby competitors, and leaderboards scoped to a player's friend list. Rankings should update within seconds of a match completing.
The fundamental difficulty is sustaining extremely high write throughput from concurrent game completions while simultaneously serving low-latency read queries for global rankings, K-neighbor lookups, and friend-scoped views. A single sorted structure quickly becomes a bottleneck at this scale, so you need a sharding strategy that distributes writes evenly, supports efficient rank computation across shards, and handles temporal windows such as daily, weekly, and seasonal resets without downtime or data loss.
Consider that popular games experience massive traffic spikes during tournaments and season launches, where both write and read volumes can surge by an order of magnitude. The system must remain responsive during these peaks and degrade gracefully if individual components fail.
Based on real interview experiences, these are the areas interviewers probe most deeply:
A single Redis sorted set holding tens of millions of entries turns one node into a memory and throughput hotspot. Interviewers want to see how you partition the score space across multiple shards and compute globally accurate ranks from the distributed data.
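The score-space partitioning can be sketched in a few lines. This assumes the fixed parameters used later in the walkthrough (50 shards, 2000 points per shard); the `redis_key` naming scheme is a hypothetical convention, not a Redis requirement:

```python
# Score-range sharding sketch: 50 shards, each covering a 2000-point range,
# so shard 0 holds the lowest scores and shard 49 everything at the top.
NUM_SHARDS = 50
SHARD_WIDTH = 2000

def shard_for_score(score: int) -> int:
    """Map a score to its shard index; scores above the top range clamp to the last shard."""
    return min(score // SHARD_WIDTH, NUM_SHARDS - 1)

def redis_key(shard: int, window: str = "weekly") -> str:
    """Hypothetical key-naming scheme: one sorted set per shard per time window."""
    return f"lb:{window}:shard:{shard}"
```

Routing writes this way keeps each sorted set small and makes "count everyone above me" a question you can answer per shard.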
Tournament finals and season launches can spike write traffic by 10x or more. The ingestion pipeline must absorb bursts without dropping updates or creating backpressure that stalls game servers.
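One common way to absorb bursts on the consumer side is micro-batching: coalesce updates per player between flushes so a 10x spike collapses into at most one write per player per interval. A minimal sketch of that idea (not a real Kafka consumer; the higher-score-wins rule matches the tie-breaking convention above):

```python
class UpdateBatcher:
    """Illustrative micro-batcher for the ingestion consumer: buffer score
    updates and keep only each player's best, then flush one batch that the
    caller can pipeline into Redis in a single round trip."""

    def __init__(self):
        self.pending = {}  # player_id -> best score seen in this batch

    def add(self, player_id: str, score: int) -> None:
        prev = self.pending.get(player_id)
        if prev is None or score > prev:  # higher score wins
            self.pending[player_id] = score

    def flush(self) -> dict:
        batch, self.pending = self.pending, {}
        return batch
```

Because Kafka already provides durability, dropping intermediate (lower) scores inside a batch loses no information the leaderboard needs.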
Players expect to see their local neighborhood in the global ranking instantly. Friend leaderboards require fetching scores for potentially hundreds of connections and returning a sorted result with low latency.
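The friend-scoped view reduces to a batch fetch plus a server-side sort. A sketch, assuming the scores have already been batch-fetched from Redis (for example via pipelined ZSCORE calls) into a plain dict:

```python
def friend_leaderboard(scores: dict, friend_ids: list) -> list:
    """Sort a player's friends by score, highest first. Friends with no
    entry in the current window are simply omitted."""
    return sorted(
        ((fid, scores[fid]) for fid in friend_ids if fid in scores),
        key=lambda pair: -pair[1],
    )
```

Even with a few hundred friends this sort is microseconds of work; the latency budget is dominated by the batched Redis fetch, which is why pipelining it into one round trip matters.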
Weekly and seasonal leaderboards must reset cleanly while preserving the old rankings for display and analytics. Resetting millions of entries simultaneously can cause service disruption if not handled carefully.
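A common trick is to make resets a key-rotation problem rather than a mass-deletion problem: embed the window identity in the key, so a new window simply writes to a fresh key while the old one is snapshotted to cold storage and expired off the hot path. A sketch of such a key scheme (the naming convention is illustrative):

```python
from datetime import date

def window_key(day: date, window: str) -> str:
    """Hypothetical time-scoped key scheme: each daily or weekly window
    gets its own key, so a reset never deletes millions of live entries."""
    if window == "daily":
        return f"lb:daily:{day.isoformat()}"
    if window == "weekly":
        year, week, _ = day.isocalendar()
        return f"lb:weekly:{year}-W{week:02d}"
    raise ValueError(f"unknown window: {window}")
```

Reads during the changeover can fall back to the previous window's key until the new one has data, which avoids a blank leaderboard at reset time.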
Confirm the number of active players and expected score update throughput at peak. Ask whether the leaderboard is single-game or spans multiple game modes, each with independent rankings. Clarify tie-breaking rules (higher score wins, then earliest timestamp). Ask about friend list size limits and whether friend leaderboards need real-time freshness or can tolerate short staleness. Confirm which time windows are required (daily, weekly, seasonal) and whether historical snapshots must be queryable or just archived.
Game servers submit score updates to a Score Ingestion Service that publishes events to Kafka for durability and ordering. A Leaderboard Update Service consumes from Kafka and writes to a sharded Redis cluster, where each shard uses a sorted set covering a defined score range. An API layer serves read queries: top-N fetches from the highest shards, individual rank lookups fan out across shards to compute global position, and K-neighbor queries fetch a window around the player's score. A Cassandra cluster stores player profiles, historical snapshots, and audit data. Friend leaderboards are computed on-demand by batch-fetching friend scores from Redis, sorting server-side, and caching briefly. A periodic snapshot job preserves leaderboard state for resets and historical queries.
Walk through how global rank is calculated from sharded sorted sets. Divide the score space into S shards (for example, 50 shards each covering a 2000-point range). When a score update arrives, determine the target shard from the score value, issue ZADD to insert or update the player, and ZREM from the previous shard if the score crossed a boundary. To compute global rank: find the player's local rank within their shard via ZREVRANK, then query each higher-scoring shard for its ZCARD (total member count), and sum those counts plus the local rank. With 50 shards, this fan-out completes in single-digit milliseconds using pipelined Redis commands. For top-N, start from the highest shard and pull entries with ZREVRANGE until N results are collected, spilling to the next lower shard as needed.
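The walkthrough above can be modeled end to end in pure Python, with one dict standing in for each shard's sorted set. This is a sketch of the algorithm, not Redis client code: in production the marked operations would be ZADD/ZREM, ZCARD, ZREVRANK, and ZREVRANGE issued over a pipeline, and ties would be broken by timestamp rather than ignored:

```python
NUM_SHARDS, SHARD_WIDTH = 50, 2000  # parameters from the walkthrough

def shard_index(score: int) -> int:
    return min(score // SHARD_WIDTH, NUM_SHARDS - 1)

class ShardedLeaderboard:
    """In-memory stand-in for the sharded layout: each dict plays the role
    of one sorted set covering a 2000-point score range."""

    def __init__(self):
        self.shards = [dict() for _ in range(NUM_SHARDS)]  # player -> score

    def update(self, player: str, score: int) -> None:
        """ZADD into the target shard; ZREM from the old shard on boundary cross."""
        target = shard_index(score)
        for i, shard in enumerate(self.shards):
            if i != target:
                shard.pop(player, None)
        self.shards[target][player] = score

    def global_rank(self, player: str) -> int:
        """0-based rank: sum of ZCARD over every higher shard, plus the
        local ZREVRANK (count of higher scores within the player's shard)."""
        for i in range(NUM_SHARDS - 1, -1, -1):
            if player in self.shards[i]:
                higher = sum(len(self.shards[j]) for j in range(i + 1, NUM_SHARDS))
                mine = self.shards[i][player]
                local = sum(1 for v in self.shards[i].values() if v > mine)
                return higher + local
        raise KeyError(player)

    def top_n(self, n: int) -> list:
        """ZREVRANGE from the highest shard, spilling downward until n results."""
        out = []
        for i in range(NUM_SHARDS - 1, -1, -1):
            out.extend(sorted(self.shards[i].items(), key=lambda kv: -kv[1]))
            if len(out) >= n:
                break
        return out[:n]
```

Note that `global_rank` touches at most S shards and `top_n` usually touches only the top one or two, which is why the fan-out stays in single-digit milliseconds when pipelined.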
Cover fault tolerance by running Redis replicas with automatic failover; during failover, serve stale reads from replicas while the new primary warms up. Discuss monitoring: track leaderboard update lag (time from Kafka event to Redis write), query latency percentiles across shards, and shard balance metrics to trigger dynamic rebalancing. Address cost by tiering storage: keep active leaderboard windows in Redis, snapshot completed windows to Cassandra, and archive older data to object storage. For abuse prevention, validate score submissions server-side and flag statistically anomalous jumps for review before updating the leaderboard.
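The anomaly flag mentioned above could be as simple as comparing a new submission against the player's typical game-to-game improvement. This heuristic and its threshold are illustrative assumptions, not a prescribed detector:

```python
def is_anomalous(history: list, new_score: int, factor: float = 3.0) -> bool:
    """Flag a submission when it beats the player's best by more than
    `factor` times their typical per-game score movement (sketch only;
    real systems would use richer per-mode statistical models)."""
    if len(history) < 2:
        return False  # not enough history to judge
    deltas = [b - a for a, b in zip(history, history[1:])]
    typical = max(1.0, sum(abs(d) for d in deltas) / len(deltas))
    return (new_score - max(history)) > factor * typical
```

Flagged scores can be held out of the live leaderboard and queued for review, so one cheater never displaces legitimate players at the top.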
Deepen your understanding of the patterns used in this problem: