Design a real-time gaming leaderboard system for a competitive online game with millions of active players. The system must display global top rankings, each player's current rank along with nearby competitors, and leaderboards scoped to a player's friend list. Rankings should update within seconds of a match completing.
The fundamental difficulty is sustaining extremely high write throughput from concurrent game completions while simultaneously serving low-latency read queries for global rankings, K-neighbor lookups, and friend-scoped views. A single sorted structure quickly becomes a bottleneck at this scale, so you need a sharding strategy that distributes writes evenly, supports efficient rank computation across shards, and handles temporal windows such as daily, weekly, and seasonal resets without downtime or data loss.
Consider that popular games experience massive traffic spikes during tournaments and season launches, where both write and read volumes can surge by an order of magnitude. The system must remain responsive during these peaks and degrade gracefully if individual components fail.
Based on real interview experiences, these are the areas interviewers probe most deeply:
A single Redis sorted set holding tens of millions of entries turns one node into a memory and throughput hotspot. Interviewers want to see how you partition the score space across multiple shards and compute globally accurate ranks from the distributed data.
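The score-space partitioning can be sketched in a few lines. This assumes the fixed parameters used later in the walkthrough (50 shards, 2000 points per shard); the `redis_key` naming scheme is a hypothetical convention, not a Redis requirement:

```python
# Score-range sharding sketch: 50 shards, each covering a 2000-point range,
# so shard 0 holds the lowest scores and shard 49 everything at the top.
NUM_SHARDS = 50
SHARD_WIDTH = 2000

def shard_for_score(score: int) -> int:
    """Map a score to its shard index; scores above the top range clamp to the last shard."""
    return min(score // SHARD_WIDTH, NUM_SHARDS - 1)

def redis_key(shard: int, window: str = "weekly") -> str:
    """Hypothetical key-naming scheme: one sorted set per shard per time window."""
    return f"lb:{window}:shard:{shard}"
```

Routing writes this way keeps each sorted set small and makes "count everyone above me" a question you can answer per shard.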
Tournament finals and season launches can spike write traffic by 10x or more. The ingestion pipeline must absorb bursts without dropping updates or creating backpressure that stalls game servers.
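One common way to absorb bursts on the consumer side is micro-batching: coalesce updates per player between flushes so a 10x spike collapses into at most one write per player per interval. A minimal sketch of that idea (not a real Kafka consumer; the higher-score-wins rule matches the tie-breaking convention above):

```python
class UpdateBatcher:
    """Illustrative micro-batcher for the ingestion consumer: buffer score
    updates and keep only each player's best, then flush one batch that the
    caller can pipeline into Redis in a single round trip."""

    def __init__(self):
        self.pending = {}  # player_id -> best score seen in this batch

    def add(self, player_id: str, score: int) -> None:
        prev = self.pending.get(player_id)
        if prev is None or score > prev:  # higher score wins
            self.pending[player_id] = score

    def flush(self) -> dict:
        batch, self.pending = self.pending, {}
        return batch
```

Because Kafka already provides durability, dropping intermediate (lower) scores inside a batch loses no information the leaderboard needs.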
Players expect to see their local neighborhood in the global ranking instantly. Friend leaderboards require fetching scores for potentially hundreds of connections and returning a sorted result with low latency.
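The friend-scoped view reduces to a batch fetch plus a server-side sort. A sketch, assuming the scores have already been batch-fetched from Redis (for example via pipelined ZSCORE calls) into a plain dict:

```python
def friend_leaderboard(scores: dict, friend_ids: list) -> list:
    """Sort a player's friends by score, highest first. Friends with no
    entry in the current window are simply omitted."""
    return sorted(
        ((fid, scores[fid]) for fid in friend_ids if fid in scores),
        key=lambda pair: -pair[1],
    )
```

Even with a few hundred friends this sort is microseconds of work; the latency budget is dominated by the batched Redis fetch, which is why pipelining it into one round trip matters.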
Weekly and seasonal leaderboards must reset cleanly while preserving the old rankings for display and analytics. Resetting millions of entries simultaneously can cause service disruption if not handled carefully.
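A common trick is to make resets a key-rotation problem rather than a mass-deletion problem: embed the window identity in the key, so a new window simply writes to a fresh key while the old one is snapshotted to cold storage and expired off the hot path. A sketch of such a key scheme (the naming convention is illustrative):

```python
from datetime import date

def window_key(day: date, window: str) -> str:
    """Hypothetical time-scoped key scheme: each daily or weekly window
    gets its own key, so a reset never deletes millions of live entries."""
    if window == "daily":
        return f"lb:daily:{day.isoformat()}"
    if window == "weekly":
        year, week, _ = day.isocalendar()
        return f"lb:weekly:{year}-W{week:02d}"
    raise ValueError(f"unknown window: {window}")
```

Reads during the changeover can fall back to the previous window's key until the new one has data, which avoids a blank leaderboard at reset time.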
Confirm the number of active players and expected score update throughput at peak. Ask whether the leaderboard is single-game or spans multiple game modes, each with independent rankings. Clarify tie-breaking rules (higher score wins, then earliest timestamp). Ask about friend list size limits and whether friend leaderboards need real-time freshness or can tolerate short staleness. Confirm which time windows are required (daily, weekly, seasonal) and whether historical snapshots must be queryable or just archived.
Game servers submit score updates to a Score Ingestion Service that publishes events to Kafka for durability and ordering. A Leaderboard Update Service consumes from Kafka and writes to a sharded Redis cluster, where each shard uses a sorted set covering a defined score range. An API layer serves read queries: top-N fetches from the highest shards, individual rank lookups fan out across shards to compute global position, and K-neighbor queries fetch a window around the player's score. A Cassandra cluster stores player profiles, historical snapshots, and audit data. Friend leaderboards are computed on-demand by batch-fetching friend scores from Redis, sorting server-side, and caching briefly. A periodic snapshot job preserves leaderboard state for resets and historical queries.
Walk through how global rank is calculated from sharded sorted sets. Divide the score space into S shards (for example, 50 shards each covering a 2000-point range). When a score update arrives, determine the target shard from the score value, issue ZADD to insert or update the player, and ZREM from the previous shard if the score crossed a boundary. To compute global rank: find the player's local rank within their shard via ZREVRANK, then query each higher-scoring shard for its ZCARD (total member count), and sum those counts plus the local rank. With 50 shards, this fan-out completes in single-digit milliseconds using pipelined Redis commands. For top-N, start from the highest shard and pull entries with ZREVRANGE until N results are collected, spilling to the next lower shard as needed.
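The walkthrough above can be modeled end to end in pure Python, with one dict standing in for each shard's sorted set. This is a sketch of the algorithm, not Redis client code: in production the marked operations would be ZADD/ZREM, ZCARD, ZREVRANK, and ZREVRANGE issued over a pipeline, and ties would be broken by timestamp rather than ignored:

```python
NUM_SHARDS, SHARD_WIDTH = 50, 2000  # parameters from the walkthrough

def shard_index(score: int) -> int:
    return min(score // SHARD_WIDTH, NUM_SHARDS - 1)

class ShardedLeaderboard:
    """In-memory stand-in for the sharded layout: each dict plays the role
    of one sorted set covering a 2000-point score range."""

    def __init__(self):
        self.shards = [dict() for _ in range(NUM_SHARDS)]  # player -> score

    def update(self, player: str, score: int) -> None:
        """ZADD into the target shard; ZREM from the old shard on boundary cross."""
        target = shard_index(score)
        for i, shard in enumerate(self.shards):
            if i != target:
                shard.pop(player, None)
        self.shards[target][player] = score

    def global_rank(self, player: str) -> int:
        """0-based rank: sum of ZCARD over every higher shard, plus the
        local ZREVRANK (count of higher scores within the player's shard)."""
        for i in range(NUM_SHARDS - 1, -1, -1):
            if player in self.shards[i]:
                higher = sum(len(self.shards[j]) for j in range(i + 1, NUM_SHARDS))
                mine = self.shards[i][player]
                local = sum(1 for v in self.shards[i].values() if v > mine)
                return higher + local
        raise KeyError(player)

    def top_n(self, n: int) -> list:
        """ZREVRANGE from the highest shard, spilling downward until n results."""
        out = []
        for i in range(NUM_SHARDS - 1, -1, -1):
            out.extend(sorted(self.shards[i].items(), key=lambda kv: -kv[1]))
            if len(out) >= n:
                break
        return out[:n]
```

Note that `global_rank` touches at most S shards and `top_n` usually touches only the top one or two, which is why the fan-out stays in single-digit milliseconds when pipelined.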
Cover fault tolerance by running Redis replicas with automatic failover; during failover, serve stale reads from replicas while the new primary warms up. Discuss monitoring: track leaderboard update lag (time from Kafka event to Redis write), query latency percentiles across shards, and shard balance metrics to trigger dynamic rebalancing. Address cost by tiering storage: keep active leaderboard windows in Redis, snapshot completed windows to Cassandra, and archive older data to object storage. For abuse prevention, validate score submissions server-side and flag statistically anomalous jumps for review before updating the leaderboard.
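The anomaly flag mentioned above could be as simple as comparing a new submission against the player's typical game-to-game improvement. This heuristic and its threshold are illustrative assumptions, not a prescribed detector:

```python
def is_anomalous(history: list, new_score: int, factor: float = 3.0) -> bool:
    """Flag a submission when it beats the player's best by more than
    `factor` times their typical per-game score movement (sketch only;
    real systems would use richer per-mode statistical models)."""
    if len(history) < 2:
        return False  # not enough history to judge
    deltas = [b - a for a, b in zip(history, history[1:])]
    typical = max(1.0, sum(abs(d) for d in deltas) / len(deltas))
    return (new_score - max(history)) > factor * typical
```

Flagged scores can be held out of the live leaderboard and queued for review, so one cheater never displaces legitimate players at the top.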
Deepen your understanding of the patterns used in this problem: