System Design - Design Chess.com (Online Chess Game)
[ INFO ]category: System Design difficulty: unknown freq: first seen: 2026-03-13
This design prompt focuses on concurrency, low-latency state synchronization, and distributed systems.
Interview Question: Design a Real-Time Multiplayer Chess Platform
1. The Scenario
Design the backend architecture for a global, real-time chess platform (like Chess.com) that supports millions of active players.
2. Core Requirements
Matchmaking: Pair players of similar skill levels (Elo rating) in under 5 seconds.
Game State Management: Handle move validation, clock synchronization, and game persistence.
Real-Time Updates: Moves must be broadcast to both players and spectators with <100ms latency.
Disconnection Handling: Gracefully handle brief internet drops and allow players to reconnect to their active session.
3. Scale and Constraints
10 Million Daily Active Users (DAU).
100,000 Concurrent Games at peak times.
Anti-Cheat: Integration points for an engine-based analysis service.
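A quick back-of-envelope estimate helps turn these figures into connection and message counts. The sketch below is only illustrative: the average time per move and the average number of spectators per game are hypothetical inputs that the prompt does not specify, and a candidate should state their own.

```python
def estimate_load(concurrent_games: int,
                  avg_seconds_per_move: float,
                  avg_spectators_per_game: float) -> dict:
    """Rough peak-load estimate. The last two parameters are assumptions."""
    # Each game has exactly two players holding a live connection.
    player_connections = concurrent_games * 2
    spectator_connections = concurrent_games * avg_spectators_per_game
    # Only one side moves at a time, so a game emits 1 / avg_seconds_per_move moves per second.
    moves_per_second = concurrent_games / avg_seconds_per_move
    # Every move is pushed to both players (ack + opponent) and to each spectator.
    outbound_messages_per_second = moves_per_second * (2 + avg_spectators_per_game)
    return {
        "player_connections": player_connections,
        "spectator_connections": spectator_connections,
        "moves_per_second": moves_per_second,
        "outbound_messages_per_second": outbound_messages_per_second,
    }


# Example with hypothetical inputs: one move every 5 s, one spectator per game on average.
load = estimate_load(100_000, avg_seconds_per_move=5.0, avg_spectators_per_game=1.0)
```

With these assumed inputs the 100,000 concurrent games translate into 200,000 player connections and about 20,000 inbound moves per second; different assumptions shift the numbers proportionally.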
Key Discussion Areas for the Candidate
A. Communication Protocol
Why WebSockets? Discuss why HTTP polling is insufficient for a blitz game (3-minute clocks) and how to handle sticky sessions behind a load balancer.
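One possible way to keep a game "sticky" is to route by game ID rather than by client IP, so that both players and any spectators land on the same gateway node. The sketch below uses rendezvous (highest-random-weight) hashing as an assumed technique; the gateway names and the pick_gateway helper are hypothetical, and a real deployment might instead rely on load balancer features or a session registry.

```python
import hashlib


def pick_gateway(game_id: str, gateways: list[str]) -> str:
    """Pick the WebSocket gateway for a game using rendezvous hashing.

    Every node is scored against the game ID and the highest score wins, so
    all clients of one game agree on the node without shared state.
    """
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{game_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(gateways, key=score)


# Hypothetical gateway pool.
nodes = ["gw-a", "gw-b", "gw-c"]
chosen = pick_gateway("game-42", nodes)
```

A useful property of this scheme is that removing a node only remaps the games that were assigned to it; games on the surviving nodes keep their assignment.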
B. Data Modeling & Storage
Hot Storage: Where does the "live" game live? (e.g., Redis for active move sequences and clocks).
Cold Storage: How to persist completed games (move history in PGN format) in a relational database so that players can browse their match history.
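To make the hot/cold split concrete, here is a minimal sketch of what a live game record might hold and how its move list could be flattened into PGN movetext when the game is archived. The LiveGame fields and both helper functions are illustrative assumptions, not a prescribed schema; moves are assumed to be stored in standard algebraic notation (SAN).

```python
from dataclasses import dataclass, field


@dataclass
class LiveGame:
    """Hypothetical shape of the hot record (e.g. one Redis hash per game)."""
    game_id: str
    white_id: str
    black_id: str
    white_ms: int                                    # remaining clock, milliseconds
    black_ms: int
    moves: list[str] = field(default_factory=list)   # SAN moves in play order


def to_pgn_movetext(moves: list[str], result: str = "*") -> str:
    """Render SAN moves as PGN movetext, e.g. '1. e4 e5 2. Nf3 *'."""
    parts = []
    for i in range(0, len(moves), 2):
        pair = " ".join(moves[i:i + 2])              # white move plus black reply, if any
        parts.append(f"{i // 2 + 1}. {pair}")
    parts.append(result)                             # '1-0', '0-1', '1/2-1/2' or '*'
    return " ".join(parts)


def archive_row(game: LiveGame, result: str) -> dict:
    """Build the row that a cold-storage 'games' table might receive."""
    return {
        "game_id": game.game_id,
        "white_id": game.white_id,
        "black_id": game.black_id,
        "result": result,
        "pgn": to_pgn_movetext(game.moves, result),
    }
```

Storing the finished game as a single PGN text column keeps writes cheap; whether to also normalize individual moves into rows is a trade-off worth discussing.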
C. The Matchmaking Engine
How to use a "bucket" system to group players by Elo ranges.
How to expand those ranges over time if a match isn't found immediately.
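The bucket idea and the widening search window can be sketched as follows. All constants (bucket width, starting gap, growth rate, cap) are hypothetical tuning values chosen for illustration, and the in-memory dictionary stands in for whatever shared queue a real matchmaker would use.

```python
import math
from dataclasses import dataclass

BUCKET_WIDTH = 100        # Elo points per bucket (assumed)
BASE_GAP = 50             # rating gap accepted immediately (assumed)
GAP_GROWTH_PER_SEC = 50   # how fast the window widens while waiting (assumed)
MAX_GAP = 400             # never match beyond this gap (assumed)


@dataclass
class Ticket:
    player_id: str
    rating: int
    enqueued_at: float    # seconds


def allowed_gap(ticket: Ticket, now: float) -> float:
    """The acceptable rating gap grows linearly with waiting time, up to a cap."""
    waited = max(0.0, now - ticket.enqueued_at)
    return min(MAX_GAP, BASE_GAP + GAP_GROWTH_PER_SEC * waited)


class Matchmaker:
    def __init__(self) -> None:
        self.buckets: dict[int, list[Ticket]] = {}

    def enqueue(self, ticket: Ticket) -> None:
        self.buckets.setdefault(ticket.rating // BUCKET_WIDTH, []).append(ticket)

    def find_opponent(self, ticket: Ticket, now: float):
        """Scan only the buckets inside the current window; return the closest fit."""
        gap = allowed_gap(ticket, now)
        lo = math.floor((ticket.rating - gap) / BUCKET_WIDTH)
        hi = math.floor((ticket.rating + gap) / BUCKET_WIDTH)
        best = None
        for b in range(lo, hi + 1):
            for other in self.buckets.get(b, []):
                if other.player_id == ticket.player_id:
                    continue
                diff = abs(other.rating - ticket.rating)
                # Both players' windows must cover the gap.
                if diff <= gap and diff <= allowed_gap(other, now):
                    if best is None or diff < abs(best.rating - ticket.rating):
                        best = other
        if best is not None:
            self._remove(best)
            self._remove(ticket)
        return best

    def _remove(self, ticket: Ticket) -> None:
        bucket = self.buckets.get(ticket.rating // BUCKET_WIDTH, [])
        if ticket in bucket:
            bucket.remove(ticket)
```

With these assumed constants, two players 120 points apart are not paired at enqueue time but become eligible after about two seconds of waiting, which stays within the 5-second target.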
D. Concurrency & Integrity
How to prevent a player from "double-moving" or making a move after their clock has hit zero.
How to handle "Late Arrival" moves (Network Latency Compensation).
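One common way to reason about double moves is to have the client send the ply (half-move number) it believes it is answering, and to accept a move only if that number matches the server's state. The sketch below assumes this approach; the status strings and the submit_move helper are hypothetical, the per-game lock stands in for whatever atomic primitive the hot store offers, and chess legality checks are deliberately left out.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GameState:
    moves: list[str] = field(default_factory=list)
    flagged: Optional[str] = None          # side whose clock hit zero, if any
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


def submit_move(state: GameState, side: str, expected_ply: int, move: str) -> str:
    """Accept a move only if it targets the current ply and comes from the side to move.

    The check and the append happen under one lock, so two racing packets
    from the same player cannot both be applied.
    """
    with state.lock:
        ply = len(state.moves)
        if state.flagged is not None:
            return "game_over"             # clock already expired server-side
        if expected_ply != ply:
            return "stale_or_duplicate"    # retry or double-click of an old move
        side_to_move = "white" if ply % 2 == 0 else "black"
        if side != side_to_move:
            return "not_your_turn"
        state.moves.append(move)
        return "accepted"


state = GameState()
first = submit_move(state, "white", 0, "e4")    # accepted
retry = submit_move(state, "white", 0, "e4")    # rejected: ply 0 is already played
```

Because the duplicate carries the same expected ply, retries after a network hiccup are harmless: the server can answer them with the current state instead of applying the move twice.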
E. Distributed Clocks
Discuss the "Server-side Authoritative Clock" vs. the "Client-side Display" clock. How do you compensate for a 200ms lag between a user clicking and the server receiving the packet without giving clients a way to cheat by claiming extra lag?
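A minimal sketch of one possible compensation policy: the server charges the mover for the time elapsed on its own clock, minus a lag credit based on a round-trip time that the server itself measured (for example via ping/pong), capped at a fixed maximum. The cap value, parameter names, and settle_clock helper are assumptions for illustration; the key point is that nothing client-reported is trusted.

```python
MAX_LAG_CREDIT_MS = 200   # hypothetical cap on how much lag is forgiven per move


def settle_clock(remaining_ms: int,
                 turn_started_at_ms: int,
                 received_at_ms: int,
                 server_measured_rtt_ms: int,
                 increment_ms: int = 0) -> tuple[int, bool]:
    """Return (new_remaining_ms, move_in_time) using only server-side timestamps.

    turn_started_at_ms is when the server sent the opponent's move, so the
    server-side elapsed time includes one full round trip on top of the
    player's thinking time. That round trip is credited back, up to the cap.
    """
    elapsed = received_at_ms - turn_started_at_ms
    lag_credit = min(max(server_measured_rtt_ms, 0), MAX_LAG_CREDIT_MS)
    charged = max(elapsed - lag_credit, 0)
    new_remaining = remaining_ms - charged
    if new_remaining <= 0:
        return 0, False                    # flag fell: the move arrived too late
    return new_remaining + increment_ms, True


# 3.0 s elapsed on the server, 150 ms measured RTT: the player is charged 2.85 s.
remaining, in_time = settle_clock(10_000, 1_000, 4_000, server_measured_rtt_ms=150)
```

The client still runs its own countdown for display, but it is resynchronized from the server's remaining time on every move, so any drift or tampering affects only what the user sees.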
Bonus "OpenAI Style" Twist
"How would you integrate a Large Language Model to act as a real-time 'Grandmaster Commentator' for top-tier matches, ensuring it has access to the board state and engine evaluations without lagging the game server?"