Design an online multiplayer tic-tac-toe platform where players can create or join game rooms, compete against each other in real time, and track their match history and statistics. Players should be able to share a room link or code with friends for private matches, or opt into random matchmaking to find an opponent instantly. Every move must propagate to both players within milliseconds, and the server must be the authoritative source of truth for game rules, turn enforcement, and win/draw detection.
Although tic-tac-toe is a simple game, the design compresses many production-grade challenges into a small domain: low-latency bidirectional communication, authoritative state management, contention when two actions arrive nearly simultaneously, session and presence tracking, and durable result storage. Interviewers use this problem to test whether you can reason about real-time systems, keep the design proportional to the problem, and prioritize correctness and responsiveness under load.
Based on real interview experiences, these are the areas interviewers probe most deeply:
The core user experience depends on instantaneous move delivery between two players. Interviewers want to see how you push updates with low latency and keep clients in sync with the authoritative server state.
Hints to consider:
Two players can submit moves within milliseconds of each other, creating a race condition. The design must serialize state changes per room to ensure exactly one valid move per turn.
Hints to consider:
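One way to get per-room serialization is sketched below. It assumes every move for a given room is handled by the same process, so an in-process `asyncio.Lock` per room is enough; the `RoomSerializer` class and its `submit` method are hypothetical names for illustration. When a room's moves can land on different processes, the same check would have to be a compare-and-set in the shared store instead.

```python
import asyncio
from collections import defaultdict


class RoomSerializer:
    """Serializes moves per room with one asyncio.Lock per room_id (hypothetical helper)."""

    def __init__(self):
        self._locks = defaultdict(asyncio.Lock)  # room_id -> lock
        self._turn = defaultdict(int)            # room_id -> turns completed so far

    async def submit(self, room_id, expected_turn):
        """Accept the move only if the client acted on the current turn."""
        async with self._locks[room_id]:
            if self._turn[room_id] != expected_turn:
                return False          # stale: another move already took this turn
            await asyncio.sleep(0)    # stands in for validation and storage I/O
            self._turn[room_id] += 1
            return True


async def race():
    serializer = RoomSerializer()
    # Two moves arrive for the same turn of the same room at nearly the same time.
    return await asyncio.gather(
        serializer.submit("room-1", 0),
        serializer.submit("room-1", 0),
    )


results = asyncio.run(race())
```

Exactly one of the two racing submissions is accepted; the other sees an advanced turn counter once it acquires the lock and is rejected.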
Room management involves creation, joining, readiness checks, gameplay, completion, and optional rematch. Interviewers probe how you model this lifecycle cleanly and handle edge cases.
Hints to consider:
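A possible way to model the lifecycle is an explicit transition table, so that illegal jumps (for example, completing a room that never started) fail loudly. Apart from `COMPLETED`, the state names and the allowed transitions below are assumptions made for this sketch, not a prescribed set.

```python
from enum import Enum


class RoomState(Enum):
    WAITING = "waiting"          # created, waiting for a second player
    READY = "ready"              # both players present, readiness check pending
    IN_PROGRESS = "in_progress"  # moves are being accepted
    COMPLETED = "completed"      # win, draw, or forfeit recorded
    ABANDONED = "abandoned"      # room closed before a game started


# Allowed transitions; anything not listed is rejected.
TRANSITIONS = {
    RoomState.WAITING: {RoomState.READY, RoomState.ABANDONED},
    RoomState.READY: {RoomState.IN_PROGRESS, RoomState.WAITING, RoomState.ABANDONED},
    RoomState.IN_PROGRESS: {RoomState.COMPLETED},
    RoomState.COMPLETED: {RoomState.READY},  # optional rematch
    RoomState.ABANDONED: set(),
}


class InvalidTransition(Exception):
    pass


def transition(current, target):
    """Return the new state, or raise if the lifecycle does not allow the change."""
    if target not in TRANSITIONS[current]:
        raise InvalidTransition(f"{current.name} -> {target.name}")
    return target
```

Keeping the table in one place makes edge cases easy to discuss: a player leaving during `READY` drops the room back to `WAITING`, while a rematch reuses the room by moving from `COMPLETED` to `READY`.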
Stateful WebSocket connections complicate horizontal scaling. Interviewers want to see how you route and fan out messages correctly when two players in the same room connect to different server instances.
Hints to consider:
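The cross-instance fan-out can be illustrated with a small in-memory model. `InMemoryBroker` is a stand-in for a real pub/sub service, and the `Gateway` class, the `room:<id>` channel naming, and the list-based outboxes are all assumptions for this sketch; a real gateway would write to WebSocket connections instead.

```python
from collections import defaultdict


class InMemoryBroker:
    """Stand-in for a pub/sub service (an assumption for this sketch)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> callbacks

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in list(self._subscribers[channel]):
            callback(message)


class Gateway:
    """One WebSocket server instance; outboxes stand in for open sockets."""

    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.local = defaultdict(dict)  # room_id -> {player_id: outbox}

    def connect(self, room_id, player_id):
        first_local_player = room_id not in self.local
        outbox = []
        self.local[room_id][player_id] = outbox
        if first_local_player:
            # Subscribe once per room, no matter how many local players join.
            self.broker.subscribe(
                f"room:{room_id}",
                lambda message, r=room_id: self._deliver(r, message),
            )
        return outbox

    def _deliver(self, room_id, message):
        for outbox in self.local[room_id].values():
            outbox.append(message)


broker = InMemoryBroker()
gateway_a, gateway_b = Gateway("a", broker), Gateway("b", broker)
alice_outbox = gateway_a.connect("r1", "alice")
bob_outbox = gateway_b.connect("r1", "bob")
broker.publish("room:r1", {"turn": 1, "board": "X........"})
```

Both outboxes receive the same event even though the players are attached to different gateway instances, because each instance subscribes to the room's channel on behalf of its local connections.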
Confirm the expected number of concurrent rooms, whether both mobile and web clients are supported, and the acceptable latency for move delivery. Ask whether spectator mode is needed or only two-player games. Clarify how long game history should be retained and whether leaderboards or ranking systems are in scope. Verify whether the platform should handle abuse prevention, such as rate limiting rapid room creation.
Sketch the core components: a WebSocket gateway layer that terminates player connections and routes messages by room; a game logic service that validates moves and manages room state machines; a matchmaking service that pairs waiting players; a fast in-memory store (Redis) holding active room state for quick reads and writes; a relational database (PostgreSQL) for durable player accounts and match history; and a pub/sub layer for cross-server message delivery. Then show the data flow: a player sends a move over WebSocket, the gateway forwards it to the game service, the game service validates it and updates the room state in Redis, the service publishes the new board to the room's pub/sub channel, and each player's gateway delivers it.
Walk through a single move in detail. The player's client sends a JSON message containing room_id, player_id, and cell coordinates over its WebSocket connection. The gateway forwards this to the game service, which loads the current board state from Redis. The service checks that it is this player's turn, that the target cell is empty, and that the game is still in progress. If the move is valid, it updates the board, increments the turn counter, checks for a win or draw condition, writes the new state back to Redis, and publishes an event containing the full board to the room's pub/sub channel. The load, validate, and write steps must execute as one atomic unit per room, for example inside a Redis Lua script or a WATCH/MULTI/EXEC transaction that aborts if the room state changed since it was read; otherwise two near-simultaneous moves can both pass validation against the same board. Both players receive the authoritative board state. If the move created a terminal condition, the service transitions the room to COMPLETED, writes the result to PostgreSQL, and updates each player's statistics.
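The validation steps in this walkthrough can be sketched as a pure function over an in-memory room. This is a minimal illustration, not a full service: the `Room` dataclass, `apply_move`, and `MoveRejected` are hypothetical names, the first entry in `players` is assumed to play X, and storage and pub/sub are left out.

```python
from dataclasses import dataclass, field
from typing import Optional

# Index triples for the three rows, three columns, and two diagonals.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


class MoveRejected(Exception):
    pass


@dataclass
class Room:
    players: tuple                      # (player_id playing X, player_id playing O)
    board: list = field(default_factory=lambda: [None] * 9)
    turn: int = 0                       # number of moves accepted so far
    status: str = "IN_PROGRESS"
    winner: Optional[str] = None        # stays None for a draw


def apply_move(room, player_id, row, col):
    """Validate a move against the authoritative state, then apply it."""
    if room.status != "IN_PROGRESS":
        raise MoveRejected("game is not in progress")
    if player_id != room.players[room.turn % 2]:
        raise MoveRejected("not this player's turn")
    if not (0 <= row < 3 and 0 <= col < 3):
        raise MoveRejected("cell out of range")
    index = row * 3 + col
    if room.board[index] is not None:
        raise MoveRejected("cell is occupied")

    mark = "XO"[room.turn % 2]
    room.board[index] = mark
    room.turn += 1

    if any(all(room.board[i] == mark for i in line) for line in WIN_LINES):
        room.status, room.winner = "COMPLETED", player_id
    elif room.turn == 9:
        room.status = "COMPLETED"       # board full with no winner: draw
    return room
```

Because the function only reads and mutates the room it is given, it can run unchanged inside whatever mechanism provides per-room atomicity.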
Cover disconnection handling: start a grace-period timer when a player disconnects; if they reconnect within the window, restore their session from Redis state; otherwise, forfeit the game. Discuss matchmaking fairness: use a FIFO queue with optional skill-based grouping if ranking is in scope. Address monitoring: track WebSocket connection counts, move validation latency, matchmaking wait time, and room lifecycle durations. Mention security: validate every move server-side to prevent cheating, authenticate WebSocket connections with short-lived tokens, and rate-limit room creation per account. Briefly touch on scaling: partition rooms across WebSocket server instances, auto-scale based on connection count, and use Redis Cluster for the state store as traffic grows.
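The grace-period rule for disconnections could look roughly like the sketch below. The `PresenceTracker` class is hypothetical, the 30-second window is an arbitrary assumption, and the clock is injected so the logic can be exercised without real waiting; a production version would likely keep these timestamps in the shared store rather than in process memory.

```python
import time


class PresenceTracker:
    """Tracks disconnected players and decides between resume and forfeit."""

    def __init__(self, grace_seconds=30.0, clock=time.monotonic):
        self.grace_seconds = grace_seconds  # assumed window, tune per product needs
        self.clock = clock
        self._disconnected_at = {}          # player_id -> time of disconnect

    def on_disconnect(self, player_id):
        # Keep the earliest timestamp if duplicate disconnect events arrive.
        self._disconnected_at.setdefault(player_id, self.clock())

    def on_reconnect(self, player_id):
        """Return True if the session may resume, False if the window has passed."""
        started = self._disconnected_at.pop(player_id, None)
        if started is None:
            return True  # player was never marked as disconnected
        return self.clock() - started <= self.grace_seconds

    def expired(self):
        """Players whose grace period has run out and whose games should be forfeited."""
        now = self.clock()
        return [
            player_id
            for player_id, started in self._disconnected_at.items()
            if now - started > self.grace_seconds
        ]
```

A periodic sweep over `expired()` would trigger forfeits, while `on_reconnect` gates whether a returning client gets its room state restored.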