Design CoderPad
Problem Statement
Design a web-based collaborative coding platform that enables multiple users to simultaneously edit and execute code in a shared environment, similar to tools used for conducting technical interviews remotely. The system should support real-time synchronization of code changes across all participants, provide syntax highlighting for popular programming languages, execute code securely in isolated environments, and stream execution results back to all viewers instantly.
The platform must handle sessions in which typically 2-5 users collaborate on a single document, with peaks of 10,000 or more concurrent sessions during busy interview hours. Code execution should complete within 10 seconds for typical workloads, and editor updates must reach all participants within 100-200ms so that collaboration feels instantaneous. The system needs to preserve session history for playback and review, and it must prevent malicious code from escaping its sandbox or consuming excessive resources.
Key Requirements
Functional
- Session management -- Users can create coding sessions with unique URLs, invite participants, and control access permissions
- Collaborative editing -- Multiple users can simultaneously edit the same document with live cursor positions, user presence indicators, and conflict-free merging
- Multi-language execution -- Support running code in 10+ languages (Python, JavaScript, Java, C++, etc.) with standard input/output and error streams
- Real-time output streaming -- Execution results appear incrementally as the program runs, not just after completion
- Session persistence -- Save snapshots of code at various points and enable replay of the editing and execution history
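One way to reconcile periodic snapshots with full replay is to store both: occasional snapshots plus an append-only log of edits, and rebuild any past state from the nearest earlier snapshot. The sketch below is illustrative only; the `Op` and `Snapshot` types, their field names, and the `document_at` helper are assumptions, not part of any existing system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    """One logged edit (hypothetical shape); `rev` is the revision it produces."""
    rev: int
    kind: str        # "insert" or "delete"
    pos: int
    text: str = ""   # used by inserts
    length: int = 0  # used by deletes


@dataclass(frozen=True)
class Snapshot:
    rev: int
    content: str


def apply_op(doc: str, op: Op) -> str:
    if op.kind == "insert":
        return doc[:op.pos] + op.text + doc[op.pos:]
    if op.kind == "delete":
        return doc[:op.pos] + doc[op.pos + op.length:]
    raise ValueError(f"unknown op kind: {op.kind}")


def document_at(rev: int, snapshots: List[Snapshot], log: List[Op]) -> str:
    """Rebuild the document as of `rev` from the nearest earlier snapshot."""
    base = max(
        (s for s in snapshots if s.rev <= rev),
        key=lambda s: s.rev,
        default=Snapshot(0, ""),  # no usable snapshot: start from an empty doc
    )
    doc = base.content
    for op in sorted(log, key=lambda o: o.rev):
        if base.rev < op.rev <= rev:
            doc = apply_op(doc, op)
    return doc


log = [
    Op(1, "insert", 0, text="print"),
    Op(2, "insert", 5, text="(1)"),
    Op(3, "delete", 6, length=1),
    Op(4, "insert", 6, text="42"),
]
snapshots = [Snapshot(2, "print(1)")]
print(document_at(4, snapshots, log))
```

Snapshot frequency then becomes a tuning knob: more snapshots mean faster seeking during playback at the cost of more storage.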
Non-Functional
- Scalability -- Support 10,000+ concurrent coding sessions with 2-5 participants each during peak hours
- Reliability -- 99.9% uptime for editing; graceful degradation if code execution infrastructure fails
- Latency -- Editor synchronization under 200ms P99; code execution starts within 2 seconds of submission
- Consistency -- All participants converge on an identical document state; execution output is delivered to every participant in the same order
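A quick back-of-envelope estimate helps size the coordination layer from these numbers. The session and participant counts come from the requirements above; the edit rate per session is purely an assumption for illustration (roughly one person typing at a time) and should be confirmed with the interviewer.

```python
def estimate(sessions: int, participants: int, ops_per_sec_per_session: float) -> dict:
    """Rough peak load on the coordination layer."""
    inbound = sessions * ops_per_sec_per_session
    return {
        "websocket_connections": sessions * participants,
        "inbound_ops_per_sec": inbound,
        # Each edit is rebroadcast to every participant except its author.
        "outbound_msgs_per_sec": inbound * (participants - 1),
    }


# 10,000 sessions and 5 participants come from the requirements;
# 5 edits/sec per session is an assumed typing rate.
peak = estimate(sessions=10_000, participants=5, ops_per_sec_per_session=5)
for name, value in peak.items():
    print(f"{name}: {value:,}")
```

Under these assumptions the peak is about 50,000 open connections and 200,000 outbound messages per second, which is too much for one process but comfortable for a modest sharded fleet.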
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Real-Time Collaboration Architecture
How you handle concurrent edits from multiple users without conflicts is central to this design. Interviewers want to see that you understand operational transformation (OT) or conflict-free replicated data types (CRDTs), rather than relying on naive last-write-wins.
Hints to consider:
- Use WebSocket connections for bidirectional, low-latency communication between clients and a coordination server
- Implement a server-authoritative model where a session server sequences operations and broadcasts them to all participants
- Consider operational transformation to adjust cursor positions and edit indices as concurrent changes arrive
- Track vector clocks or logical timestamps (such as a server-assigned revision number) to order operations consistently, including those from clients that reconnect after a disconnect
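To make the index-adjustment idea concrete, here is a minimal sketch of the simplest OT case: two concurrent inserts. It is not a complete OT implementation; a real editor also needs insert/delete and delete/delete transforms. The `Insert` type and the `site` tie-break field are assumptions introduced for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical client id, used only to break position ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has been applied.

    If `against` landed earlier in the document (or at the same position
    with a lower site id), `op` must shift right by the inserted length.
    """
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return replace(op, pos=op.pos + len(against.text))
    return op


doc = "abc"
a = Insert(pos=1, text="X", site=1)  # typed by client 1
b = Insert(pos=1, text="Y", site=2)  # typed concurrently by client 2

# Client 1 applies its own edit first, then the transformed remote edit.
view_1 = apply(apply(doc, a), transform(b, a))
# Client 2 does the same in the opposite order.
view_2 = apply(apply(doc, b), transform(a, b))
print(view_1, view_2)
```

Both orders converge on the same string because the tie-break is deterministic; without it, two inserts at the same position would interleave differently on each client.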
2. Secure Code Execution Sandbox
Running untrusted code safely is non-negotiable. Interviewers expect you to discuss multi-layered isolation, resource limits, and how to prevent both malicious attacks and accidental resource exhaustion.
Hints to consider:
- Use containers (Docker), lightweight microVMs (Firecracker), or a sandboxed container runtime (gVisor) to isolate each execution in its own environment
- Apply cgroups to limit CPU, memory, and disk I/O per execution; enforce a hard wall-clock timeout (e.g., 30 seconds max, above the 10-second target for typical runs)
- Block outbound network access except to approved endpoints; use seccomp profiles to restrict dangerous system calls
- Queue execution requests through a worker pool to prevent thundering herd problems and enable rate limiting per session
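As a rough illustration of layering these limits, the sketch below builds a `docker run` command line with networking disabled, cgroup-backed limits, and a restricted privilege set. The helper name `docker_run_args`, the default limit values, the image tag, and the seccomp profile path are all assumptions chosen for the example; the flags themselves are standard Docker CLI options.

```python
from typing import List, Optional, Sequence


def docker_run_args(
    image: str,
    command: Sequence[str],
    *,
    memory_mb: int = 256,       # assumed default limits, tune per language
    cpus: float = 0.5,
    pids: int = 64,
    seccomp_profile: Optional[str] = None,
) -> List[str]:
    """Build an argv for one isolated, resource-limited execution."""
    args = [
        "docker", "run", "--rm",
        "--network", "none",             # no outbound network access
        "--memory", f"{memory_mb}m",     # cgroup memory limit
        "--cpus", str(cpus),             # cgroup CPU quota
        "--pids-limit", str(pids),       # contains fork bombs
        "--read-only",                   # immutable root filesystem
        "--cap-drop", "ALL",             # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
    ]
    if seccomp_profile:
        args += ["--security-opt", f"seccomp={seccomp_profile}"]
    return args + [image, *command]


args = docker_run_args(
    "python:3.12-slim",
    ["python", "-c", "print(1)"],
    seccomp_profile="/etc/sandbox/seccomp.json",  # hypothetical path
)
print(" ".join(args))
```

The wall-clock timeout is deliberately left out of this sketch: it has to be enforced by whatever supervises the container, since stopping only the CLI client does not necessarily stop the container itself.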
3. Handling Latency and Failures
Network partitions, slow clients, and crashed execution workers are inevitable. Interviewers look for strategies to maintain a smooth user experience despite these failures.
Hints to consider:
- Implement optimistic updates in the editor client so typing feels instant, with server reconciliation in the background
- Use heartbeats and exponential backoff reconnection logic when WebSocket connections drop
- Persist execution requests to a durable queue (Kafka or a database table) so they survive worker crashes and can be retried
- Stream partial output back to clients as it's generated, using chunked transfer or server-sent events, so users see progress even for long-running jobs
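The reconnection hint can be sketched as a delay schedule with exponential growth, a cap, and random jitter so that clients dropped by the same outage do not all reconnect at once. The function name and default values below are assumptions for illustration; the `rng` parameter exists only to make the schedule testable.

```python
import random
from typing import Callable, Iterator


def backoff_delays(
    base: float = 0.5,
    cap: float = 30.0,
    attempts: int = 8,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield "full jitter" delays: uniform in [0, min(cap, base * 2**n)]."""
    for n in range(attempts):
        yield rng() * min(cap, base * 2 ** n)


# With the jitter pinned to its maximum, the upper bounds are visible.
ceilings = list(backoff_delays(rng=lambda: 1.0))
print(ceilings)

# In a client, each delay would precede one reconnection attempt.
for delay in backoff_delays(attempts=3):
    print(f"would wait {delay:.2f}s before reconnecting")
```

After a successful reconnect the client would reset the schedule and ask the server for any operations it missed since its last acknowledged revision.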
4. Scaling the Coordination Layer
As concurrent sessions grow, a single coordination server becomes a bottleneck. Discuss how to shard session state and fan out updates efficiently.
Hints to consider:
- Shard sessions across multiple stateful coordination servers, routing clients via consistent hashing on session ID
- Use Redis pub/sub or a dedicated message bus for broadcasting edits within a session, avoiding N² direct connections
- Cache hot session state (document content, participant list) in memory with TTLs, backed by Postgres for durable snapshots
- Employ a load balancer with session affinity (sticky sessions) initially, but design for eventual migration to a stateless model with shared state in Redis
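Routing by consistent hashing on session ID might look like the sketch below. It is a minimal ring with virtual nodes, not production routing code; the `HashRing` class, the server names, and the choice of 100 virtual nodes per server are assumptions for the example.

```python
import bisect
import hashlib
from typing import Iterable


class HashRing:
    """Minimal consistent-hash ring mapping session IDs to servers."""

    def __init__(self, servers: Iterable[str], vnodes: int = 100) -> None:
        # Each server is placed at `vnodes` points to even out the load.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def server_for(self, session_id: str) -> str:
        # First ring point clockwise from the session's hash, wrapping around.
        i = bisect.bisect_right(self._keys, self._hash(session_id))
        return self._ring[i % len(self._ring)][1]


before = HashRing(["coord-a", "coord-b", "coord-c"])
after = HashRing(["coord-a", "coord-b"])  # coord-c removed

sessions = [f"session-{i}" for i in range(1000)]
moved = sum(before.server_for(s) != after.server_for(s) for s in sessions)
print(f"{moved} of {len(sessions)} sessions moved after removing one server")
```

Only sessions that lived on the removed server are reassigned, which is the property that makes this preferable to `hash(session_id) % server_count` when coordination servers hold in-memory document state.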
Suggested Approach
Step 1: Clarify Requirements
Start by confirming the scope and priorities with your interviewer:
- How many concurrent sessions and participants per session do we need to support?
- Which programming languages must we support for execution, and are there any special requirements (e.g., graphical output, package installation)?
- What latency targets matter most: editor sync, execution start time, or output streaming?
- Do we need features like chat, video, or code review annotations, or should we focus purely on editing and execution?
- Should session history support full replay (every keystroke) or just periodic snapshots?