Design CoderPad

Problem Statement

Design a web-based collaborative coding platform that enables multiple users to simultaneously edit and execute code in a shared environment, similar to tools used for conducting technical interviews remotely. The system should support real-time synchronization of code changes across all participants, provide syntax highlighting for popular programming languages, execute code securely in isolated environments, and stream execution results back to all viewers instantly.

The platform must handle sessions in which 2-5 users typically collaborate on a single document, with peaks of 10,000 or more concurrent sessions during busy interview hours. Code execution should complete within 10 seconds for typical workloads, and editor updates must reach all participants within 200ms so that collaboration feels instantaneous. The system must preserve session history for playback and review, and it must prevent malicious code from escaping its sandbox or consuming excessive resources.

Key Requirements

Functional

Session management -- Users can create coding sessions with unique URLs, invite participants, and control access permissions
Collaborative editing -- Multiple users can simultaneously edit the same document with live cursor positions, user presence indicators, and conflict-free merging
Multi-language execution -- Support running code in 10+ languages (Python, JavaScript, Java, C++, etc.) with standard input/output and error streams
Real-time output streaming -- Execution results appear incrementally as the program runs, not just after completion
Session persistence -- Save snapshots of code at various points and enable replay of the editing and execution history

Non-Functional

Scalability -- Support 10,000+ concurrent coding sessions with 2-5 participants each during peak hours
Reliability -- 99.9% uptime for editing; graceful degradation if code execution infrastructure fails
Latency -- Editor synchronization under 200ms P99; code execution starts within 2 seconds of submission
Consistency -- All participants converge on an identical document state; every participant sees execution output in the same order
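
The scalability target above can be turned into a rough load estimate. The sketch below is back-of-envelope arithmetic only: the session and participant counts come from the requirements, while the typing rate (5 operations per second) and the idea that one participant types at a time are assumptions made for illustration.

```python
def peak_load(sessions: int, participants: int,
              typists_per_session: int, ops_per_typist_per_s: int) -> dict:
    """Estimate connection count and message rates at peak."""
    connections = sessions * participants
    # Operations arriving at the coordination layer each second.
    inbound = sessions * typists_per_session * ops_per_typist_per_s
    # Each operation is broadcast to every other participant in the session.
    outbound = inbound * (participants - 1)
    return {
        "connections": connections,
        "inbound_ops_per_s": inbound,
        "outbound_msgs_per_s": outbound,
    }


# 10,000 sessions x 5 participants, one typist at 5 ops/s (assumed rate).
load = peak_load(sessions=10_000, participants=5,
                 typists_per_session=1, ops_per_typist_per_s=5)
print(load)
# connections: 50,000; inbound: 50,000 ops/s; outbound: 200,000 msgs/s
```

Under these assumptions the fan-out traffic is four times the inbound traffic, which is why the broadcast path usually deserves more attention than the ingest path.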

What Interviewers Focus On

Based on real interview experiences, these are the areas interviewers probe most deeply:

1. Real-Time Collaboration Architecture

Handling concurrent edits from multiple users without conflicts is central to this design. Interviewers want to see that you understand operational transformation (OT) or conflict-free replicated data types (CRDTs), rather than relying on naive last-write-wins.

Hints to consider:

Use WebSocket connections for bidirectional, low-latency communication between clients and a coordination server
Implement a server-authoritative model where a session server sequences operations and broadcasts them to all participants
Consider operational transformation to adjust cursor positions and edit indices as concurrent changes arrive
Track vector clocks or logical timestamps to order operations consistently across disconnected clients
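
The server-authoritative model and the index adjustment described in these hints can be sketched as follows. This is a minimal illustration, not a production OT implementation: the `Insert`, `Delete`, and `Session` names are hypothetical, deletes remove exactly one character, ties between concurrent inserts are resolved in favor of the operation the server committed first, and the client-side half of the protocol is omitted.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Union


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


@dataclass(frozen=True)
class Delete:
    pos: int  # removes exactly one character at this index


Op = Union[Insert, Delete]


def apply(doc: str, op: Op) -> str:
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def transform(op: Op, committed: Op) -> Optional[Op]:
    """Rewrite op so it applies after committed; None means it became a no-op."""
    if isinstance(committed, Insert):
        if committed.pos <= op.pos:
            return replace(op, pos=op.pos + len(committed.text))
        return op
    # committed is a Delete
    if committed.pos < op.pos:
        return replace(op, pos=op.pos - 1)
    if committed.pos == op.pos and isinstance(op, Delete):
        return None  # both users deleted the same character
    return op


class Session:
    """Sequences operations; log[i] moves the document from revision i to i+1."""

    def __init__(self, doc: str = "") -> None:
        self.doc = doc
        self.log: List[Op] = []

    def submit(self, op: Op, base_rev: int) -> Optional[Op]:
        current: Optional[Op] = op
        # Transform against everything committed since the client's revision.
        for committed in self.log[base_rev:]:
            if current is None:
                break
            current = transform(current, committed)
        if current is not None:
            self.doc = apply(self.doc, current)
            self.log.append(current)
        return current  # the server would broadcast this to all participants


session = Session("abc")
session.submit(Insert(1, "X"), base_rev=0)      # Alice: "abc" -> "aXbc"
sent = session.submit(Delete(2), base_rev=0)    # Bob meant to delete "c"
print(sent, session.doc)                        # Delete(pos=3) aXb
```

Bob's delete was written against revision 0, so the server shifts its index past Alice's insert before applying it. The per-document revision number plays the role of the logical timestamp mentioned in the hints.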

2. Secure Code Execution Sandbox

Running untrusted code safely is non-negotiable. Interviewers expect you to discuss multi-layered isolation, resource limits, and how to prevent both malicious attacks and accidental resource exhaustion.

Hints to consider:

Use containers (Docker), a user-space kernel sandbox (gVisor), or lightweight microVMs (Firecracker) to isolate each execution in its own environment
Apply cgroups to limit CPU, memory, and disk I/O per execution; enforce wall-clock timeouts (e.g., 30 seconds max)
Block outbound network access except to approved endpoints; use seccomp profiles to restrict dangerous system calls
Queue execution requests through a worker pool to prevent thundering herd problems and enable rate limiting per session
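
One way to picture the layered limits above is as the argument list a worker passes to `docker run`. The sketch below assumes plain Docker as the sandbox; the function names, default limits, and image name are hypothetical, and the `/tmp` tmpfs is an illustrative choice for scratch space. Docker applies its default seccomp profile unless a custom one is supplied.

```python
import subprocess
from typing import List, Optional


def sandbox_argv(image: str, command: List[str], name: str, *,
                 memory_mb: int = 256, cpus: float = 0.5, pids: int = 64,
                 seccomp_profile: Optional[str] = None) -> List[str]:
    """Build a `docker run` command line with conservative limits."""
    argv = [
        "docker", "run", "--rm", "--name", name,
        "--network", "none",              # no outbound network access
        "--memory", f"{memory_mb}m",      # cgroup memory limit
        "--cpus", str(cpus),              # cgroup CPU quota
        "--pids-limit", str(pids),        # guards against fork bombs
        "--read-only",                    # immutable root filesystem
        "--tmpfs", "/tmp:rw,size=64m",    # small writable scratch space
        "--cap-drop", "ALL",
        "--security-opt", "no-new-privileges",
    ]
    if seccomp_profile is not None:
        argv += ["--security-opt", f"seccomp={seccomp_profile}"]
    return argv + [image, *command]


def run_with_deadline(argv: List[str], name: str,
                      timeout_s: float = 30.0) -> Optional[str]:
    """Return stdout, or None if the wall-clock deadline was exceeded."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s)
        return done.stdout
    except subprocess.TimeoutExpired:
        # Stopping the docker CLI does not stop the container; kill it by name.
        subprocess.run(["docker", "kill", name], capture_output=True)
        return None


argv = sandbox_argv("python:3.12-slim", ["python", "-c", "print(1)"],
                    name="exec-123")
print(" ".join(argv))
```

A worker pulled from the execution queue would call `run_with_deadline(argv, "exec-123")`. Swapping Docker for gVisor or Firecracker changes the launcher but not the idea of declaring every limit up front.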

3. Handling Latency and Failures

Network partitions, slow clients, and crashed execution workers are inevitable. Interviewers look for strategies to maintain a smooth user experience despite these failures.

Hints to consider:

Implement optimistic updates in the editor client so typing feels instant, with server reconciliation in the background
Use heartbeats and exponential backoff reconnection logic when WebSocket connections drop
Persist execution requests to a durable queue (Kafka or database) so they survive worker crashes and can be retried
Stream partial output back to clients as it's generated, using chunked transfer or server-sent events, so users see progress even for long-running jobs
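
The reconnection hint can be illustrated with a small backoff helper. This is a sketch under assumptions: the base delay, cap, and attempt count are arbitrary, full jitter is one common choice among several, and `connect` stands in for whatever callable opens the WebSocket and reports success.

```python
import random
import time
from typing import Callable, Iterator, Optional


def backoff_delays(base_s: float = 0.5, cap_s: float = 30.0, attempts: int = 6,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield one randomized delay per attempt, doubling the ceiling each time."""
    rng = rng or random.Random()
    for attempt in range(attempts):
        ceiling = min(cap_s, base_s * 2 ** attempt)
        # Full jitter spreads reconnects out so clients do not retry in lockstep.
        yield rng.uniform(0, ceiling)


def reconnect(connect: Callable[[], bool],
              sleep: Callable[[float], None] = time.sleep,
              **backoff_kwargs) -> bool:
    """Try connect() until it succeeds or the attempts run out."""
    for delay in backoff_delays(**backoff_kwargs):
        if connect():
            return True
        sleep(delay)
    return False


# Simulated connection that succeeds on the third attempt.
outcomes = iter([False, False, True])
waits = []
ok = reconnect(lambda: next(outcomes), sleep=waits.append)
print(ok, len(waits))  # True 2
```

Injecting `sleep` keeps the helper testable without real waiting. After reconnecting, the client would resend its last acknowledged revision so the server can replay any operations it missed.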

4. Scaling the Coordination Layer

As concurrent sessions grow, a single coordination server becomes a bottleneck. Discuss how to shard session state and fan out updates efficiently.

Hints to consider:

Shard sessions across multiple stateful coordination servers, routing clients via consistent hashing on session ID
Use Redis pub/sub or a dedicated message bus for broadcasting edits within a session, avoiding N² direct connections
Cache hot session state (document content, participant list) in memory with TTLs, backed by Postgres for durable snapshots
Employ a load balancer with session affinity (sticky sessions) initially, but design for eventual migration to a stateless model with shared state in Redis
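
Routing by consistent hashing on session ID can be sketched with a small hash ring. The class name, the server names, and the choice of 100 virtual nodes per server are assumptions for illustration; a real deployment would also need to handle membership changes and session handoff.

```python
import bisect
import hashlib
from typing import Iterable


class HashRing:
    """Maps session IDs to coordination servers via a ring of virtual nodes."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        if not self._ring:
            raise ValueError("at least one node is required")
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        # Stable across processes, unlike Python's built-in hash().
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def node_for(self, session_id: str) -> str:
        # First virtual node clockwise from the session's position on the ring.
        idx = bisect.bisect_right(self._keys, self._hash(session_id))
        return self._ring[idx % len(self._ring)][1]


sessions = [f"session-{i}" for i in range(1000)]
before = HashRing(["coord-a", "coord-b", "coord-c"])
after = HashRing(["coord-a", "coord-b", "coord-c", "coord-d"])
moved = [s for s in sessions if before.node_for(s) != after.node_for(s)]
# Every session that moved now lives on the newly added server.
print(all(after.node_for(s) == "coord-d" for s in moved))  # True
```

Adding a server only pulls sessions onto the new node, so existing sessions on the other servers keep their in-memory state and open connections.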

Suggested Approach

Step 1: Clarify Requirements

Start by confirming the scope and priorities with your interviewer:

How many concurrent sessions and participants per session do we need to support?
Which programming languages must we support for execution, and are there any special requirements (e.g., graphical output, package installation)?
What latency targets matter most: editor sync, execution start time, or output streaming?
Do we need features like chat, video, or code review annotations, or should we focus purely on editing and execution?
Should session history support full replay (every keystroke) or just periodic snapshots?
