Design Multi-player Figma
System Design · Must
Problem Statement
Design a collaborative design platform like Figma that allows multiple users to edit documents simultaneously with real-time updates, conflict resolution, and WebSocket-based synchronization. Users open a shared canvas, manipulate vector shapes, text layers, and component instances, and see every collaborator's cursor, selection, and edits appear within milliseconds. The platform must also support version history, undo/redo, and file sharing with granular permissions.
The core technical challenge is reconciling concurrent edits from geographically distributed users without data loss or visible glitches. Two designers might drag the same rectangle at the same time, resize overlapping frames, or paste content into the same artboard simultaneously. Your system must resolve these conflicts deterministically so all participants converge on an identical document state. Beyond editing, the system stores large binary assets (images, fonts, illustrations) referenced by the design file, and must deliver these efficiently without blocking the real-time operation stream.
At MongoDB, a mid-level candidate was asked to "Design Google Docs," which shares the same real-time collaboration fundamentals. Expect interviewers to probe deeply into your concurrency model, WebSocket scaling approach, and the operational trade-offs between Operational Transformation and CRDTs.
Key Requirements
Functional
- Real-time collaborative editing -- multiple users edit the same design file simultaneously and see each other's changes within 100-200ms
- Live presence -- each participant sees real-time cursor positions, text selections, active layers, and who is currently viewing the file
- Sharing and permissions -- file owners invite collaborators with view-only or edit access, and can revoke permissions at any time
- Version history -- users browse a timeline of previous file states, compare snapshots, and restore any earlier version
Non-Functional
- Scalability -- support hundreds of thousands of concurrent editing sessions with 2-50 participants each across global regions
- Latency -- propagate edit operations to all session participants within 150ms at p95; presence updates (cursors, selections) within 50ms
- Consistency -- guarantee eventual convergence across all clients with causal ordering of dependent operations and idempotent application
- Reliability -- zero data loss on server failures; automatic reconnection with state reconciliation after network interruptions
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Concurrency Model: OT vs. CRDT
The defining technical decision in this system is how you handle concurrent edits. Interviewers want to see that you understand both Operational Transformation and Conflict-free Replicated Data Types and can reason about the trade-offs.
Hints to consider:
- Operational Transformation transforms incoming operations against concurrent ones to maintain consistency, but requires a central server to establish a canonical operation order
- CRDTs (like Yjs or Automerge) guarantee convergence through commutative operations and logical timestamps, enabling peer-to-peer and offline-first editing
- OT has lower per-operation metadata overhead but introduces server round-trip latency; CRDTs accumulate tombstones that require periodic compaction
- For a Figma-like product with structured objects (shapes, frames, text) rather than free-form text, a hybrid approach using per-object last-writer-wins with causal ordering can simplify the concurrency model significantly
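The per-object last-writer-wins idea in the last hint can be sketched as a Lamport-clocked register per property: concurrent edits to different properties both survive, and concurrent edits to the same property resolve deterministically. This is a minimal illustration under those assumptions, not Figma's actual implementation; all class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LWWRegister:
    """Last-writer-wins register for a single object property."""
    value: object = None
    clock: int = 0       # Lamport timestamp of the last accepted write
    client_id: str = ""  # tie-breaker when clocks are equal

    def apply(self, value, clock, client_id):
        # Accept the write only if it is logically newer; ties are broken by
        # client id so every replica converges on the same value regardless
        # of the order in which operations arrive.
        if (clock, client_id) > (self.clock, self.client_id):
            self.value, self.clock, self.client_id = value, clock, client_id
            return True
        return False


class ShapeObject:
    """One canvas object. Each property is an independent LWW register, so a
    concurrent move (x) and recolor (fill) by two users both take effect."""

    def __init__(self):
        self.props = {}

    def apply_op(self, prop, value, clock, client_id):
        reg = self.props.setdefault(prop, LWWRegister())
        return reg.apply(value, clock, client_id)
```

For example, if alice and bob both drag the same rectangle at Lamport time 5, every replica applies both operations (in either order) and converges on the same winner.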
2. WebSocket Scaling and Session Routing
Real-time collaboration depends on persistent WebSocket connections with low-latency message routing. Interviewers probe how you partition sessions across servers, handle reconnections, and prevent message loss.
Hints to consider:
- Route all participants of a single editing session to the same WebSocket server (or small cluster) using consistent hashing on session ID to minimize cross-server coordination
- If participants span multiple gateway servers, use Redis Pub/Sub to fan out operations between servers within the same session
- Implement heartbeat mechanisms with exponential backoff reconnection so clients detect disconnections within seconds and recover without losing buffered operations
- Plan for graceful draining during deployments: pause accepting new connections, wait for active sessions to close or migrate, then shut down
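The consistent-hashing routing from the first hint can be sketched as a small hash ring: every participant hashing the same session ID lands on the same gateway, and adding or removing a gateway remaps only a fraction of sessions. Server names and the virtual-node count below are illustrative assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps a session ID to a WebSocket gateway via consistent hashing."""

    def __init__(self, servers, vnodes=100):
        # Each server gets many virtual nodes so load spreads evenly.
        self._ring = []  # sorted list of (hash, server)
        for server in servers:
            for i in range(vnodes):
                bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def route(self, session_id):
        # First ring entry at or after the session's hash, wrapping around.
        # All participants of one session resolve to the same gateway, so
        # fan-out within a session never needs cross-server coordination.
        idx = bisect.bisect(self._ring, (self._hash(session_id), "")) % len(self._ring)
        return self._ring[idx][1]
```

A lookup is O(log n); when gateways span a session anyway (e.g., during rebalancing), the Redis Pub/Sub fan-out from the second hint covers the gap.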
3. Operation Persistence and Version History
Every edit must be durable for undo/redo, version history, and crash recovery. Interviewers want to see how you balance the real-time hot path with durable storage.
Hints to consider:
- Persist every operation to an append-only log (e.g., Kafka, or a DynamoDB table consumed via DynamoDB Streams) in the order the server applied it, giving you a complete audit trail
- Create periodic snapshots of the full document state (every N operations or every T seconds) so recovery does not require replaying the entire history
- Store snapshots in object storage (S3) and index them by (file_id, version_number) for fast retrieval during version history browsing
- Implement garbage collection that removes old operation log entries after a snapshot covers them, controlling storage growth
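The snapshot-plus-log pattern above can be sketched with in-memory stand-ins for the log (Kafka) and snapshot store (S3). Recovery starts from the newest snapshot and replays only the log tail; compaction drops entries a snapshot already covers. All names and the snapshot interval are illustrative, and a real system would persist deep copies rather than in-memory references.

```python
class DocumentStore:
    SNAPSHOT_INTERVAL = 100  # snapshot every N operations

    def __init__(self):
        self.op_log = []     # append-only (seq, op) entries
        self.snapshots = {}  # seq -> document state at that point

    def append(self, op, apply_fn, state):
        seq = len(self.op_log)
        self.op_log.append((seq, op))
        state = apply_fn(state, op)
        if (seq + 1) % self.SNAPSHOT_INTERVAL == 0:
            self.snapshots[seq] = state  # durable checkpoint
        return state

    def recover(self, apply_fn, initial_state):
        # Start from the newest snapshot, replay only operations after it.
        if self.snapshots:
            base_seq = max(self.snapshots)
            state = self.snapshots[base_seq]
        else:
            base_seq, state = -1, initial_state
        for seq, op in self.op_log:
            if seq > base_seq:
                state = apply_fn(state, op)
        return state

    def compact(self):
        # GC: drop log entries already covered by the newest snapshot.
        if self.snapshots:
            base_seq = max(self.snapshots)
            self.op_log = [e for e in self.op_log if e[0] > base_seq]
```

With a counter as the "document" and `+op` as the apply function, 250 appends leave snapshots at sequence 99 and 199, and recovery replays only the last 50 operations.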
4. Large Asset Handling
Design files reference images, icons, and fonts that can be megabytes in size. Interviewers want to see that you keep heavy binary data off the real-time operation channel.
Hints to consider:
- Upload assets directly to object storage via presigned URLs and reference them in operations by content-addressed hash, keeping the operation payload small
- Serve assets through a CDN with long cache TTLs since content-addressed URLs are immutable
- Deduplicate assets across files and teams using content hashing so the same image uploaded by different users is stored only once
- Lazy-load assets on the client: fetch only the assets visible in the current viewport and prefetch nearby ones based on scroll position
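Content-addressed storage with deduplication can be sketched as follows. The in-memory dict stands in for object storage, and the CDN URL format is a made-up example: because the URL is derived from the bytes, identical uploads dedupe for free and the URL is immutable, so a CDN can cache it indefinitely.

```python
import hashlib


class AssetStore:
    """Content-addressed asset storage: the key is the hash of the bytes."""

    def __init__(self):
        self.blobs = {}  # content hash -> bytes (stand-in for S3)

    def upload(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # setdefault: if another user already uploaded identical bytes,
        # nothing new is stored -- dedup across files and teams.
        self.blobs.setdefault(digest, data)
        # Immutable, cacheable reference; operations carry only this string,
        # keeping binary data off the real-time channel.
        return f"https://cdn.example.com/assets/{digest}"

    def stored_count(self) -> int:
        return len(self.blobs)
```

An edit operation then references the asset by this URL (or bare hash), so the WebSocket stream only ever carries small JSON payloads.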