Design Google Docs
Problem Statement
Design a collaborative document editing platform where multiple users can simultaneously view and edit the same document in real time. Changes made by one user should appear on every other collaborator's screen within a fraction of a second, and concurrent edits to the same paragraph should be merged automatically without data loss.
The fundamental difficulty is conflict resolution: when two users type in the same region of a document at the same instant, the system must deterministically reconcile their edits so that all clients converge to an identical document state. Naive approaches like last-write-wins destroy content, because whichever write lands second silently overwrites the first. You need a principled concurrency control mechanism, such as Operational Transformation (OT) or Conflict-Free Replicated Data Types (CRDTs), to merge edits correctly.
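To make the convergence requirement concrete, here is a minimal sketch of the OT idea for insert-only edits. The `Insert` type, the `site` tie-breaker, and the `transform` rule are illustrative assumptions, not a description of how Google Docs is implemented; a real system also needs deletes, formatting, and multi-operation histories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index at which the text is inserted
    text: str
    site: int   # hypothetical client id, used only to break position ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, other: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `other` has already been applied."""
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op


base = "Hello world"
a = Insert(5, ",", site=1)    # user A intends "Hello, world"
b = Insert(11, "!", site=2)   # user B intends "Hello world!"

# Last-write-wins on the whole document keeps only one user's write.
lww = apply(base, b)          # A's comma is lost

# With OT, each replica applies its own edit first, then the transformed remote edit.
doc_a = apply(apply(base, a), transform(b, a))
doc_b = apply(apply(base, b), transform(a, b))
```

Both replicas end at "Hello, world!" even though they applied the edits in opposite orders, which is the convergence property the problem statement asks for.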
Beyond real-time editing, the system must support version history (allowing users to browse and restore past snapshots), document-level access control (view, comment, edit permissions), full-text search across a user's documents, and offline editing that syncs when connectivity resumes.
Key Requirements
Functional
- Real-time collaborative editing -- Multiple users can edit the same document simultaneously, with changes reflected on other collaborators' clients within 200ms when they are in the same region
- Version history -- Users can browse a timeline of document revisions and restore any previous version
- Access control -- Document owners can share with specific users or groups at view, comment, or edit permission levels
- Full-text search -- Users can search across all documents they have access to, with results filtered by their permissions
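The access-control and search requirements interact: search hits must be filtered by the caller's effective permission. The following is a rough sketch under assumed data shapes; `ACL`, `GROUPS`, and both function names are hypothetical, and a production system would likely push this filter into the search index rather than post-filter in application code.

```python
from enum import IntEnum


class Permission(IntEnum):
    NONE = 0
    VIEW = 1
    COMMENT = 2
    EDIT = 3


# Hypothetical in-memory ACL: doc_id -> {principal -> Permission}
ACL = {
    "doc-1": {"alice": Permission.EDIT, "group:eng": Permission.VIEW},
    "doc-2": {"bob": Permission.COMMENT},
}
# Hypothetical group membership: user -> set of group principals
GROUPS = {"carol": {"group:eng"}}


def effective_permission(user: str, doc_id: str) -> Permission:
    """Highest level granted to the user directly or through any of their groups."""
    grants = ACL.get(doc_id, {})
    principals = {user} | GROUPS.get(user, set())
    return max(grants.get(p, Permission.NONE) for p in principals)


def filter_search_hits(user: str, hits: list[str]) -> list[str]:
    """Drop any hit the user cannot at least view."""
    return [d for d in hits if effective_permission(user, d) >= Permission.VIEW]


visible = filter_search_hits("carol", ["doc-1", "doc-2"])
```

Ordering the levels as integers lets "comment implies view" and "edit implies comment" fall out of a simple comparison.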
Non-Functional
- Scalability -- Support hundreds of millions of documents with up to 100 concurrent editors per document
- Latency -- Local edits should render immediately; remote edits should appear within 200ms for users in the same region
- Consistency -- All collaborators must converge to the same document state, even under concurrent edits and network partitions
- Durability -- No committed edit should ever be lost, even during server crashes or failovers
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Concurrency Control Mechanism
This is the heart of the problem. Interviewers want to understand how you handle simultaneous edits without losing content.
Hints to consider:
- What are the tradeoffs between Operational Transformation (OT) and CRDTs for this use case?
- How does the server act as a central sequencer to establish a total order of operations in an OT-based system?
- What happens when a client sends an operation that was based on a stale document version?
- How do you handle operations that affect overlapping character ranges?
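One way to reason about the sequencer and stale-version hints is a per-document session that keeps an ordered operation log. The sketch below is insert-only and assumes a hypothetical `DocumentSession` class; the tie rule (already-committed operations win) is one possible choice, and clients would need the complementary rule for their pending operations.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    client: str


def transform(op: Insert, committed: Insert) -> Insert:
    """Shift `op` past an already-committed insert; committed ops win position ties."""
    if committed.pos <= op.pos:
        return Insert(op.pos + len(committed.text), op.text, op.client)
    return op


@dataclass
class DocumentSession:
    text: str = ""
    log: list = field(default_factory=list)  # log[i] moved the doc from revision i to i + 1

    @property
    def revision(self) -> int:
        return len(self.log)

    def submit(self, op: Insert, base_revision: int):
        """Commit `op`, which the client composed against `base_revision`."""
        if not 0 <= base_revision <= self.revision:
            raise ValueError("unknown base revision")
        # A stale op is rebased over everything committed since the client's version.
        for committed in self.log[base_revision:]:
            op = transform(op, committed)
        self.text = self.text[:op.pos] + op.text + self.text[op.pos:]
        self.log.append(op)
        return self.revision, op

    def ops_since(self, revision: int) -> list:
        """Operations a reconnecting client must replay to catch up."""
        return self.log[revision:]


session = DocumentSession("abc")
session.submit(Insert(0, "X", "alice"), base_revision=0)   # doc becomes "Xabc"
session.submit(Insert(3, "Y", "bob"), base_revision=0)     # stale: rebased to pos 4
```

Because every operation passes through `submit` on a single session, the log index doubles as the total order, and the same log answers the catch-up question via `ops_since`.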
2. Connection Management and Presence
Maintaining persistent connections to potentially millions of active editing sessions requires careful resource management.
Hints to consider:
- Why are WebSockets preferred over HTTP polling for this use case, and how do you scale them?
- How do you route a user's WebSocket connection to the server that holds the active session for their document?
- How do you detect and display collaborator presence (cursors, selections) without flooding the network?
- When a WebSocket server crashes, how do clients reconnect and catch up on the operations they missed?
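For the routing hint, one candidate technique is rendezvous (highest-random-weight) hashing, which lets any load balancer map a document ID to the same WebSocket server without shared state. This is a sketch under assumptions: the server names are made up, and a real deployment might instead use a session registry or consistent hashing with virtual nodes.

```python
import hashlib


def _score(server: str, doc_id: str) -> int:
    """Deterministic pseudo-random weight for a (server, document) pair."""
    digest = hashlib.sha256(f"{server}|{doc_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(doc_id: str, servers: list[str]) -> str:
    """Pick the live server with the highest score for this document."""
    if not servers:
        raise ValueError("no live servers")
    return max(servers, key=lambda s: _score(s, doc_id))


servers = ["ws-1", "ws-2", "ws-3"]   # hypothetical server names
home = owner("doc-42", servers)

# If the home server crashes, clients of doc-42 reconnect to the next-best server,
# while documents owned by the surviving servers do not move.
survivors = [s for s in servers if s != home]
fallback = owner("doc-42", survivors)
```

After failing over, clients would still need to replay missed operations from the durable operation log, since the new server starts without the crashed server's in-memory session.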
3. Hot Documents
A small number of documents (company-wide announcements, shared templates) attract disproportionate editing traffic.
Hints to consider:
- How do you detect that a document is becoming a hot spot?
- Can you shard a single document's operation log across multiple servers, and what consistency challenges does that create?
- Would batching or throttling operation broadcasts help reduce fan-out for documents with many viewers?
- How do you balance memory usage when some documents have thousands of connected clients?
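The batching hint can be explored with a small per-document buffer that sends one message per time window instead of one per operation. `BroadcastBatcher`, `window_ms`, and `max_batch` are hypothetical names chosen for this sketch, and the caller supplies the clock so the behaviour is easy to test.

```python
class BroadcastBatcher:
    """Buffers operations for one document and releases them in batches."""

    def __init__(self, window_ms: int = 50, max_batch: int = 100):
        self.window_ms = window_ms
        self.max_batch = max_batch
        self._pending = []
        self._window_start = None

    def add(self, op, now_ms: int):
        """Queue an op; return a batch if the size cap is reached, else None."""
        if self._window_start is None:
            self._window_start = now_ms
        self._pending.append(op)
        if len(self._pending) >= self.max_batch:
            return self.flush()
        return None

    def poll(self, now_ms: int):
        """Called by a timer; return a batch once the window has elapsed, else None."""
        if self._pending and now_ms - self._window_start >= self.window_ms:
            return self.flush()
        return None

    def flush(self):
        batch, self._pending, self._window_start = self._pending, [], None
        return batch


batcher = BroadcastBatcher(window_ms=50, max_batch=3)
batcher.add("op1", now_ms=0)
batcher.add("op2", now_ms=10)
early = batcher.poll(now_ms=30)   # None: window still open
due = batcher.poll(now_ms=50)     # ["op1", "op2"]: one fan-out instead of two
```

The window adds latency to every remote edit, so it has to stay well below the 200ms target; a document with N connected clients then pays N sends per window rather than N sends per keystroke.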