Design Google Docs
Problem Statement
Design a collaborative document editing service similar to Google Docs where multiple users can create, edit, and share documents in real time with concurrent editing capabilities. Users see each other's cursors, changes merge automatically without data loss, and access is controlled through sharing and permissions.
At Amazon, interviewers ask this because it forces you to reason about real-time updates, high-contention writes, conflict resolution (OT/CRDT), stateful connection management with WebSockets, storage and versioning, and search with permissions. Expect to clarify scope, define core SLAs, and drill into collaboration algorithms, transport, sharding, and security.
Key Requirements
Functional
- Real-time collaborative editing -- multiple users can edit the same document simultaneously and see each other's changes within 1-2 seconds
- Conflict-free merge -- concurrent edits to different parts merge automatically without data loss; edits to the same region resolve deterministically
- Version history -- users can view document history and revert to previous versions
- Sharing and permissions -- users share documents with view, comment, or edit access and manage collaborator lists
Non-Functional
- Scalability -- support millions of active documents with up to 50 simultaneous editors per document
- Reliability -- no acknowledged edit is lost, even during network partitions or server failures; automatic recovery from crashes
- Latency -- local edits appear instantly; remote changes propagate within 1-2 seconds under normal network conditions
- Consistency -- all users eventually converge to the same document state (strong eventual consistency); metadata operations require strong consistency
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Conflict Resolution Algorithm
This is the heart of collaborative editing. Interviewers want to see if you understand how to merge concurrent edits deterministically. Simply saying "we'll use OT" is not enough -- you need to explain the mechanics.
Hints to consider:
- Operational Transformation (OT) rewrites each operation against the concurrent operations it has not yet seen; practical deployments typically rely on a central server to establish a canonical order
- Conflict-Free Replicated Data Types (CRDTs), as implemented by libraries such as Yjs or Automerge, guarantee convergence through commutative operations without central coordination
- Discuss the tradeoffs: OT has lower metadata overhead but complex transformation functions; CRDTs are simpler to reason about, but their metadata (per-element identifiers and tombstones for deleted text) typically grows with the edit history
- Show how you handle concrete scenarios: two users typing at the same position, one user deleting text while another inserts within it
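The two scenarios in the last hint can be walked through with a small OT sketch. This is a minimal illustration under stated assumptions, not a production algorithm: the `Insert`/`Delete` types, the `site` tie-breaker, and the `transform` function are hypothetical names introduced here, operations work on plain strings, and delete-versus-delete is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical client id, used only to break position ties


@dataclass(frozen=True)
class Delete:
    pos: int
    length: int


Op = Union[Insert, Delete]


def apply(doc: str, op: Op) -> str:
    """Apply a single operation to a plain-string document."""
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + op.length:]


def apply_all(doc: str, ops: List[Op]) -> str:
    for op in ops:
        doc = apply(doc, op)
    return doc


def transform(op: Op, other: Op) -> List[Op]:
    """Rewrite `op` so it keeps its intent once `other` has been applied."""
    if isinstance(op, Insert) and isinstance(other, Insert):
        # Same position: the lower site id goes first on every replica.
        if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
            return [Insert(op.pos + len(other.text), op.text, op.site)]
        return [op]
    if isinstance(op, Insert) and isinstance(other, Delete):
        if op.pos <= other.pos:
            return [op]
        if op.pos >= other.pos + other.length:
            return [Insert(op.pos - other.length, op.text, op.site)]
        # The insert fell inside the deleted range: keep the text at the gap.
        return [Insert(other.pos, op.text, op.site)]
    if isinstance(op, Delete) and isinstance(other, Insert):
        if other.pos <= op.pos:
            return [Delete(op.pos + len(other.text), op.length)]
        if other.pos >= op.pos + op.length:
            return [op]
        # The insert fell inside our range: delete around it, right part first
        # so the left part's position is not shifted.
        left = other.pos - op.pos
        return [
            Delete(other.pos + len(other.text), op.length - left),
            Delete(op.pos, left),
        ]
    raise NotImplementedError("delete-vs-delete is omitted from this sketch")


# Two users type at the same position in "ac".
a, b = Insert(1, "X", site=1), Insert(1, "Y", site=2)
via_a = apply_all(apply("ac", a), transform(b, a))
via_b = apply_all(apply("ac", b), transform(a, b))

# One user deletes "bcd" from "abcde" while another inserts inside it.
d, i = Delete(1, 3), Insert(2, "X", site=2)
via_d = apply_all(apply("abcde", d), transform(i, d))
via_i = apply_all(apply("abcde", i), transform(d, i))
```

Both orders of application reach the same string in each scenario, which is the convergence property an interviewer is looking for. In this sketch the concurrent insert survives the delete; a real editor might choose a different policy, and that choice is worth stating out loud.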
2. Real-Time Communication Architecture
Collaborative editing requires bidirectional, low-latency communication. Interviewers expect you to design a scalable WebSocket infrastructure.
Hints to consider:
- Use WebSocket connections between clients and collaboration servers for bidirectional, low-latency operation delivery
- Route all editors of the same document to the same server or cluster partition for efficient broadcasting
- Separate the connection layer (stateful WebSocket servers) from business logic (stateless API servers)
- Use Redis pub/sub to fan out operations when a document's editors span multiple connection servers
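One way to route every editor of a document to the same collaboration server is rendezvous (highest-random-weight) hashing. The sketch below is an assumption-laden illustration rather than a prescribed design: `pick_server` and the server names are hypothetical, and a real deployment would also need health checks and connection draining.

```python
import hashlib
from typing import Sequence


def _score(server: str, doc_id: str) -> int:
    """Deterministic pseudo-random weight for a (server, document) pair."""
    digest = hashlib.sha256(f"{server}:{doc_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_server(doc_id: str, servers: Sequence[str]) -> str:
    """Return the server with the highest weight for this document."""
    if not servers:
        raise ValueError("no collaboration servers available")
    return max(servers, key=lambda server: _score(server, doc_id))


# Hypothetical server pool; any gateway computing this gets the same answer.
pool = ["collab-1", "collab-2", "collab-3", "collab-4"]
owner = pick_server("doc-42", pool)
```

Because each gateway computes the same owner independently, no shared routing table is needed, and removing a server only moves the documents that server owned. Fan-out through pub/sub is then only needed when a document's editors genuinely span multiple connection servers.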
3. Storage and Version History
Every edit must be durable, and users expect to browse and restore past versions. Interviewers probe your data model and compaction strategy.
Hints to consider:
- Use an event-sourcing approach: append every operation to an immutable log, with periodic snapshots for fast document loading
- Store snapshots in object storage and operation logs in a database, with snapshot boundaries as checkpoints
- On document open, load the latest snapshot and replay subsequent operations to reconstruct current state
- Implement garbage collection of old operations while preserving snapshot boundaries for version history
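The snapshot-plus-replay flow above can be sketched in memory. This is a simplified illustration, assuming insert-only operations and a hypothetical `DocumentStore` class; in the design described here the snapshots would live in object storage and the log in a database.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DocumentStore:
    snapshot_every: int = 3  # hypothetical checkpoint interval
    version: int = 0
    # Append-only log of (version, position, inserted_text).
    log: List[Tuple[int, int, str]] = field(default_factory=list)
    # Checkpoints of (version, full_text); version 0 is the empty document.
    snapshots: List[Tuple[int, str]] = field(default_factory=lambda: [(0, "")])

    def append(self, pos: int, text: str) -> int:
        """Record one operation and checkpoint every `snapshot_every` ops."""
        self.version += 1
        self.log.append((self.version, pos, text))
        if self.version % self.snapshot_every == 0:
            self.snapshots.append((self.version, self.load()))
        return self.version

    def load(self) -> str:
        """Rebuild current state: latest snapshot plus newer operations."""
        base_version, doc = self.snapshots[-1]
        for version, pos, text in self.log:
            if version > base_version:
                doc = doc[:pos] + text + doc[pos:]
        return doc

    def compact(self) -> None:
        """Drop operations already covered by the latest snapshot."""
        latest_version = self.snapshots[-1][0]
        self.log = [entry for entry in self.log if entry[0] > latest_version]


store = DocumentStore(snapshot_every=3)
store.append(0, "Hello")
store.append(5, " world")
store.append(5, ",")   # third operation triggers a snapshot
store.append(12, "!")  # newer than the snapshot, replayed on load
before = store.load()
store.compact()
after = store.load()
```

After `compact()` the log holds only the operation newer than the last snapshot, yet `load()` returns the same text, and the retained snapshots remain available as version-history checkpoints.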
4. Permission-Aware Search
Users need to search across their documents. Interviewers want to see how you index content while respecting access controls.
Hints to consider:
- Index document content in Elasticsearch with ACL metadata attached to each document
- Use change data capture from the metadata database to keep search indexes updated as documents are edited
- Filter by the querying user's permissions at query time, before any results are returned, to prevent data leakage
- Propagate permission changes to the search index promptly to prevent stale access
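A permission filter can be expressed directly in the Elasticsearch query body, so documents the user cannot see never enter the result set. The sketch below only builds the request body as a dictionary; the field names `content` and `allowed_principals`, the `user:`/`group:` prefixes, and the `build_search_query` helper are assumptions for illustration, not part of any fixed schema.

```python
from typing import Any, Dict, List


def build_search_query(text: str, user_id: str, group_ids: List[str]) -> Dict[str, Any]:
    """Build a bool query: full-text match, restricted by ACL principals."""
    # Every identity the caller can act as (hypothetical naming scheme).
    principals = [f"user:{user_id}"] + [f"group:{g}" for g in group_ids]
    return {
        "query": {
            "bool": {
                # Scored full-text match on the indexed document body.
                "must": [{"match": {"content": text}}],
                # Non-scoring filter: the document's ACL field must contain
                # at least one of the caller's principals.
                "filter": [{"terms": {"allowed_principals": principals}}],
            }
        }
    }


query = build_search_query("quarterly roadmap", "alice", ["eng", "planning"])
```

Putting the ACL check in the `filter` clause keeps it out of relevance scoring and lets the engine cache it. It is only as fresh as the indexed ACL field, which is why prompt propagation of permission changes matters.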