Practice/Meta/Design DropBox
Design DropBox
Product DesignMust
Problem Statement
Design a web-based spreadsheet application similar to Google Sheets that enables multiple users to simultaneously edit spreadsheets in real-time. The system must handle millions of active users working on documents ranging from small personal budgets to enterprise spreadsheets with hundreds of thousands of cells. Users expect instant visual feedback when others make changes, formula recalculation within seconds, and the ability to see who is editing which cells. The platform should support versioning, rollback capabilities, and offline editing that syncs when connectivity returns.
Your design should accommodate peak usage patterns where thousands of collaborators might work on a single popular spreadsheet simultaneously, while maintaining data consistency and ensuring no edits are lost during network partitions or client crashes.
Key Requirements
Functional
- Real-time collaborative editing -- multiple users can edit the same spreadsheet simultaneously with changes visible to all participants within 100-200ms
- Formula computation engine -- support complex formulas with cell dependencies that recalculate automatically when referenced cells change
- Conflict resolution -- handle concurrent edits to the same cell or overlapping ranges without data loss
- Version history and rollback -- maintain a complete audit trail of changes and allow restoration to previous states
- Offline mode -- enable editing when disconnected and sync changes when connection resumes
- Sharing and permissions -- support view-only, comment, and edit permissions at user and link levels
Non-Functional
- Scalability -- support 10M daily active users, spreadsheets up to 5M cells, and 1000+ concurrent editors per document
- Reliability -- 99.95% uptime with no data loss, even during partial system failures
- Latency -- cell updates appear to all collaborators within 200ms (p95), formula recalculation completes within 2 seconds for typical spreadsheets
- Consistency -- eventual consistency for non-conflicting edits, strong consistency for conflict resolution to ensure deterministic outcomes
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Operational Transformation and Conflict Resolution
How you handle concurrent edits to the same cells is the defining challenge of collaborative editing systems. Interviewers want to see if you understand that simple last-write-wins creates a poor user experience and that naive locking prevents true collaboration.
Hints to consider:
- Explore Operational Transformation (OT) or Conflict-free Replicated Data Types (CRDTs) for merging concurrent operations
- Consider how cell operations can be transformed when they conflict (e.g., two users typing in the same cell simultaneously)
- Discuss how to maintain causality and ensure all clients converge to the same state despite different operation arrival orders
- Think about representing edits as operations with metadata rather than state snapshots
2. Real-Time Synchronization Architecture
The interviewer expects you to design a system that balances real-time performance with scalability. Broadcasting every keystroke to every collaborator doesn't scale, but batching too aggressively creates a laggy experience.
Hints to consider:
- WebSocket connections for bidirectional real-time communication vs. HTTP long-polling tradeoffs
- Consider operational batching strategies that group rapid edits while maintaining low perceived latency
- Discuss whether to route all changes through a central server or enable peer-to-peer synchronization
- Think about presence indicators (showing where other users are actively editing) as a separate concern from data synchronization
3. Formula Dependency Graph and Recalculation
Spreadsheets with complex formulas create dependency graphs that must be efficiently computed. Interviewers probe whether you understand how to avoid recalculating the entire sheet and how to handle circular dependencies.
Hints to consider:
- Build a directed acyclic graph (DAG) of cell dependencies to determine calculation order
- Use topological sorting to identify which cells need recalculation when a source cell changes
- Consider lazy evaluation for cells not currently visible in the viewport
- Discuss strategies for detecting and breaking circular references, and how to communicate these errors to users
4. Storage and Versioning Strategy
The interviewer wants to understand your approach to persisting spreadsheet data efficiently while maintaining a complete version history without excessive storage costs.
Hints to consider:
- Consider storing operational logs (event sourcing) rather than snapshots for granular version history
- Discuss periodic snapshot creation for faster document loading and to compact operation logs
- Think about representing sparse spreadsheets efficiently (most cells are empty)
- Explore how to store large spreadsheets across multiple storage nodes while maintaining performance
5. Offline Mode and Sync Conflicts
Enabling offline editing introduces complex scenarios where the same document diverges on multiple devices. Interviewers assess whether you can design a sync strategy that resolves conflicts intelligently.
Hints to consider:
- Consider vector clocks or hybrid logical clocks to track causality across offline edits
- Discuss three-way merge strategies using the last common ancestor state
- Think about how to handle non-commutative operations (e.g., delete row followed by edit cell in that row)
- Explore user experience for presenting conflicts that cannot be automatically resolved
Suggested Approach
Step 1: Clarify Requirements
Begin by establishing scope and constraints with your interviewer:
- What is the expected document size (number of cells, sheets per document)?
- How many concurrent editors per document should we optimize for?
- What types of formulas must be supported (basic arithmetic vs. complex functions like VLOOKUP)?
- Are we supporting rich formatting, charts, and pivot tables, or focusing on core spreadsheet functionality?
- What are the offline editing requirements -- view-only or full edit capabilities?
- Should we support import/export of Excel files, or is this a standalone format?
- What is the expected read-to-write ratio for typical usage patterns?