Practice/Anthropic/Batched Model Runs
Batched Model Runs
System DesignOptional
Problem Statement
Design a system that allows multiple users to simultaneously edit a shared spreadsheet in real time, similar to Google Sheets or Excel Online. When one user types a value, applies formatting, or modifies a formula in a cell, all other active collaborators should see the change within a few hundred milliseconds. The system must handle documents with up to 100,000 cells, support up to 50 concurrent editors per spreadsheet, and maintain data consistency even when users edit the same cell simultaneously. Unlike simple text documents, spreadsheets have complex interdependencies through formulas and references, so changes can cascade across multiple cells. The system must recalculate affected formulas efficiently and propagate updates without overwhelming clients or creating infinite loops.
Your design should address real-world challenges including network partitions, conflicting edits, the tension between immediate local updates and eventual consistency, and how to minimize data loss when connections drop. Interviewers want to see how you reason about operational transformation or conflict-free replicated data types (CRDTs), efficient change propagation, formula dependency graphs, and strategies for maintaining responsiveness under high collaboration load.
Key Requirements
Functional
- Collaborative Editing -- multiple users can edit different cells or the same cell simultaneously, with changes visible to all participants in near real-time
- Formula Calculation -- cells can contain formulas that reference other cells, and the system must recalculate dependent cells when inputs change
- Edit History and Undo -- users should be able to view recent changes, see who made them, and undo or redo operations
- Presence and Cursors -- display active collaborators and show where each user is currently editing with live cursors and highlights
- Offline Support -- allow users to continue editing when disconnected and merge changes intelligently when they reconnect
Non-Functional
- Scalability -- handle 10,000+ active spreadsheets with 50 concurrent editors per sheet and millions of registered users
- Reliability -- ensure no data loss during server failures, network partitions, or client crashes; durably persist all edits
- Latency -- propagate simple cell edits to all collaborators within 200-500ms; recalculate and update dependent cells within 1-2 seconds for moderately complex formulas
- Consistency -- guarantee eventual consistency across all clients; resolve conflicting edits deterministically so all users converge to the same state
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Conflict Resolution and Consistency Model
When two users edit the same cell simultaneously or perform operations that conflict (like inserting rows at the same position), the system must resolve conflicts deterministically. Interviewers want to see if you understand operational transformation (OT) versus CRDTs, last-write-wins policies, vector clocks, or more sophisticated approaches.
Hints to consider:
- Discuss whether you'll use operational transformation to transform concurrent operations relative to each other, or a CRDT-based approach where operations commute
- Consider assigning logical timestamps or version vectors to edits so you can order them consistently across all clients
- Think about how to handle structural changes (insert/delete rows or columns) versus content changes (cell value updates), as they have different conflict characteristics
- Address the tradeoff between strong consistency (serializable edits, higher latency) and eventual consistency (immediate local updates, potential conflicts to resolve)
2. Formula Dependency Calculation and Propagation
Spreadsheet formulas create directed acyclic graphs (DAGs) of dependencies. When a cell changes, the system must identify all dependent cells, recalculate them in topological order, and propagate updates efficiently. Interviewers probe whether you can design a scalable calculation engine.
Hints to consider:
- Build a dependency graph where each cell tracks which cells it references and which cells reference it, updated incrementally as formulas change
- Use topological sort or iterative recalculation starting from changed cells and propagating through dependents level by level
- Detect circular references (cycles in the dependency graph) and either reject them or handle them with iteration limits
- Consider caching calculation results and invalidating only affected branches of the graph to avoid recalculating the entire sheet on every edit
- Discuss offloading heavy calculations to background workers if the dependency chain is long or formulas are compute-intensive
3. Real-Time Communication Architecture
The system must push updates to all active collaborators with minimal latency. Interviewers want to know if you'll use WebSockets, Server-Sent Events, polling, or a pub-sub system, and how you'll scale the fan-out of changes to many clients.
Hints to consider:
- Use WebSocket connections for bidirectional, low-latency communication between each client and the server
- Consider a pub-sub messaging layer (like Redis Pub/Sub or Kafka) where each spreadsheet is a topic, allowing multiple application servers to push updates to subscribed clients
- Implement presence tracking with heartbeats and ephemeral session data so clients know who else is active
- Design for horizontal scalability by routing all editors of a given spreadsheet to the same server (sticky sessions or consistent hashing), or use a shared state layer so any server can handle any user
- Address how to handle slow or disconnected clients without blocking other collaborators, possibly with client-side buffering and exponential backoff
4. Data Persistence and Durability
Every edit must be durably stored to prevent data loss. Interviewers explore how you'll persist changes, maintain version history, and recover from failures without blocking real-time collaboration.
Hints to consider:
- Store the current state of the spreadsheet in a database (document store or relational) and write an append-only log of all operations for history and recovery
- Use write-ahead logging (WAL) or an event sourcing pattern where operations are written to a durable queue (Kafka or DynamoDB Streams) before being applied in memory
- Consider snapshotting the full spreadsheet state periodically to avoid replaying thousands of operations on server restart
- Discuss how you'll handle concurrent writes to the database, possibly with optimistic locking, version numbers, or conditional writes
- Address backup and disaster recovery, ensuring no data loss even if primary storage fails
Suggested Approach
Step 1: Clarify Requirements
Confirm the following with the interviewer before designing:
- Spreadsheet size limits: Maximum rows, columns, and total cells per sheet? Are we optimizing for small (1K cells) or large (100K+ cells) documents?
- Collaboration scale: How many concurrent editors per sheet? Are there read-only viewers, and do they count against concurrency limits?
- Formula complexity: Do formulas reference other sheets or external data sources? Are there built-in functions (SUM, VLOOKUP) or custom scripts?
- Edit granularity: Are edits per-cell, or can users select ranges and apply bulk operations like copy-paste or fill-down?
- Offline and mobile support: Must users work offline and sync later? Does the system need to run on mobile devices with intermittent connectivity?
- Security and permissions: Are there per-user or per-cell permissions? Do you need to audit who changed what?
- Latency and consistency priorities: Is it acceptable to show stale data briefly, or must all users always see identical state? What's the target propagation latency?