Design Dropbox
Problem Statement
Design a file storage and synchronization system like Dropbox that allows users to upload, download, and sync files across multiple devices with real-time updates. Users drop a file in one place and it appears everywhere, even if they go offline and come back later. The system must handle large file transfers reliably, keep metadata consistent across devices, and support sharing with permission controls.
Interviewers ask this because it forces you to separate a control plane (metadata, auth, sync) from a data plane (large file transfers), handle multi-device real-time updates, and design for large blobs, resumable transfers, conflict resolution, and cost-efficient distribution through CDNs. Strong answers demonstrate clear requirements, a scalable architecture, and practical tradeoffs for reliability, consistency, and user experience.
Key Requirements
Functional
- Reliable file upload -- users upload files of any size with progress indication and the ability to pause, resume, and recover from interrupted transfers
- File browsing and download -- users browse, search, and download their files quickly across devices with accurate, up-to-date listings
- Multi-device sync -- files automatically synchronize across all devices, reflecting creates, updates, renames, and deletes, even after being offline
- Sharing and permissions -- users share files or folders with others and control access (view or edit) for collaborators and public links
Non-Functional
- Scalability -- support hundreds of millions of users with billions of files; handle upload bursts during business hours
- Reliability -- no file data loss even during network interruptions or server failures; guarantee eventual delivery of all sync events
- Latency -- file listings load within 300ms; sync notifications propagate to connected devices within 2 seconds
- Consistency -- strong consistency for metadata operations (moves, renames, permission changes); eventual consistency acceptable for sync propagation
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Large File Upload Architecture
Files are frequently multi-GB and uploaded over unreliable networks. Interviewers want to see a clear separation between the control plane and data plane, with the application server never touching file bytes.
Hints to consider:
- Use pre-signed URLs to upload file chunks directly to object storage (S3), keeping app servers out of the data path
- Support multipart uploads with content-addressable chunks (hash-based deduplication) so identical blocks across files are stored once
- Implement resumable uploads where each chunk is individually acknowledged, enabling recovery from any interruption point
- Trigger post-upload processing (virus scanning, thumbnail generation) via events after the final chunk is confirmed
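The chunking and dedup ideas above can be sketched in a few lines. This is a minimal illustration, not Dropbox's actual protocol: `chunk_file` splits a byte stream into fixed-size blocks keyed by content hash (identical blocks collapse into one stored object), and `plan_upload` shows how a resumable upload only sends chunks the server has not yet acknowledged. The 4 MiB chunk size and both function names are illustrative choices.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB per chunk (illustrative choice)

def chunk_file(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a byte stream into fixed-size, content-addressed chunks.

    Returns the ordered list of chunk hashes (the file "recipe") plus a
    hash -> bytes map; identical blocks collapse into a single entry.
    """
    chunks = {}
    order = []
    for i in range(0, len(data), chunk_size):
        block = data[i:i + chunk_size]
        digest = hashlib.sha256(block).hexdigest()
        chunks[digest] = block  # dedup: same content, same key
        order.append(digest)
    return order, chunks

def plan_upload(order, already_stored: set):
    """Resumable, deduplicated upload plan: send only the chunks the
    server has not already acknowledged (by content hash)."""
    return [h for h in dict.fromkeys(order) if h not in already_stored]
```

In a real system each hash in the upload plan would map to a pre-signed URL for a multipart-upload part, so file bytes flow directly from the client to object storage; after an interruption the client re-runs `plan_upload` against the server's acknowledged set and resumes from there.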
2. Multi-Device Sync Protocol
Users expect files to appear on all their devices within seconds. Interviewers probe your sync architecture, especially for offline scenarios and conflict resolution.
Hints to consider:
- Maintain a durable change log (event stream) where every file operation is recorded with a monotonically increasing sequence number
- Each device tracks a per-device cursor (last processed sequence number) and requests changes since that cursor on reconnect
- Use a single persistent WebSocket connection per client for push notifications of new changes, with delta-sync on reconnection
- Handle conflicts (two devices modify the same file offline) with a deterministic policy -- last-write-wins with the losing version saved as a conflict copy
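The change-log-plus-cursor model above can be shown with an in-memory sketch (a production log would be a durable store such as a Kafka topic or a database table; the class and field names here are illustrative). Each append gets the next sequence number, and a reconnecting device asks for everything after its last processed cursor. A last-write-wins conflict resolver that preserves the losing version is included.

```python
import itertools

class ChangeLog:
    """Append-only change log: every file operation is recorded with a
    monotonically increasing sequence number; devices pull deltas from
    their per-device cursor on reconnect."""

    def __init__(self):
        self._seq = itertools.count(1)
        self._events = []  # append-only list of (seq, event) pairs

    def append(self, event: dict) -> int:
        seq = next(self._seq)
        self._events.append((seq, event))
        return seq

    def changes_since(self, cursor: int):
        """Delta sync: all events this device has not yet processed."""
        return [(s, e) for s, e in self._events if s > cursor]

def resolve_conflict(local: dict, remote: dict):
    """Deterministic last-write-wins by modification time; the losing
    version is kept as a visible conflict copy rather than discarded."""
    winner, loser = ((local, remote) if local["mtime"] >= remote["mtime"]
                     else (remote, local))
    conflict_copy = {**loser, "name": loser["name"] + " (conflict copy)"}
    return winner, conflict_copy
```

In steady state the server pushes new sequence numbers over the device's WebSocket; the cursor-based pull is the fallback that makes reconnection after offline periods exactly the same code path.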
3. Metadata Consistency and File Operations
File operations like move, rename, and permission changes must be strongly consistent. Interviewers look for transactional guarantees in your metadata layer.
Hints to consider:
- Store directory trees, file versions, ACLs, and share links in a relational database (PostgreSQL) with ACID transactions
- Model the file system as a hierarchy with parent pointers and use transactions for moves/renames that update both source and destination
- Publish file-change events to Kafka after committing metadata changes (transactional outbox pattern) for downstream sync, indexing, and notification
- Version files with content hashes to enable deduplication and efficient diff detection
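The transactional outbox pattern above can be demonstrated with SQLite standing in for PostgreSQL (table names and the payload shape are illustrative). The key property: the metadata update and the outbox row commit in one transaction, so a separate relay process can later publish outbox rows to Kafka without ever emitting an event for a change that did not commit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                     topic TEXT, payload TEXT);
""")

def move_file(conn, file_id: int, new_parent: int, new_name: str):
    """Move/rename and record the change event atomically.

    Both writes share one transaction: if either fails, neither the
    metadata change nor the event exists. A relay drains the outbox
    table and publishes each row to Kafka, then deletes it.
    """
    with conn:  # sqlite3 context manager = single transaction
        conn.execute(
            "UPDATE files SET parent_id = ?, name = ? WHERE id = ?",
            (new_parent, new_name, file_id))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("file-changes", f'{{"op": "move", "id": {file_id}}}'))
```

Publishing to Kafka directly from the request handler, by contrast, risks either a committed change with no event (publish fails after commit) or an event for a rolled-back change (publish before commit), which is exactly what the outbox avoids.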
4. Content Delivery and Cost Optimization
Serving file downloads at global scale requires CDN-backed delivery and smart storage tiering. Interviewers want to see cost-conscious design.
Hints to consider:
- Serve downloads through CDN with signed URLs that expire, preventing unauthorized access
- Implement storage tiering: frequently accessed files on S3 Standard, older files on infrequent-access or Glacier-class storage
- Use content-addressable storage with deduplication at the chunk level to reduce storage costs across users
- Cache hot file metadata in Redis to reduce database load during burst access patterns
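Expiring signed URLs can be sketched with an HMAC over the path and expiry time, verified at the CDN edge. This is a generic illustration of the idea, not any particular CDN's signing scheme; the secret, query-parameter names, and 300-second default TTL are all illustrative.

```python
import hashlib
import hmac
import time

SECRET = b"cdn-signing-key"  # shared with the CDN edge (illustrative)

def sign_url(path: str, ttl: int = 300, now=None) -> str:
    """Return a download URL valid for `ttl` seconds; the signature
    covers both the path and the expiry, so neither can be altered."""
    expires = int(now if now is not None else time.time()) + ttl
    sig = hmac.new(SECRET, f"{path}:{expires}".encode(),
                   hashlib.sha256).hexdigest()
    return f"{path}?expires={expires}&sig={sig}"

def verify_url(path: str, expires: int, sig: str, now=None) -> bool:
    """Edge-side check: reject expired links and forged signatures."""
    if int(now if now is not None else time.time()) > expires:
        return False  # link has expired
    expected = hmac.new(SECRET, f"{path}:{expires}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)  # constant-time compare
```

Because verification needs only the shared secret, the edge can authorize downloads without calling back to the application tier, keeping app servers out of the download path just as pre-signed uploads keep them out of the upload path.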
Suggested Approach
Step 1: Clarify Requirements
Start by confirming scope and constraints. Ask about expected file sizes (typical and maximum), whether real-time collaboration on files (like Google Docs) is in scope, the number of devices per user, and offline duration expectations. Clarify sharing requirements: individual files only, or entire folder hierarchies? Determine if version history is required and for how long. Establish sync latency targets and consistency requirements for different operations.