For a walkthrough of designing a distributed file system with sync semantics, see our Design File System guide. It covers metadata management, chunked storage, and multi-device synchronization patterns that form the foundation for this problem.
Also review the Blob Storage, Databases, Message Queues, and CDN building blocks.
Design a file storage and synchronization service like Dropbox that allows users to upload, download, and sync files across multiple devices with real-time updates. The core experience is seamless: drop a file into a folder on one device and it appears on every other device within seconds, even if some devices were offline when the change happened.
This problem forces you to separate a control plane (metadata, authentication, sync coordination) from a data plane (large file transfers). You need to handle multi-device real-time updates, large blob uploads with resumability, conflict resolution when two devices edit the same file offline, and cost-efficient distribution through CDN and object storage. Interviewers use this question to evaluate your ability to decompose a seemingly simple product into well-bounded services with clear consistency and durability guarantees.
Based on real interview experiences, these are the areas interviewers probe most deeply:
The most common mistake is routing file data through application servers. Interviewers immediately probe whether you understand that metadata operations and bulk data transfers must travel separate paths.
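One way to make the control-plane/data-plane split concrete is a metadata endpoint that returns a pre-signed upload URL, so the client PUTs bytes straight to object storage. This is a minimal sketch with a hypothetical HMAC scheme standing in for a real signer such as S3's (the key, hostname, and query format here are illustrative, not any provider's API):

```python
import hashlib
import hmac
import time

SIGNING_KEY = b"demo-secret"  # stand-in for real storage credentials

def presign_upload_url(bucket: str, key: str, ttl_seconds: int = 900) -> str:
    """Control-plane call: authorize an upload without proxying the bytes.

    The client PUTs the chunk directly to object storage using this URL;
    the application server signs the request but never touches file data.
    """
    expires = int(time.time()) + ttl_seconds
    payload = f"PUT\n{bucket}\n{key}\n{expires}".encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return f"https://{bucket}.storage.example.com/{key}?expires={expires}&sig={signature}"

url = presign_upload_url("user-chunks", "chunk/ab12cd")
```

The application servers stay small and stateless because they only mint short-lived URLs; bandwidth-heavy traffic lands on the storage tier, which is built for it.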
Users upload multi-gigabyte files over unreliable networks. Without chunking and resume capability, a dropped connection near the end of a large upload forces the user to start over from byte zero.
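Chunked, resumable upload can be sketched as follows — split the file into fixed-size pieces, hash each one, and after a failure re-send only the chunks the server has not acknowledged. The function names and 4 MiB chunk size are illustrative assumptions, not a specific product's protocol:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB; an assumed, commonly cited chunk size

def chunk_manifest(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[dict]:
    """Split a file into fixed-size chunks and hash each one.

    The per-chunk hashes serve double duty: they identify which chunks
    to resume after a failure, and they act as deduplication keys.
    """
    manifest = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        manifest.append({
            "index": offset // chunk_size,
            "offset": offset,
            "size": len(chunk),
            "sha256": hashlib.sha256(chunk).hexdigest(),
        })
    return manifest

def chunks_to_resend(manifest: list[dict], acked_indexes: set) -> list[dict]:
    """After a dropped connection, upload only the unacknowledged chunks."""
    return [c for c in manifest if c["index"] not in acked_indexes]
```

If a 5 GB upload dies at 99%, the client replays only the final unacknowledged chunks instead of all 5 GB.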
When a file changes on one device, all other linked devices must learn about it quickly and pull the updated content. Polling is wasteful and slow; pure push is fragile at scale.
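A common middle ground is a per-user change log with cursor-based catch-up: connected devices get a tiny push ("you are behind"), then pull entries after their last-seen cursor, so offline devices catch up through the exact same path. A minimal in-memory sketch (class and method names are assumptions for illustration; `push_ping` stands in for a real WebSocket send):

```python
from collections import defaultdict

class SyncService:
    """Per-user change log with cursor-based catch-up."""

    def __init__(self):
        self.change_log = defaultdict(list)   # user_id -> ordered entries
        self.connected = defaultdict(set)     # user_id -> live device ids

    def record_change(self, user_id: str, entry: dict) -> int:
        log = self.change_log[user_id]
        log.append(entry)
        cursor = len(log)  # monotonically increasing position
        for device in self.connected[user_id]:
            self.push_ping(device, cursor)  # tiny payload, never file data
        return cursor

    def push_ping(self, device_id, cursor):
        pass  # placeholder for a WebSocket send in a real system

    def changes_since(self, user_id: str, cursor: int) -> list[dict]:
        """A device resumes from its last cursor, online or after being offline."""
        return self.change_log[user_id][cursor:]

svc = SyncService()
svc.record_change("alice", {"file": "/notes.txt", "op": "update"})
svc.record_change("alice", {"file": "/photo.jpg", "op": "create"})
```

Because the push is only a hint and the log is the source of truth, a lost WebSocket message costs latency, not correctness.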
Two devices may edit the same file while offline. When both reconnect, the system must decide how to handle the conflict without losing either version.
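One widely used resolution strategy is a compare-and-set on the file's version: each device remembers the version it last synced, and if the server has moved past that base, the later writer's file is preserved as a conflict copy instead of silently overwriting. A sketch under assumed names (the dict-backed store and "conflicted copy" naming are illustrative):

```python
def apply_edit(server_versions: dict, path: str, base_version: int,
               new_content_hash: str, device: str) -> str:
    """Detect offline-edit conflicts with a compare-and-set on version.

    If the device's base_version still matches the server, the edit
    fast-forwards. Otherwise both edits started from the same base,
    so we keep both versions rather than losing either one.
    """
    current = server_versions.get(path, {"version": 0, "hash": None})
    if base_version == current["version"]:
        server_versions[path] = {"version": current["version"] + 1,
                                 "hash": new_content_hash}
        return path  # clean fast-forward
    conflict_path = f"{path} (conflicted copy from {device})"
    server_versions[conflict_path] = {"version": 1, "hash": new_content_hash}
    return conflict_path

versions = {}
apply_edit(versions, "/doc.txt", 0, "hash-a", "laptop")   # wins the race
result = apply_edit(versions, "/doc.txt", 0, "hash-b", "phone")  # same base
```

The user sees both files side by side and merges manually, which is the honest answer for opaque binary content; automatic merging only makes sense for formats the server understands.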
At petabyte scale, storage cost dominates. Interviewers expect you to think about deduplication, tiered storage, and lifecycle policies.
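Deduplication usually means a content-addressable chunk store: a chunk's SHA-256 is its identity, identical chunks are stored once, and reference counts decide when a chunk can be deleted or demoted to a cold tier. A minimal in-memory sketch (a dict stands in for object storage; the class name is an assumption):

```python
import hashlib

class ChunkStore:
    """Content-addressable chunk store with reference counting."""

    def __init__(self):
        self.blobs = {}      # hash -> bytes (object storage in reality)
        self.refcount = {}   # hash -> number of referencing file versions

    def put(self, chunk: bytes) -> str:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in self.blobs:      # dedup: identical chunk, no re-store
            self.blobs[digest] = chunk
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def release(self, digest: str):
        """Called when a file version is deleted or expires from history."""
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.blobs[digest]        # or demote to a cheaper storage tier
            del self.refcount[digest]
```

The same mechanism makes version history cheap: a new version of a large file references mostly unchanged chunks and pays only for the ones that differ.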
Confirm the expected file size distribution (documents versus media versus archives), whether real-time collaboration on file contents is needed or just file-level sync, how many devices a typical user links, and whether the system is consumer-facing or enterprise. Ask about retention policies, version history depth, and sharing model (internal teams versus public links). Establish offline requirements: do users need to pre-select files for offline access, or should the system sync everything automatically?
Sketch the core components: a metadata service backed by a relational database (Postgres) for folder trees, permissions, and file version records; an object storage layer (S3) for file chunks; a sync service that maintains a per-user change log and pushes notifications to connected devices; an upload service that generates pre-signed URLs and tracks chunk progress; a notification service that alerts devices of new changes via WebSocket; and a background processing pipeline for thumbnails, virus scanning, and search indexing. Show the clear separation between metadata API calls and direct-to-S3 data transfers.
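The two records at the heart of the metadata service can be sketched as plain data types — a file version pointing at its ordered chunks, and a change-log entry carrying the per-user cursor. Field names here are illustrative assumptions, not a fixed schema:

```python
from dataclasses import dataclass

@dataclass
class FileVersion:
    file_id: str
    version: int
    chunk_hashes: list[str]  # ordered chunk hashes that reassemble the file
    size: int

@dataclass
class ChangeLogEntry:
    seq: int        # per-user monotonic position; devices sync from a cursor
    file_id: str
    version: int
    op: str         # "create" | "update" | "delete" | "move"
```

Keeping file versions immutable and append-only makes the change log trivially replayable and makes version history a pure metadata feature.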
Walk through the end-to-end upload flow. The client splits the file into chunks, computes a hash per chunk, and asks the metadata service which chunks already exist (deduplication check). For new chunks, the client receives pre-signed upload URLs and uploads directly to S3. Once all chunks land, the client notifies the metadata service, which atomically creates a new file version record and appends an entry to the user's change log. The sync service pushes a notification to all other connected devices. Each device fetches the change log from its last cursor position, discovers the new file version, downloads the new or changed chunks via pre-signed download URLs, and reconstructs the file locally.
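The commit step at the end of this flow deserves emphasis: the new version record and the change-log entry must land in one transaction, so no device ever observes one without the other. A sketch using SQLite as a stand-in for the metadata database (table and column names are assumptions):

```python
import sqlite3

def commit_version(db, user_id: str, file_id: str, chunk_hashes: list) -> int:
    """Once all chunks are in object storage, atomically create the new
    file version record and append to the user's change log."""
    with db:  # sqlite3 context manager wraps the body in one transaction
        cur = db.execute(
            "SELECT COALESCE(MAX(version), 0) FROM file_versions WHERE file_id = ?",
            (file_id,))
        version = cur.fetchone()[0] + 1
        db.execute(
            "INSERT INTO file_versions (file_id, version, chunks) VALUES (?, ?, ?)",
            (file_id, version, ",".join(chunk_hashes)))
        db.execute(
            "INSERT INTO change_log (user_id, file_id, version) VALUES (?, ?, ?)",
            (user_id, file_id, version))
    return version

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE file_versions (file_id TEXT, version INTEGER, chunks TEXT)")
db.execute("CREATE TABLE change_log (seq INTEGER PRIMARY KEY AUTOINCREMENT,"
           " user_id TEXT, file_id TEXT, version INTEGER)")
```

This is exactly where the relational choice for metadata pays off: if either insert fails, the whole commit rolls back and other devices simply never learn about the half-finished upload.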
Cover reliability by replicating metadata across database replicas with automatic failover, and by relying on S3's built-in durability for file data. Discuss search indexing by extracting text content from uploaded files and feeding it to an Elasticsearch cluster via a Kafka-powered pipeline. Address sharing by storing ACLs in the metadata database and checking permissions on every API call and pre-signed URL generation. Touch on monitoring: track sync lag per device, upload success rates, deduplication ratios, and storage growth. If time permits, discuss multi-region deployment where metadata is replicated globally and file chunks are cached at edge locations via CDN for faster downloads.
Candidates at Miro report that the interviewer focused heavily on the synchronization mechanism across multiple devices, asking detailed questions about how versioning works and how conflicts are detected and resolved. Another common emphasis is on strong consistency for metadata operations -- interviewers want to see that you understand why ACID properties matter for operations like rename, move, and permission changes even when file sync itself can be eventually consistent.