For a full example answer with detailed architecture diagrams and deep dives, see our Design a Distributed File System guide. The file system guide covers chunked storage, metadata management, and synchronization patterns that form the foundation of a Dropbox-like service.
Also review the Blob Storage, Message Queues, and Databases building blocks for background on large object storage, change event propagation, and metadata consistency.
Design a file storage and synchronization system like Dropbox that allows users to upload, download, and sync files across multiple devices with real-time updates. The core experience is effortless: drop a file in one place and it appears everywhere, even if you go offline and come back later.
The key architectural challenge is separating the control plane (metadata, authentication, sync coordination) from the data plane (large file transfers). You must handle multi-device real-time synchronization, large binary uploads with resume capability, conflict resolution when the same file is edited on two devices while offline, and cost-efficient content distribution. Interviewers want to see clear requirements, a scalable architecture with distinct metadata and blob paths, and practical tradeoffs for reliability, consistency, and user experience.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Files are frequently multi-gigabyte and uploaded over unreliable networks. Interviewers want to see how you avoid turning application servers into a bottleneck and ensure uploads survive interruptions.
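One way to keep application servers out of the data path and make uploads survive interruptions is client-side chunking with per-chunk hashes. Below is a minimal sketch; the `have_chunk` and `put_chunk` callbacks are hypothetical stand-ins for a dedup-check API and a pre-signed-URL PUT, and the 4 MiB chunk size is just an illustrative choice:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB: an illustrative fixed chunk size

def chunk_file(path):
    """Split a file into fixed-size chunks and hash each one.

    Returns a manifest: an ordered list of (index, sha256_hex, length).
    The hashes let the server dedupe chunks and let the client resume,
    since only chunks the server is missing need to be re-sent.
    """
    manifest = []
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            manifest.append((index, hashlib.sha256(chunk).hexdigest(), len(chunk)))
            index += 1
    return manifest

def upload_with_resume(path, manifest, have_chunk, put_chunk, max_retries=3):
    """Upload only the chunks the server does not already have,
    retrying each chunk independently so a dropped connection
    costs at most one chunk of progress."""
    with open(path, "rb") as f:
        for index, digest, length in manifest:
            if have_chunk(digest):
                continue  # dedup hit or already uploaded: skip
            f.seek(index * CHUNK_SIZE)
            data = f.read(length)
            for attempt in range(max_retries):
                try:
                    put_chunk(digest, data)
                    break
                except IOError:
                    if attempt == max_retries - 1:
                        raise
```

Because each chunk is addressed by its hash, a resumed upload is just a re-run: already-accepted chunks are skipped and only the remainder is transferred.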
Keeping files in sync across multiple devices is the defining feature. Interviewers expect a concrete sync mechanism with cursors, change logs, and conflict handling.
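The cursor-and-change-log mechanism can be sketched as an append-only log with monotonically increasing sequence numbers; each device remembers the last sequence number it applied and asks for everything after it. This is a simplified in-memory stand-in for a database-backed log, with assumed entry and API shapes:

```python
class ChangeLog:
    """Append-only per-user change log for cursor-based sync.

    A device's cursor is simply the highest sequence number it has
    applied; fetching with that cursor returns exactly the changes
    it has not yet seen, in order.
    """
    def __init__(self):
        self._entries = []  # list of (seq, file_id, op)
        self._seq = 0

    def append(self, file_id, op):
        """Record a change and return its sequence number."""
        self._seq += 1
        self._entries.append((self._seq, file_id, op))
        return self._seq

    def changes_since(self, cursor, limit=100):
        """Return up to `limit` entries after `cursor`, plus the new
        cursor the device should persist before its next fetch."""
        batch = [e for e in self._entries if e[0] > cursor][:limit]
        new_cursor = batch[-1][0] if batch else cursor
        return batch, new_cursor
```

The same call serves both a reconnecting device (cursor far behind) and a connected one reacting to a push notification (cursor one behind), which keeps the sync protocol uniform.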
Interviewers look for a clean separation between the control plane handling file metadata and the data plane handling actual file bytes. Mixing them creates scaling bottlenecks.
When two users edit the same shared file, or a single user edits on two offline devices, the system must handle conflicts gracefully. Interviewers probe whether you have a concrete strategy.
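A concrete strategy many candidates reach for is optimistic concurrency with conflict copies: each commit carries the version the client last saw, and a mismatch means another device won the race, so the system keeps both versions rather than silently overwriting one. A minimal sketch, using a plain dict as a stand-in for the metadata store and a hypothetical conflict-copy naming scheme:

```python
def commit_file(metadata, file_id, base_version, new_manifest, device):
    """Optimistic-concurrency commit.

    `base_version` is the version the committing device last synced.
    If the stored version has moved on, another device committed
    first; instead of overwriting, create a conflict copy so no
    edits are lost and the user can merge manually.
    """
    record = metadata[file_id]
    if record["version"] != base_version:
        # Lost the race: materialize the losing edit as a sibling file.
        conflict_id = f"{file_id} ({device}'s conflicted copy)"
        metadata[conflict_id] = {"version": 1, "manifest": new_manifest}
        return conflict_id
    record["version"] += 1
    record["manifest"] = new_manifest
    return file_id
```

The key property to call out: the check-and-increment must be atomic in the metadata database (e.g. a conditional UPDATE), otherwise two racing commits could both pass the version check.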
Start by confirming scope and constraints. Ask about the expected file size distribution (many small files or some very large ones), the number of devices per user, and whether real-time collaboration on the same file is required or just sync-on-save. Clarify whether sharing is public links only or includes fine-grained ACLs. Verify the acceptable sync delay and whether offline editing with conflict resolution is a hard requirement. Establish durability and availability SLAs.
Sketch the major components: a Metadata Service (PostgreSQL) that manages the file tree, permissions, versions, and chunk manifests; an Upload Service that generates pre-signed URLs for direct-to-S3 chunk uploads and orchestrates multi-step upload workflows; a Sync Service that maintains per-device cursors and pushes change notifications via WebSocket or long-polling; an Event Bus (Kafka) that emits file-change events for background processors; a Search Service for file name indexing; and a CDN for fast downloads. Show two distinct data flows: the upload path (client to S3 via pre-signed URL, then metadata commit) and the sync path (change event to Kafka, fan-out to connected devices).
Walk through the upload flow for a large file. The client splits the file into chunks, computes a SHA-256 hash per chunk, and requests upload URLs from the Upload Service. For each chunk, the client uploads directly to S3 using the pre-signed URL. Once all chunks are uploaded, the client sends a commit request to the Metadata Service, which atomically creates the file record with the chunk manifest and increments the user's change log sequence number. The Sync Service detects the new change log entry and pushes a notification to the user's other connected devices, which fetch the updated metadata and download chunks they do not already have locally. Discuss how deduplication works: if a chunk hash already exists in storage, skip the upload and just reference the existing chunk.
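The deduplication step above is easiest to explain as a content-addressed chunk store: chunks are keyed by their SHA-256, so an identical chunk from any file or user is stored once and reference-counted for garbage collection. This in-memory sketch stands in for the real blob store plus a reference-count table:

```python
import hashlib

class ChunkStore:
    """Content-addressed chunk store illustrating deduplication.

    `put` returns the chunk's hash; if the blob already exists the
    write is skipped and only the reference count grows. `release`
    drops a reference and deletes the blob once no file manifest
    points at it.
    """
    def __init__(self):
        self._blobs = {}  # sha256 hex -> chunk bytes
        self._refs = {}   # sha256 hex -> reference count

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:  # dedup hit: skip the write
            self._blobs[digest] = data
        self._refs[digest] = self._refs.get(digest, 0) + 1
        return digest

    def release(self, digest):
        self._refs[digest] -= 1
        if self._refs[digest] == 0:  # no manifest references it anymore
            del self._refs[digest]
            del self._blobs[digest]
```

In a real system the hash check happens before the client uploads (that is the "skip the upload" step), and reference counting or mark-and-sweep GC prevents a shared chunk from being deleted while any file version still references it.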
Cover conflict resolution: when two devices upload different versions of the same file, the second upload detects a version mismatch and creates a conflict copy. Discuss sharing: store ACLs in the metadata database and check permissions on every API call; for shared folders, propagate changes to all participants' change logs. Address search: index file names and paths in Elasticsearch with user permission filtering. Mention background processing: use Kafka consumers for thumbnail generation, antivirus scanning, and storage quota enforcement. Cover monitoring: track upload success rates, sync latency, chunk deduplication ratio, and storage growth. Discuss disaster recovery: replicate metadata across availability zones and rely on S3's built-in durability for file content.
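The per-call ACL check for shared folders is typically hierarchical: a grant on a folder applies to everything beneath it, so the check walks from the requested path up toward the root. A small sketch, assuming an in-memory `acls` mapping of folder path to per-user action sets in place of the metadata database:

```python
def can_access(acls, user, path, action):
    """Hierarchical ACL check.

    Walks from `path` up through its ancestor folders looking for a
    grant of `action` to `user`. `acls` maps a folder path to
    {user: set_of_actions}, e.g. {"team/docs": {"alice": {"read"}}}.
    """
    parts = path.split("/")
    for i in range(len(parts), 0, -1):
        prefix = "/".join(parts[:i])
        grant = acls.get(prefix, {}).get(user, set())
        if action in grant:
            return True
    return False
```

The same walk explains why shared-folder changes must fan out to every participant's change log: each participant's devices sync from their own log, but authorization is resolved against the shared folder's ACL at read time.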
"Design a file sharing system. Interviewer indexed on strong consistency and ACID properties."
"OneDrive application where we can upload, download and sync when online and how it behaves when offline. There was a lot of focus on designing the client side."