For a walkthrough of designing a distributed file system with sync semantics, see our Design File System guide. It covers metadata management, chunked storage, and multi-device synchronization patterns that form the foundation for this problem.
Also review the Blob Storage, Databases, Message Queues, and CDN building blocks.
Design a file storage and synchronization service like Dropbox that lets users upload, download, and sync files across multiple devices with real-time updates. The core experience is seamless: save a file on one device and it appears on every other linked device within seconds, even if some were offline when the change occurred.
This problem tests your ability to separate a control plane (metadata, authentication, sync coordination) from a data plane (large file transfers). You must handle resumable uploads over unreliable networks, real-time change propagation across devices, conflict resolution for concurrent offline edits, and cost-efficient storage at petabyte scale. Interviewers use this to evaluate whether you can decompose a deceptively simple product into well-bounded services with clear consistency, durability, and scalability guarantees.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Routing multi-gigabyte file uploads through application servers is the most common and costly mistake. Interviewers immediately check whether you understand the need to keep metadata operations and bulk data transfers on separate paths.
Users regularly upload files that are hundreds of megabytes or larger over unreliable connections. Without chunking, a network blip near the end of a large upload forces the user to restart from scratch.
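The chunking approach can be sketched as a client-side manifest builder. This is an illustrative sketch, not a real client: the 4 MiB chunk size and the `chunk_manifest` helper are assumptions, and in practice the chunk size is a tuning decision that trades resumability granularity against per-chunk overhead.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB; a common choice, but an assumption here

def chunk_manifest(path):
    """Split a file into fixed-size chunks and hash each one.

    Returns a list of (index, sha256_hex, length) entries. On a resumed
    upload, the client re-sends only the chunks the server reports as
    missing, so a network blip near the end costs at most one chunk.
    """
    manifest = []
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            manifest.append((index, hashlib.sha256(chunk).hexdigest(), len(chunk)))
            index += 1
    return manifest
```

Because chunk identity is content-derived, the same manifest also drives deduplication: two users uploading the same file produce identical hashes.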
When a file changes on one device, all other linked devices must learn about it quickly. Pure polling wastes bandwidth and adds latency; pure push is fragile at scale.
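A common resolution is a hybrid: the server pushes a tiny "something changed" ping over a persistent connection, and each device then pulls the change log itself. The sketch below is an in-memory stand-in (the `Notifier` class and callback-as-WebSocket shape are assumptions, not a real transport):

```python
class Notifier:
    """Fan out lightweight change pings to a user's connected devices.

    Pings carry no payload; each device pulls the change log on its own,
    so a dropped ping only delays sync until the next poll or reconnect
    rather than losing data. Disconnected devices catch up via the same
    pull path when they come back.
    """
    def __init__(self):
        self.connections = {}  # device_id -> send callback (stand-in for a WebSocket)

    def connect(self, device_id, send):
        self.connections[device_id] = send

    def disconnect(self, device_id):
        self.connections.pop(device_id, None)

    def notify(self, origin_device):
        for device_id, send in self.connections.items():
            if device_id != origin_device:   # don't echo back to the editing device
                send({"type": "ping"})
```

Keeping the push channel payload-free means it never has to be reliable: correctness lives entirely in the pull path.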
Two devices may edit the same file while disconnected. When both reconnect, the system needs a deterministic strategy to reconcile divergent histories without losing work.
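One deterministic strategy (the Dropbox-style "conflicted copy") can be sketched as follows. The version-record shape and the `reconcile` helper are assumptions for illustration:

```python
def reconcile(local, remote):
    """Deterministically reconcile two versions of one file.

    A version is a dict: {'id', 'parent', 'mtime', 'device'}. If one
    version descends from the other, fast-forward to it. Otherwise the
    histories diverged: the version with the later (mtime, device) tuple
    wins the canonical path, and the loser is returned so the caller can
    preserve it as a conflicted copy. No edit is ever silently dropped,
    and every replica computes the same answer.
    """
    if remote["parent"] == local["id"]:
        return remote, None               # remote already includes local's edit
    if local["parent"] == remote["id"]:
        return local, None                # local already includes remote's edit
    # True conflict: pick a deterministic winner, keep the loser.
    winner, loser = sorted(
        [local, remote],
        key=lambda v: (v["mtime"], v["device"]),
        reverse=True,
    )
    return winner, loser                  # caller saves loser as "name (conflicted copy).ext"
```

The tiebreak on `(mtime, device)` matters: without the device ID, two edits with identical timestamps could resolve differently on different replicas.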
At petabyte scale, storage cost dominates the operating budget. Interviewers expect you to think beyond naive "store everything" approaches.
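The biggest lever is content-addressed chunk storage with reference counting, so identical chunks are stored once across all users and versions. A minimal in-memory sketch (the `ChunkStore` class is illustrative; in production the blobs live in S3 and the refcounts in the metadata database):

```python
import hashlib

class ChunkStore:
    """Content-addressed chunk store with reference counting.

    Identical chunks uploaded by any user are stored once; a chunk is
    deleted only when no file version references it. This is the core
    of the cross-user deduplication that cuts storage cost.
    """
    def __init__(self):
        self.blobs = {}      # sha256 hex -> bytes (stand-in for S3 objects)
        self.refcounts = {}

    def put(self, data):
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blobs:        # store only truly new content
            self.blobs[key] = data
        self.refcounts[key] = self.refcounts.get(key, 0) + 1
        return key

    def release(self, key):
        self.refcounts[key] -= 1
        if self.refcounts[key] == 0:     # no version references it: garbage-collect
            del self.blobs[key]
            del self.refcounts[key]
```

Versioning then becomes cheap: a new version of a large file that changes one chunk adds one blob, not a full copy.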
Confirm the expected file size distribution (documents, photos, videos, archives), whether real-time collaboration within files is needed or just file-level sync, how many devices a typical user links, and whether the system serves consumers or enterprises. Ask about version history retention, maximum file size, and sharing model (internal teams versus public links). Clarify offline behavior: does the client sync all files or only user-selected ones?
Sketch the core components: a metadata service backed by Postgres for folder trees, permissions, and file version records; an object storage layer (S3) for file chunks; a sync service that maintains the per-user change log and pushes notifications to connected devices via WebSocket; an upload service that generates pre-signed URLs, tracks chunk progress, and triggers post-upload processing; a background pipeline for thumbnail generation, virus scanning, and search indexing; and a CDN layer for accelerating downloads of popular shared files. Show the separation between metadata API calls and direct-to-S3 data transfers.
Walk through the full upload path. The client splits the file into chunks, computes a SHA-256 hash per chunk, and sends the list to the metadata service. The service checks which chunks already exist (deduplication), returns pre-signed upload URLs for new chunks, and the client uploads them directly to S3. Once all chunks land, the client notifies the metadata service, which atomically creates a new file version record and appends an entry to the change log. The sync service pushes a lightweight notification to all connected devices for that user. Each device fetches the change log from its last cursor, discovers the new version, downloads new or changed chunks via pre-signed URLs, and reconstructs the file locally.
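The negotiate/upload/commit handshake above can be sketched with an in-memory stand-in for the metadata service (the `UploadCoordinator` class and its method names are assumptions; real chunk uploads would go to pre-signed S3 URLs, not to this object):

```python
class UploadCoordinator:
    """In-memory sketch of the upload negotiation described above.

    The client sends its chunk-hash manifest; the server answers with the
    hashes it still needs (everything else is deduplicated away). commit()
    succeeds only once every chunk has landed, which makes retried and
    resumed uploads idempotent: re-running any step is harmless.
    """
    def __init__(self):
        self.stored = set()       # chunk hashes already in object storage
        self.versions = []        # committed file version records

    def negotiate(self, manifest):
        return [h for h in manifest if h not in self.stored]

    def receive_chunk(self, h):
        self.stored.add(h)        # in reality: the client PUTs to a pre-signed URL

    def commit(self, path, manifest):
        if any(h not in self.stored for h in manifest):
            raise RuntimeError("upload incomplete; retry missing chunks")
        self.versions.append({"path": path, "chunks": list(manifest)})
```

Note the commit is the only step that mutates metadata; everything before it is safely repeatable, which is what makes the protocol robust over flaky networks.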
Cover reliability by relying on S3's built-in durability for file data and on a replicated Postgres deployment with automatic failover for metadata. Discuss search by extracting text from uploaded files and indexing it in Elasticsearch via a Kafka pipeline. Address sharing through ACLs stored in the metadata database, checked on every API call and on every pre-signed URL generation. Mention monitoring: track sync lag per device, upload success rates, deduplication ratios, and storage growth trends. If time allows, discuss multi-region deployment with metadata replicated globally and file chunks cached at edge locations via CDN.
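The ACL check gating pre-signed URL generation can be sketched in a few lines. The `AclStore` class and its role names are illustrative assumptions:

```python
class AclStore:
    """Minimal ACL check gating metadata calls and pre-signed URL issuance.

    Grants map (user, file) -> role. Every request passes through check(),
    so a revoked share stops working immediately for new requests; the
    short expiry on already-issued pre-signed URLs bounds the remaining
    exposure window.
    """
    ROLES = {"viewer": {"read"}, "editor": {"read", "write"}}

    def __init__(self):
        self.grants = {}   # (user_id, file_id) -> role

    def grant(self, user, file_id, role):
        self.grants[(user, file_id)] = role

    def revoke(self, user, file_id):
        self.grants.pop((user, file_id), None)

    def check(self, user, file_id, action):
        role = self.grants.get((user, file_id))
        return role is not None and action in self.ROLES[role]
```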
Candidates at Pinterest report that interviewers focused heavily on strong consistency and ACID properties for metadata operations, particularly for file sharing and permission management. The synchronization mechanism across multiple devices was a deep-dive topic, with the interviewer pushing hard on the versioning solution and how conflicts are detected. One interviewer specifically asked about the OneDrive-style offline experience: how the system behaves when the user goes offline and comes back online. Be ready to explain exactly how per-device cursors track sync state and how the change log enables reliable catch-up after disconnection.
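The cursor mechanism is worth being able to write down. A minimal sketch, assuming an in-memory per-user log (the `ChangeLog` class and sequence-number scheme are illustrative; in production the log is a durable table and cursors persist per device):

```python
from collections import defaultdict

class ChangeLog:
    """Per-user append-only change log with per-device cursors.

    Each device remembers only the next sequence number it has not yet
    read. A device that was offline for hours simply pulls from its
    cursor and receives every change it missed, in order; a device that
    is up to date gets an empty list. Catch-up needs no special case.
    """
    def __init__(self):
        self.entries = []                 # position in list = sequence number
        self.cursors = defaultdict(int)   # device_id -> next sequence to read

    def append(self, change):
        self.entries.append(change)
        return len(self.entries)          # new head sequence, usable in the push ping

    def pull(self, device_id):
        start = self.cursors[device_id]
        new = self.entries[start:]
        self.cursors[device_id] = len(self.entries)
        return new
```

This is also the answer to the offline question: reconnection is just a `pull` from a stale cursor, so push delivery never has to be reliable.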