Design a system which can resume videos
System Design -- Must
Problem Statement
Design a system that lets users pause a video on one device and seamlessly pick up playback from the exact same position on any other device. This is the "Continue Watching" experience familiar from Netflix, Disney+, and similar streaming platforms -- you stop a movie halfway through on your living room TV, then open the app on your phone during a commute and it resumes right where you left off.
The scope of this problem is deliberately narrow: you are not designing the video encoding pipeline or CDN. Instead, the focus is on the state synchronization layer that tracks playback position across devices. The system must handle a write-heavy workload where hundreds of millions of active sessions continuously report their playback position, propagate updates to other devices within a second or two, and resolve conflicts when multiple devices send competing position updates. Clients may go offline and reconnect later, so the system must reconcile stale updates without moving the playback cursor backward.
At Netflix-scale, this means ingesting roughly 500 million position updates per hour from 100 million daily active users, each potentially using two to four devices. The position data itself is small (a user-profile-video key with a timestamp value), but the sheer volume and the real-time propagation requirement make this a challenging exercise in distributed state management, write optimization, and conflict resolution.
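The load figures above translate into a per-second write rate that drives every later design choice. A quick back-of-envelope check (pure arithmetic on the numbers stated above):

```python
# Back-of-envelope sizing from the stated load figures.
UPDATES_PER_HOUR = 500_000_000
DAU = 100_000_000

writes_per_sec = UPDATES_PER_HOUR / 3600            # ~139k writes/sec on average
updates_per_user_hour = UPDATES_PER_HOUR / DAU      # 5 updates per user per hour

# Each record is tiny (a user-profile-video key plus a position and a little
# metadata, well under a kilobyte), so raw storage is cheap --
# sustained write throughput is the real problem.
print(f"{writes_per_sec:,.0f} writes/sec, {updates_per_user_hour:.0f} updates/user/hour")
```

Note that the average hides peaks: evening prime-time traffic can be a multiple of this, which is why the write-optimization section below matters.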
Key Requirements
Functional
- Cross-device resume -- when a user opens a video on any device, playback starts from the most recent position saved by any of their devices
- Automatic position tracking -- the client periodically saves the current playback position without requiring explicit user action
- Per-profile isolation -- each profile within a shared account maintains its own independent watch progress
- Offline resilience -- devices that lose connectivity queue position updates locally and reconcile when the network recovers, never overwriting a more recent position with a stale one
Non-Functional
- Scalability -- handle 500 million position updates per hour across 100 million daily active users with multiple devices per user
- Latency -- propagate a position change to other connected devices within 1-2 seconds under normal conditions
- Reliability -- 99.9 percent of position updates are successfully persisted without data loss
- Consistency -- guarantee monotonic progress so the cursor never jumps backward, even under concurrent multi-device updates
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Conflict Resolution and Monotonic Progress
When a user has two devices active simultaneously -- or when an offline device reconnects with a stale position -- the system must decide which update wins. A naive last-write-wins approach based on wall-clock time can cause the cursor to jump backward if an older device's update arrives after a newer one.
Hints to consider:
- Enforce a monotonic position rule: the server only accepts an update if the new position is greater than or equal to the currently stored position for that user-profile-video key
- Attach a version number or logical clock to each update so the server can reject outdated writes regardless of arrival order
- Use session heartbeats to identify the actively playing device and weight its updates more heavily than background devices
- Handle the edge case where a user starts the same video on two devices at different positions -- define a clear tiebreaking rule such as "highest position wins"
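The accept/reject rule sketched in these hints amounts to a compare-and-set on the stored record. A minimal in-memory sketch, assuming a per-client logical version counter; the names (`PositionStore`, `Progress`) are illustrative, not from the source:

```python
from dataclasses import dataclass

@dataclass
class Progress:
    position_sec: float   # playback position in seconds
    version: int          # logical clock incremented by the client

class PositionStore:
    """In-memory sketch of the server-side monotonic-update rule."""

    def __init__(self):
        self._store = {}  # (user_id, profile_id, video_id) -> Progress

    def update(self, key, position_sec, version) -> bool:
        cur = self._store.get(key)
        if cur is not None:
            # Reject writes carrying an older logical version, regardless
            # of arrival order (handles the offline-reconnect case).
            if version < cur.version:
                return False
            # Same version: apply the "highest position wins" tiebreak,
            # so the cursor never moves backward on concurrent updates.
            if version == cur.version and position_sec <= cur.position_sec:
                return False
        self._store[key] = Progress(position_sec, version)
        return True
```

A deliberate rewind by the user arrives with a *higher* version and is therefore accepted, which is why the version check takes precedence over the raw position comparison.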
2. Write Volume Management and Cost Control
If the client reports its position every second during a two-hour movie, that generates 7,200 writes per session. Multiplied across millions of concurrent viewers, this creates enormous write traffic, hot partitions, and significant infrastructure cost.
Hints to consider:
- Implement client-side coalescing: buffer position updates and send them every 5-10 seconds during normal playback, increasing frequency near the end of a video or during user seek actions
- Use a write-through cache in Redis to absorb the burst of updates, then flush to DynamoDB asynchronously in batches
- Partition the data store by a composite key of user ID and profile ID to distribute writes evenly and avoid celebrity-user hotspots
- Apply adaptive update intervals based on playback state: more frequent during seeking, less frequent during steady playback
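The coalescing and adaptive-interval hints can be sketched on the client side as a small buffer that keeps only the latest position and flushes on a timer. This is a sketch under assumptions: `send` stands in for the real network call, and the interval values mirror the 5-10 second figure above:

```python
import time

class PositionReporter:
    """Client-side coalescing sketch: buffer the latest position and
    flush at an adaptive interval instead of writing every second."""

    STEADY_INTERVAL = 10.0   # seconds between flushes during normal playback
    ACTIVE_INTERVAL = 2.0    # tighter interval while seeking or near the end

    def __init__(self, send, duration_sec, clock=time.monotonic):
        self.send = send              # stand-in for the real position-update RPC
        self.duration = duration_sec
        self.clock = clock            # injectable clock for testability
        self.last_flush = clock()
        self.pending = None

    def on_position(self, position_sec, seeking=False):
        self.pending = position_sec   # only the most recent position matters
        near_end = self.duration - position_sec < 30
        interval = self.ACTIVE_INTERVAL if (seeking or near_end) else self.STEADY_INTERVAL
        if self.clock() - self.last_flush >= interval:
            self.flush()

    def flush(self):
        if self.pending is not None:
            self.send(self.pending)
            self.pending = None
            self.last_flush = self.clock()
```

With a 10-second steady interval, a two-hour movie produces roughly 720 writes per session instead of 7,200, a 10x reduction before any server-side batching.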
3. Real-Time Synchronization Across Devices
Other devices need to learn about position changes quickly so that when a user switches devices, the resume is instant. But polling from millions of devices every second is prohibitively expensive.
Hints to consider:
- Establish WebSocket connections from devices currently streaming video, allowing the server to push position updates in real time
- Use Redis pub/sub to fan out position change notifications to connected devices subscribing to that user-profile channel
- For devices in the background or idle, fall back to a simple API call on app launch to fetch the latest position -- no persistent connection needed
- Implement presence tracking to know which devices are actively connected and only push to those, reducing unnecessary fan-out
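The pub/sub and presence ideas above can be combined in one component: devices register a connection on a per-user-profile channel, and updates fan out only to the other currently connected devices. In production this would sit on Redis pub/sub and WebSockets; the sketch below is an in-memory stand-in with illustrative names:

```python
from collections import defaultdict

class SyncHub:
    """In-memory stand-in for the Redis pub/sub fan-out described above.
    Only devices with an open connection (presence) receive pushes; idle
    devices fetch the latest position via a plain API call on app launch."""

    def __init__(self):
        self.connections = defaultdict(dict)  # channel -> {device_id: callback}

    def channel(self, user_id, profile_id):
        return f"{user_id}:{profile_id}"      # one channel per user-profile

    def connect(self, user_id, profile_id, device_id, on_update):
        self.connections[self.channel(user_id, profile_id)][device_id] = on_update

    def disconnect(self, user_id, profile_id, device_id):
        self.connections[self.channel(user_id, profile_id)].pop(device_id, None)

    def publish(self, user_id, profile_id, video_id, position_sec, origin_device):
        # Fan out only to *other* connected devices; the origin already
        # knows its own position.
        for dev, cb in self.connections[self.channel(user_id, profile_id)].items():
            if dev != origin_device:
                cb(video_id, position_sec)
```

Keying channels by user-profile rather than by device keeps the subscription count proportional to active profiles, and presence (the `connections` map) is exactly what limits unnecessary fan-out.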
4. Storage Architecture and Data Model
The storage layer must support high write throughput for position updates, fast reads by user-profile-video key, and efficient cleanup of stale entries for videos users will never finish.
Hints to consider:
- Use DynamoDB with a partition key of userId#profileId and a sort key of videoId for efficient per-user queries and single-video lookups
- Store each record with the current position, last update timestamp, active session ID, and a version number for conflict detection
- Deploy a Redis cluster as a hot cache layer in front of DynamoDB, keyed by userId:profileId:videoId, to serve the read path with sub-millisecond latency
- Set TTLs on both Redis and DynamoDB records (for example, 90 days since last update) to automatically clean up abandoned watch progress
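The key scheme and record shape in these hints can be sketched directly. The field names and helper functions below are illustrative (they mirror the fields listed above, not an actual DynamoDB schema definition):

```python
import time

TTL_SECONDS = 90 * 24 * 3600  # expire abandoned watch progress after 90 days

def partition_key(user_id, profile_id):
    """DynamoDB-style partition key; videoId serves as the sort key."""
    return f"{user_id}#{profile_id}"

def cache_key(user_id, profile_id, video_id):
    """Redis hot-cache key for the read path."""
    return f"{user_id}:{profile_id}:{video_id}"

def make_record(position_sec, session_id, version, now=None):
    """One item per user-profile-video, carrying the fields listed above."""
    now = now if now is not None else time.time()
    return {
        "position_sec": position_sec,
        "updated_at": now,
        "session_id": session_id,      # active session ID for conflict detection
        "version": version,            # logical clock for rejecting stale writes
        "expires_at": now + TTL_SECONDS,  # TTL attribute for automatic cleanup
    }
```

In DynamoDB the monotonic rule from section 1 maps naturally onto a conditional write (accept only if the incoming version exceeds the stored one), and `expires_at` maps onto the table's TTL attribute so cleanup needs no batch job.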