Design a system which can resume videos
Problem Statement
Build a distributed system that tracks video playback position across multiple devices for millions of users. When a user pauses a movie on their smart TV, they should be able to pick up their phone moments later and resume at the exact same timestamp. The system must handle users who rapidly switch between devices, resolve conflicts when multiple devices update the same video's position simultaneously, and maintain accurate state even during network interruptions.
Your design should support 100 million daily active users who collectively generate 500 million position updates per hour. The system must provide sub-second propagation of position updates to all of a user's active devices while keeping storage and network costs manageable. Focus on the state synchronization mechanism rather than video delivery infrastructure -- assume content is already streamed through a separate CDN.
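A quick back-of-envelope calculation turns the stated load into per-second figures. This is a rough sketch: the averages follow directly from the numbers above, while `PEAK_FACTOR` is an assumed multiplier for evening peaks and does not come from the problem statement.

```python
# Back-of-envelope sizing from the stated requirements.
DAILY_ACTIVE_USERS = 100_000_000
UPDATES_PER_HOUR = 500_000_000
PEAK_FACTOR = 3  # assumption: peak traffic is ~3x the hourly average


def average_writes_per_second(updates_per_hour: int) -> float:
    """Average sustained write rate implied by an hourly update volume."""
    return updates_per_hour / 3600


def updates_per_user_per_hour(updates_per_hour: int, daily_active_users: int) -> float:
    """Average updates per daily active user per hour."""
    return updates_per_hour / daily_active_users


avg_wps = average_writes_per_second(UPDATES_PER_HOUR)
peak_wps = avg_wps * PEAK_FACTOR
per_user = updates_per_user_per_hour(UPDATES_PER_HOUR, DAILY_ACTIVE_USERS)

print(f"average writes/sec: {avg_wps:,.0f}")          # about 139 thousand
print(f"assumed peak writes/sec: {peak_wps:,.0f}")    # about 417 thousand
print(f"updates per DAU per hour: {per_user:.1f}")    # 5.0
```

The average of roughly 139K writes per second is the number to keep in mind when choosing partitioning and batching strategies later on.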
Key Requirements
Functional
- Cross-device resume -- users can pause on one device and immediately continue from that exact position on any other device
- Per-profile isolation -- each account profile maintains independent watch progress that never leaks across profiles
- Automatic position save -- the system continuously captures playback position without requiring explicit user actions
- Offline resilience -- devices that lose connectivity should queue updates locally and reconcile when reconnected
Non-Functional
- Scalability -- handle 500M position updates per hour across 100M daily active users with multiple devices each
- Reliability -- ensure 99.9% of position updates are successfully persisted and propagated without data loss
- Latency -- propagate position changes to other devices within 1-2 seconds under normal conditions
- Consistency -- guarantee monotonic progress (the saved position never moves backward because of a stale or out-of-order update) even with concurrent updates from multiple devices
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Conflict Resolution Strategy
When a user has the same video playing on multiple devices at once, or when offline devices reconnect, the system must decide which position wins. Naive approaches like "last write wins" can cause the position to jump backward when a device holding older state sends a stale update that happens to arrive last.
Hints to consider:
- Implement monotonic position guarantees where updates only advance the cursor forward
- Attach version numbers or logical clocks to each update so the server can reject outdated writes
- Use session identifiers with heartbeat mechanisms to detect which device is actively playing
- Consider tie-breaking rules when two devices report similar timestamps within a small window
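The hints above can be combined into a small server-side merge rule. The sketch below is one possible policy, not a prescribed one: it assumes each update carries a per-profile logical clock (`version`) that devices advance past the highest value they have seen, and it breaks ties by preferring the farther position, then the session ID. The `PositionUpdate` type and `resolve` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PositionUpdate:
    position_ms: int   # playback position in milliseconds
    version: int       # logical clock value (assumed to be maintained per profile)
    session_id: str    # identifies the device session that produced the update


def _rank(update: PositionUpdate) -> tuple:
    # Higher version wins; ties fall back to the farther position,
    # then to session_id so the outcome is deterministic.
    return (update.version, update.position_ms, update.session_id)


def resolve(stored: Optional[PositionUpdate], incoming: PositionUpdate) -> PositionUpdate:
    """Return the update that should be kept; stale writes are rejected."""
    if stored is None or _rank(incoming) > _rank(stored):
        return incoming
    return stored


# The phone has already advanced the position; the TV reconnects with old state.
phone = PositionUpdate(position_ms=1_800_000, version=5, session_id="phone")
tv_stale = PositionUpdate(position_ms=1_200_000, version=3, session_id="tv")
current = resolve(phone, tv_stale)
print(current.session_id)  # phone
```

Ordering by version first means a deliberate rewind on the active device still wins, as long as that device stamps it with a newer version. A strict forward-only rule on `position_ms` would be simpler but would also block intentional seeks backward.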
2. Write Amplification and Cost Control
Users watching a two-hour movie could theoretically generate 7,200 position updates if the client sends one per second. At scale, this creates massive write traffic, hot partitions in the database, and high cloud costs.
Hints to consider:
- Implement client-side coalescing with adaptive batching (e.g., send every 5-10 seconds during active playback)
- Increase update frequency near the end of videos or during user interactions like seeking
- Use write-behind (write-back) caching with Redis to absorb bursts before flushing to durable storage; a write-through cache would still send every update to the database synchronously
- Partition by userId + profileId to distribute writes evenly and avoid hot partitions (keying by videoId alone, for example, would concentrate writes for a popular title)
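Client-side coalescing with adaptive intervals can be sketched as a small decision function that runs on every playback tick. The class name `PositionCoalescer` and all interval values are illustrative assumptions: 10 seconds during normal playback, 2 seconds in the last minute, and an immediate send after a seek.

```python
from typing import Optional


class PositionCoalescer:
    """Decides when the client should send a position update (illustrative sketch)."""

    def __init__(self, duration_s: float, base_interval_s: float = 10.0,
                 near_end_interval_s: float = 2.0, near_end_window_s: float = 60.0):
        self.duration_s = duration_s
        self.base_interval_s = base_interval_s
        self.near_end_interval_s = near_end_interval_s
        self.near_end_window_s = near_end_window_s
        self._last_sent_at: Optional[float] = None

    def should_send(self, position_s: float, now_s: float, seeked: bool = False) -> bool:
        # Report more often near the end so "finished" detection stays accurate.
        near_end = self.duration_s - position_s <= self.near_end_window_s
        interval = self.near_end_interval_s if near_end else self.base_interval_s
        due = self._last_sent_at is None or now_s - self._last_sent_at >= interval
        if seeked or due:
            self._last_sent_at = now_s
            return True
        return False


# Simulate a two-hour movie played straight through with one tick per second.
coalescer = PositionCoalescer(duration_s=7200)
sent = sum(1 for t in range(7200) if coalescer.should_send(position_s=t, now_s=t))
print(f"updates sent: {sent} instead of 7200")
```

With these assumed intervals the client sends roughly a tenth of the per-second traffic, while seeks and the final minute are still reported promptly.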
3. Real-Time Synchronization Mechanism
Other devices need to learn about position changes quickly, but polling every second from millions of devices is prohibitively expensive. The design must balance latency with infrastructure cost.
Hints to consider:
- Establish WebSocket or long-polling connections for devices currently streaming video
- Use Redis pub/sub or similar lightweight message bus to fan out notifications to connected devices
- Implement exponential backoff polling for devices in background or idle states
- Consider presence tracking to know which devices need immediate updates versus eventual consistency
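The fan-out and backoff ideas above can be illustrated with an in-memory stand-in. This sketch does not use Redis or real WebSockets: `ProfileFanout` is a hypothetical class that plays the role of the message bus plus presence table, each `send` callback stands in for a live connection, and the backoff parameters are assumed values.

```python
from collections import defaultdict
from typing import Callable, Dict


class ProfileFanout:
    """In-memory stand-in for a pub/sub channel per profile, with presence tracking."""

    def __init__(self) -> None:
        # profile_key -> {device_id -> callback that pushes to that device}
        self._connections: Dict[str, Dict[str, Callable[[dict], None]]] = defaultdict(dict)

    def connect(self, profile_key: str, device_id: str, send: Callable[[dict], None]) -> None:
        self._connections[profile_key][device_id] = send

    def disconnect(self, profile_key: str, device_id: str) -> None:
        self._connections[profile_key].pop(device_id, None)

    def publish(self, profile_key: str, source_device_id: str, update: dict) -> int:
        """Push an update to every connected device except the one that produced it."""
        delivered = 0
        for device_id, send in list(self._connections.get(profile_key, {}).items()):
            if device_id == source_device_id:
                continue
            send(update)
            delivered += 1
        return delivered


def next_poll_delay(attempt: int, base_s: float = 5.0, cap_s: float = 300.0) -> float:
    """Exponential backoff for idle or background devices that poll instead of holding a socket."""
    return min(cap_s, base_s * (2 ** attempt))


fanout = ProfileFanout()
phone_inbox: list = []
fanout.connect("user1#profileA", "tv", lambda update: None)
fanout.connect("user1#profileA", "phone", phone_inbox.append)
delivered = fanout.publish("user1#profileA", "tv", {"videoId": "v42", "positionMs": 1_800_000})
print(delivered, phone_inbox)
```

Only devices present in the connection table receive a push; everything else falls back to polling with growing delays, which keeps the number of open connections proportional to actively streaming devices.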
4. Data Model and Storage Strategy
The storage layer must support high write throughput, allow efficient queries by user and profile, and provide the consistency guarantees needed for conflict resolution.
Hints to consider:
- Use a key-value store like DynamoDB, keyed by a composite of {userId, profileId, videoId}
- Store both current position and metadata like update timestamp, device session ID, and version number
- Implement TTL-based cleanup for old entries since most users never return to finish abandoned videos
- Separate hot cache layer (Redis) from cold persistent layer (DynamoDB/Cassandra) for cost optimization
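A record shape that follows these hints might look like the sketch below. The field names, the `#` key separator, and the 90-day retention are assumptions for illustration. The expiry is expressed in epoch seconds, which is the format DynamoDB's TTL feature expects.

```python
from dataclasses import dataclass

TTL_DAYS = 90  # assumption: abandoned entries expire after 90 days without updates
SECONDS_PER_DAY = 86_400


@dataclass
class PlaybackRecord:
    user_id: str
    profile_id: str
    video_id: str
    position_ms: int   # current playback position
    version: int       # used to reject outdated writes
    session_id: str    # device session that wrote this position
    updated_at: int    # epoch seconds of the last accepted update

    @property
    def storage_key(self) -> str:
        # Composite key of userId, profileId and videoId, as described above.
        return f"{self.user_id}#{self.profile_id}#{self.video_id}"

    @property
    def expires_at(self) -> int:
        # Epoch-seconds expiry; refreshed implicitly whenever updated_at changes.
        return self.updated_at + TTL_DAYS * SECONDS_PER_DAY


record = PlaybackRecord(
    user_id="u1", profile_id="p1", video_id="v42",
    position_ms=1_800_000, version=5, session_id="phone",
    updated_at=1_700_000_000,
)
print(record.storage_key)  # u1#p1#v42
print(record.expires_at)   # 1707776000
```

One variation worth weighing is to use userId plus profileId as the partition key and videoId as a sort key, which would let a single query return all in-progress titles for a profile (for a "continue watching" row).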