Design Pagination for Instagram Newsfeed
Problem Statement
Design a web-based collaborative document editor similar to Google Docs or Notion, where multiple users can simultaneously edit the same document and see each other's changes in real time. Users should be able to create documents, share them with collaborators, and see live cursors showing where others are typing. The system needs to handle conflicts when multiple users edit the same section simultaneously, preserve document history for undo/redo operations, and maintain consistency across all connected clients.
The core challenge lies in applying operational transformation (OT) or conflict-free replicated data types (CRDTs) so that document state stays consistent across distributed clients, handling network partitions gracefully, and providing a responsive editing experience even under high concurrency. You should design for millions of active documents with 100K+ concurrent editing sessions at peak times.
Key Requirements
Functional
- Real-time collaboration -- multiple users can edit the same document simultaneously with sub-second synchronization
- Conflict resolution -- concurrent edits to the same document section are merged automatically without losing data
- Presence indicators -- show active collaborators with live cursor positions and selections
- Document versioning -- maintain edit history for undo/redo and point-in-time recovery
- Access control -- document owners can grant read/write permissions to specific users
- Rich text formatting -- support for text styling, headings, lists, images, and embedded content
Non-Functional
- Scalability -- support 10M+ active documents, 100K+ concurrent editing sessions
- Reliability -- 99.95% uptime with automatic failover and data replication
- Latency -- local edits appear instantly, remote edits propagate within 200ms globally
- Consistency -- eventual consistency across all clients with guaranteed convergence to identical state
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Conflict Resolution Strategy
The most critical technical decision is how to handle concurrent edits. Interviewers expect deep discussion of operational transformation (OT) versus conflict-free replicated data types (CRDTs), their tradeoffs, and implementation complexity.
Hints to consider:
- Operational Transformation requires a central server to serialize operations but is more space-efficient
- CRDT libraries such as Yjs or Automerge allow peer-to-peer sync and work offline but have larger memory footprints
- Consider the "character position problem" when two users insert at the same location simultaneously
- Discuss how to handle rapid typing (keystroke batching) versus sending individual character operations
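The keystroke batching hint can be illustrated with a minimal sketch. It assumes a hypothetical `Insert` operation with a position and a text payload, and merges an insert into the previous one only when it lands directly after the previously inserted text. A real editor would also flush the batch on a timer and handle deletes.

```python
from dataclasses import dataclass


@dataclass
class Insert:
    pos: int   # index in the document where the text is inserted
    text: str  # inserted characters


def batch_inserts(ops):
    """Coalesce runs of contiguous inserts into single operations."""
    batched = []
    for op in ops:
        last = batched[-1] if batched else None
        # An insert continues the previous one if it starts exactly
        # where the previously inserted text ends.
        if last is not None and op.pos == last.pos + len(last.text):
            batched[-1] = Insert(last.pos, last.text + op.text)
        else:
            batched.append(Insert(op.pos, op.text))
    return batched
```

Typing "hi" at the start of a document would then be sent as one operation instead of two, which reduces message volume at the cost of slightly delayed propagation.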
2. Real-Time Communication Architecture
How changes propagate between clients and servers determines system responsiveness and scalability.
Hints to consider:
- WebSocket connections for bidirectional real-time communication versus long-polling fallbacks
- Pub/sub architecture where clients subscribe to document channels for updates
- Message ordering guarantees and handling out-of-order delivery in distributed systems
- Connection management strategies for mobile clients with intermittent connectivity
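One way to handle out-of-order delivery is a per-document reorder buffer on the receiving side. The sketch below assumes the server stamps every message with a consecutive sequence number starting at 1; the class name and that numbering scheme are illustrative assumptions, not something the problem prescribes.

```python
class ReorderBuffer:
    """Holds early messages until all earlier sequence numbers arrive."""

    def __init__(self, next_seq=1):
        self.next_seq = next_seq  # next sequence number we can deliver
        self.pending = {}         # seq -> message that arrived early

    def receive(self, seq, msg):
        """Return the messages that become deliverable, in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate delivery, ignore it
        self.pending[seq] = msg
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

A gap that never fills (a lost message) would need a timeout and a resync request, which this sketch leaves out.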
3. Data Model and Storage
Document representation affects query performance, storage efficiency, and ease of applying operations.
Hints to consider:
- Store documents as operation logs versus materialized snapshots versus hybrid approaches
- Periodic snapshot creation to avoid replaying millions of operations on document load
- Database choice: document stores (MongoDB), distributed databases (Cassandra), or object storage (S3)
- Indexing strategies for efficient document retrieval and permission checks
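The hybrid approach (snapshot plus operation log) can be sketched as follows. The operation encoding, a `(kind, pos, arg)` tuple, and the function names are assumptions made for illustration. Loading a document starts from the latest snapshot and replays only the operations recorded after it.

```python
def apply_op(text, op):
    """Apply one operation to a plain-text document."""
    kind, pos, arg = op
    if kind == "insert":   # arg is the text to insert
        return text[:pos] + arg + text[pos:]
    if kind == "delete":   # arg is the number of characters to remove
        return text[:pos] + text[pos + arg:]
    raise ValueError(f"unknown operation kind: {kind}")


def load_document(snapshot_text, snapshot_version, op_log):
    """Rebuild current state from a snapshot and a version-ordered log.

    op_log is a list of (version, op) pairs sorted by version.
    """
    text, version = snapshot_text, snapshot_version
    for v, op in op_log:
        if v <= snapshot_version:
            continue  # already reflected in the snapshot
        text = apply_op(text, op)
        version = v
    return text, version
```

With frequent snapshots, load time depends on the number of operations since the last snapshot rather than on the full history.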
4. Presence and Cursor Synchronization
Showing where collaborators are editing requires efficient state broadcasting without overwhelming the system.
Hints to consider:
- Separate channels for document content versus ephemeral presence data
- Throttling cursor position updates to avoid flooding the network
- Client-side interpolation to smooth cursor movements between updates
- Handling presence state cleanup when clients disconnect unexpectedly
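A minimal sketch of cursor throttling and stale-presence cleanup is shown below. The 50 ms minimum interval and 10 second timeout are assumed values, and the caller passes in the current time so the logic stays easy to test.

```python
class PresenceTracker:
    def __init__(self, min_interval=0.05, ttl=10.0):
        self.min_interval = min_interval  # seconds between broadcasts per user
        self.ttl = ttl                    # seconds of silence before expiry
        self.cursors = {}    # user -> latest cursor position
        self.last_sent = {}  # user -> time of last broadcast
        self.last_seen = {}  # user -> time of last update from the client

    def update_cursor(self, user, pos, now):
        """Record a cursor move; return True if it should be broadcast."""
        self.cursors[user] = pos
        self.last_seen[user] = now
        last = self.last_sent.get(user)
        if last is not None and now - last < self.min_interval:
            return False  # throttled: too soon after the last broadcast
        self.last_sent[user] = now
        return True

    def expire(self, now):
        """Drop users who have been silent longer than the TTL."""
        stale = [u for u, t in self.last_seen.items() if now - t > self.ttl]
        for user in stale:
            self.cursors.pop(user, None)
            self.last_seen.pop(user, None)
            self.last_sent.pop(user, None)
        return stale
```

A production version would also flush the most recent throttled position at the end of each interval so the final cursor location is never lost.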
5. Scalability and Performance Optimization
Supporting millions of documents requires careful service decomposition and caching strategies.
Hints to consider:
- Sharding documents across multiple WebSocket servers based on document ID
- Redis or similar for fast in-memory caching of active document state
- CDN for serving static document content and read-only views
- Rate limiting per document to prevent abuse from malicious clients
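Sharding by document ID can be sketched with rendezvous (highest-random-weight) hashing, chosen here as one possible technique because removing a server only remaps the documents that lived on it. The server names are hypothetical.

```python
import hashlib


def pick_server(doc_id, servers):
    """Map a document to one WebSocket server via rendezvous hashing."""
    def score(server):
        digest = hashlib.sha256(f"{server}:{doc_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # Every node computes the same scores, so all of them agree on the
    # owner without any shared routing table.
    return max(servers, key=score)
```

All editors of the same document then land on the same server, which keeps the transformation state for a document in one place.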
Suggested Approach
Step 1: Clarify Requirements
Start by confirming the scope and priorities with your interviewer:
- What is the expected number of concurrent editors per document (2-5, 10-50, or 100+)?
- Do we need offline editing support, or can we assume always-online clients?
- Should the system support rich media (images, videos) or focus on text?
- What is the expected document size limit (pages, megabytes)?
- Do we need granular version history (every keystroke) or periodic checkpoints?
- What consistency guarantees are acceptable (strong vs eventual)?
Step 2: High-Level Architecture
Sketch the major system components:
Client Layer:
- Web/mobile app with text editor component
- Local document state and operation buffer
- WebSocket client for real-time communication
API Gateway & Load Balancer:
- Routes HTTP requests for authentication, document CRUD
- WebSocket connection management and sticky sessions
Collaboration Service:
- WebSocket servers handling real-time edit operations
- Operational transformation or CRDT conflict resolution
- Document session management
Storage Layer:
- Primary database for document metadata and permissions
- Object storage for document content and snapshots
- Cache layer for hot documents
Supporting Services:
- Authentication service
- Notification service
- Analytics and monitoring
Step 3: Deep Dive on Conflict Resolution
Walk through how concurrent edits are handled:
Operational Transformation Approach:
- Client generates operation (insert, delete, format) with position and content
- Client sends operation to server with document version number
- Server receives operations from multiple clients, transforms them to account for concurrent ops
- Server broadcasts transformed operations to all connected clients
- Clients apply remote operations to local document state
Example: User A inserts "hello" at position 0 while User B inserts "world" at position 0. Assuming the server orders A's operation first, it transforms B's operation to position 5 so both insertions are preserved as "helloworld".
Key considerations:
- Operation queue and ordering at the server
- Handling operations that arrive out of order
- Transformation functions for different operation types (insert, delete, format)
- Optimistic updates on client side with rollback on conflict
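The "hello"/"world" example can be made concrete with a small transformation function for concurrent inserts. This is a sketch that covers insert-versus-insert only; the `InsertOp` type and the rule of breaking same-position ties by client ID are assumptions of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InsertOp:
    pos: int
    text: str
    client: str  # used only to break ties at the same position


def transform(op, applied):
    """Rewrite op so it can run after a concurrent op `applied`."""
    applied_goes_first = applied.pos < op.pos or (
        applied.pos == op.pos and applied.client < op.client
    )
    if applied_goes_first:
        # The earlier text shifts our insertion point to the right.
        return InsertOp(op.pos + len(applied.text), op.text, op.client)
    return op


def apply_insert(doc, op):
    return doc[:op.pos] + op.text + doc[op.pos:]
```

With A inserting "hello" and B inserting "world" at position 0, the server applies A and then the transformed B, while client B applies its own insert and then the transformed A. Both paths end at "helloworld", which is the convergence property the transformation functions must guarantee.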
Alternative CRDT approach:
- Each character has a unique immutable identifier (for example, a fractional index chosen between the identifiers of its two neighbors)
- Insertions don't shift positions, enabling commutative operations
- No central server needed for transformation
- Discuss tradeoffs: simpler consistency model but larger memory usage
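The fractional-index idea can be sketched as below. Identifiers are assumed to be `(position, site)` pairs with positions strictly between 0 and 1, and the class name is hypothetical. Because each character keeps its identifier forever, replicas can integrate the same set of inserts in any order and still converge.

```python
import bisect
from fractions import Fraction


class FractionalText:
    def __init__(self):
        self.items = []  # sorted list of (identifier, char)

    def insert_between(self, left, right, char, site):
        """Create a character between two identifiers (None = document edge)."""
        lo = left[0] if left else Fraction(0)
        hi = right[0] if right else Fraction(1)
        ident = ((lo + hi) / 2, site)
        self.integrate(ident, char)
        return ident

    def integrate(self, ident, char):
        """Apply a local or remote insert; duplicates are ignored."""
        if any(existing == ident for existing, _ in self.items):
            return
        bisect.insort(self.items, (ident, char))

    def text(self):
        return "".join(char for _, char in self.items)
```

This naive scheme breaks down when inserting between two characters that share the same position (created concurrently by different sites), and it has no deletes. Production CRDTs use richer identifiers and tombstones or similar mechanisms, which is where the extra memory usage comes from.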
Step 4: Address Secondary Concerns
Scaling WebSocket Connections:
- Use Redis pub/sub for broadcasting across multiple WebSocket servers
- Implement connection pooling and heartbeat mechanisms
- Handle reconnection logic with operation replay from last acknowledged version
Document Persistence:
- Append operations to log-structured storage
- Create snapshots every N operations or every T minutes
- Store snapshots in blob storage with pointers in metadata database
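The "every N operations or every T minutes" rule can be written as a small predicate. The defaults of 1,000 operations and 5 minutes are placeholder assumptions to be tuned against load time and storage cost.

```python
def should_snapshot(ops_since_snapshot, seconds_since_snapshot,
                    max_ops=1000, max_seconds=300):
    """Decide whether to write a new snapshot for a document."""
    if ops_since_snapshot == 0:
        return False  # nothing changed, a new snapshot would be identical
    return (ops_since_snapshot >= max_ops
            or seconds_since_snapshot >= max_seconds)
```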
Access Control:
- Check permissions at WebSocket connection establishment
- Re-verify on sensitive operations
- Implement row-level security in database for multi-tenancy
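A minimal sketch of the permission check is shown below, assuming an ordered set of roles and a per-document access list held in a dictionary. Both are illustrative; a real system would read them from the metadata database.

```python
from enum import IntEnum


class Role(IntEnum):
    NONE = 0
    READ = 1
    WRITE = 2
    OWNER = 3


def is_allowed(acl, user, needed):
    """Return True if the user's role on the document is at least `needed`."""
    return acl.get(user, Role.NONE) >= needed
```

The connection handshake would call `is_allowed(acl, user, Role.READ)`, and each incoming edit operation would be re-checked against `Role.WRITE` so that a revoked collaborator cannot keep writing over an already open socket.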
Monitoring and Observability:
- Track operation latency percentiles
- Monitor concurrent editors per document
- Alert on conflict rate anomalies
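Latency percentiles can be computed with the nearest-rank method, sketched here over an in-memory list of samples. Real monitoring stacks usually rely on histograms or sketches instead of raw samples.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list, for p in 1..100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

For operation latencies of 35, 40, 45, 120, and 300 ms, the median is 45 ms while the p99 is 300 ms, which shows why tail percentiles are more useful than averages for alerting.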
Design TikTok Video Feed
Problem Statement
Design a short-form video sharing platform similar to TikTok where users can upload videos up to 60 seconds, scroll through an endless personalized feed, and interact through likes, comments, and shares. The platform should deliver a highly engaging "For You" feed that adapts to user preferences in real time, handling over a billion daily active users watching billions of videos per day.
The primary challenge involves designing a recommendation engine that can quickly surface relevant content from a massive video catalog while maintaining low latency for feed generation. You must handle video processing pipelines for uploaded content, build efficient storage and retrieval systems for both video files and metadata, and create a feed ranking algorithm that balances content freshness, user engagement signals, and creator diversity. The system needs to scale globally with regional data centers while keeping video startup latency under 500ms.
Key Requirements
Functional
- Video upload and processing -- users can record or upload videos with basic editing, filters, and audio overlays
- Personalized feed generation -- algorithmically ranked infinite scroll feed tailored to individual user preferences
- Social interactions -- like, comment, share, follow creators, and duet/stitch existing videos
- Content discovery -- search, trending hashtags, category browsing, and creator profiles
- Video playback -- adaptive bitrate streaming with preloading for seamless transitions between videos
Non-Functional
- Scalability -- handle 1B+ daily active users, 100M+ video uploads daily, 10B+ video views per day
- Reliability -- 99.9% uptime for feed serving, graceful degradation when recommendation service is unavailable
- Latency -- feed loads in under 1 second, video starts playing within 500ms, next video preloaded during current playback
- Consistency -- eventual consistency for engagement metrics, strong consistency for creator-uploaded content visibility
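A quick back-of-the-envelope conversion turns the daily targets into average per-second rates. Peak traffic is typically several times the average, so these are lower bounds for capacity planning.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(daily_count):
    """Average events per second for a given daily volume."""
    return daily_count / SECONDS_PER_DAY


views_per_second = per_second(10_000_000_000)  # 10B views/day
uploads_per_second = per_second(100_000_000)   # 100M uploads/day
```

This works out to roughly 116,000 video views and about 1,160 uploads per second on average.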
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Feed Ranking and Recommendation Algorithm
The core differentiator is how to generate a personalized feed that keeps users engaged. Interviewers expect discussion of ranking signals, candidate generation, and real-time personalization.
Hints to consider:
- Two-stage approach: candidate generation retrieves hundreds of videos, then ranking model scores top candidates
- Collaborative filtering finds similar users and surfaces videos they enjoyed
- Content-based features like video category, hashtags, audio track, creator profile
- Engagement prediction models trained on historical interactions (watch time, completion rate, likes)
- Exploration vs exploitation tradeoff to show diverse content while optimizing for engagement
- Cold start problem for new users and new videos without interaction history
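The ranking stage and the exploration tradeoff can be sketched together. The function below assumes candidate generation has already produced a list of video IDs and that `score` is a callable returning predicted engagement. Epsilon-greedy selection is used here as one simple exploration strategy; the hints do not prescribe a specific one.

```python
import random


def build_feed(candidates, score, k, explore_rate=0.1, rng=None):
    """Pick k videos: mostly top-scored, occasionally a random candidate."""
    rng = rng or random.Random()
    pool = sorted(candidates, key=score, reverse=True)
    feed = []
    while pool and len(feed) < k:
        if rng.random() < explore_rate:
            # Explore: surface a random candidate, e.g. a new video
            # that has no interaction history yet.
            pick = pool.pop(rng.randrange(len(pool)))
        else:
            # Exploit: take the highest-scored remaining candidate.
            pick = pool.pop(0)
        feed.append(pick)
    return feed
```

Setting `explore_rate` to 0 yields a pure top-k ranking, while higher values trade short-term engagement for diversity and for feedback on cold-start content.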
2. Video Storage and CDN Strategy
Efficient video delivery at global scale requires careful consideration of storage architecture and content distribution.
Hints to consider:
- Separate storage for raw uploaded videos vs processed/transcoded versions in multiple resolutions
- Object storage (S3, GCS) for durability with CDN edge caching for low latency delivery
- Adaptive bitrate streaming (HLS, DASH) with multiple quality levels based on network conditions
- Geographic distribution with regional CDN POPs to minimize latency
- Hot/warm/cold storage tiers based on video popularity and recency
- Video metadata stored separately in database for fast querying without loading video files
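Adaptive bitrate selection can be illustrated with a simple throughput-based rule: choose the highest-bitrate rendition that fits within a safety margin of the measured bandwidth. The bitrate ladder in the usage note and the 0.8 safety factor are illustrative assumptions, and real players also take buffer level into account.

```python
def pick_rendition(renditions, bandwidth_kbps, safety=0.8):
    """Choose a (name, bitrate_kbps) rendition for the measured bandwidth."""
    budget = bandwidth_kbps * safety
    affordable = [r for r in renditions if r[1] <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest bitrate to keep playing.
        return min(renditions, key=lambda r: r[1])
    return max(affordable, key=lambda r: r[1])
```

With a ladder of 240p at 400 kbps, 480p at 1,000 kbps, 720p at 2,500 kbps, and 1080p at 5,000 kbps, a client measuring 4,000 kbps would be served 720p, and a client measuring 300 kbps would fall back to 240p.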