Design a Real-Time Collaborative Document Editing System
Problem Statement
Design a system that enables multiple users to simultaneously edit the same document in real time, similar to Google Docs, Notion, or Figma. Users should see each other's changes appear instantly as they type, with proper conflict resolution when multiple people edit the same section. The system must handle documents ranging from small text files to large documents with rich formatting, embedded media, and potentially thousands of concurrent collaborators on popular documents.
The challenge lies in balancing consistency (every collaborator must converge on the same document) with low-latency updates across geographically distributed users. You need to handle network partitions gracefully, resolve conflicting edits deterministically, maintain a complete edit history for version history and undo/redo, and scale to millions of documents while keeping active sessions responsive. This problem tests your understanding of operational transformation, CRDTs, WebSocket management, distributed state synchronization, and storage strategies for evolving document schemas.
Key Requirements
Functional
- Real-time collaboration -- Multiple users can edit the same document simultaneously and see each other's changes within a few hundred milliseconds
- Conflict resolution -- When users edit the same content concurrently, the system must merge changes deterministically without data loss
- Presence awareness -- Users can see who else is viewing or editing the document and where their cursors are positioned
- Version history -- Complete edit history allowing users to view past versions and restore previous states
- Permissions and sharing -- Document owners can control access levels (view, comment, edit) and share via links or invitations
- Offline support -- Users can continue editing when disconnected and sync changes when reconnected
Non-Functional
- Scalability -- Support 10 million daily active users, 100 million documents, with up to 1,000 concurrent editors per popular document
- Reliability -- 99.9% uptime with automatic failover and no data loss during network partitions or server failures
- Latency -- Edits typically reach all active collaborators in under 200ms, with p99 propagation latency under 500ms globally
- Consistency -- Eventually consistent with causal ordering guarantees, ensuring all users converge to identical document state
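A back-of-envelope estimate shows why the per-document concurrency limit, more than total DAU, drives the real-time design. Assuming (hypothetically) that each active editor produces about r = 5 operations per second and every operation is relayed unbatched to the other N - 1 editors, a single hot document with N = 1,000 editors generates M deliveries per second:

```latex
M = N \, r \, (N - 1), \qquad N = 1000,\; r = 5\ \text{ops/s}
\;\Rightarrow\; M = 1000 \times 5 \times 999 = 4\,995\,000 \approx 5 \times 10^{6}\ \text{deliveries/s}
```

The quadratic N(N - 1) term is why batching, serving view-only participants from replicas, and per-document rate limits are the main levers for hot documents.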
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Operational Transformation vs CRDT Choice
The core technical decision is how to handle concurrent edits and ensure eventual consistency. Interviewers want to see you understand the fundamental tradeoffs between Operational Transformation (OT) and Conflict-free Replicated Data Types (CRDTs), and when each approach is appropriate.
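To make the OT side concrete, here is a minimal sketch of the transform step for two concurrent inserts. It assumes a plain-text document and integer site IDs used only as a tie-breaker; real OT also needs insert/delete and delete/delete cases. `Insert`, `transform`, and `apply_op` are hypothetical names, not a library API.

```python
# Hypothetical sketch: OT transform for concurrent inserts only (plain text).
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index in the document where `text` is inserted
    text: str
    site: int   # replica ID, used only to break ties at equal positions


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has already been applied."""
    shifts = against.pos < op.pos or (against.pos == op.pos and against.site < op.site)
    if shifts:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


def apply_op(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


# Two sites type at the same position concurrently.
base = "abc"
a = Insert(1, "X", site=1)
b = Insert(1, "Y", site=2)

at_site_1 = apply_op(apply_op(base, a), transform(b, a))
at_site_2 = apply_op(apply_op(base, b), transform(a, b))
print(at_site_1, at_site_2)  # both replicas converge to "aXYbc"
```

The tie-break on `site` is what makes the merge deterministic; in a server-centric OT deployment the server additionally fixes the order in which these transforms are applied.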
Hints to consider:
- OT typically relies on a central server to serialize and transform operations; payloads stay small, but the design depends on reliable, ordered message delivery
- CRDTs like Yjs or Automerge allow peer-to-peer synchronization and work better offline, but generate larger metadata overhead
- Discuss tombstones, logical clocks (Lamport or vector clocks), and how to represent positions in the sequence (for example, stable per-character identifiers instead of integer offsets)
- Consider how you'll handle undo/redo, which becomes complex when a user's edits are interleaved with other users' edits
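On the CRDT side, the following is a simplified sketch of an RGA-style sequence CRDT, showing how Lamport timestamps give every character a unique, totally ordered ID and how deletes become tombstones instead of removals. It assumes each operation is delivered exactly once and in causal order (an insert never arrives before the element it references), and it uses linear scans for clarity. The class and method names are hypothetical; this is not the Yjs or Automerge API.

```python
# Hypothetical sketch of an RGA-style list CRDT; assumes exactly-once, causal delivery.
from dataclasses import dataclass
from typing import Optional, Tuple

ElemId = Tuple[int, str]  # (Lamport counter, site ID); tuples compare counter first


@dataclass
class Elem:
    id: ElemId
    value: str
    deleted: bool = False  # tombstone flag: deleted elements stay in the list


class RGA:
    def __init__(self, site: str):
        self.site = site
        self.clock = 0    # Lamport clock
        self.elems = []   # document order, including tombstones

    def _index(self, elem_id: ElemId) -> int:
        for i, e in enumerate(self.elems):
            if e.id == elem_id:
                return i
        raise KeyError(elem_id)

    def insert(self, after: Optional[ElemId], value: str):
        """Local insert after element `after` (None = document start); returns the op to broadcast."""
        op = ("ins", after, (self.clock + 1, self.site), value)
        self.apply(op)
        return op

    def delete(self, elem_id: ElemId):
        op = ("del", elem_id)
        self.apply(op)
        return op

    def apply(self, op) -> None:
        """Apply a local or remote op."""
        if op[0] == "ins":
            _, after, new_id, value = op
            self.clock = max(self.clock, new_id[0])  # Lamport update on receive
            i = 0 if after is None else self._index(after) + 1
            # Skip concurrent inserts at the same spot that carry larger IDs,
            # so every replica places the new element in the same slot.
            while i < len(self.elems) and self.elems[i].id > new_id:
                i += 1
            self.elems.insert(i, Elem(new_id, value))
        else:
            self.elems[self._index(op[1])].deleted = True

    def text(self) -> str:
        return "".join(e.value for e in self.elems if not e.deleted)


# Two replicas insert at the start concurrently, then exchange ops.
a, b = RGA("a"), RGA("b")
op_a = a.insert(None, "x")
op_b = b.insert(None, "y")
a.apply(op_b)
b.apply(op_a)
assert a.text() == b.text()
```

Tombstones are the metadata cost mentioned above: deleted characters must remain so that later operations can still reference their IDs, which is why production CRDTs add garbage collection and compact encodings.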
2. WebSocket Architecture and Connection Management
Maintaining thousands of concurrent WebSocket connections per document server requires careful resource management and routing strategy. Interviewers look for awareness of connection pooling, heartbeat mechanisms, and graceful degradation.
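One common way to implement the document-to-server routing is consistent hashing: each server owns many points on a hash ring, and a document goes to the next point clockwise from its own hash, so adding or removing one server only remaps the documents that server owned. The sketch assumes SHA-1 hashing and 100 virtual nodes per server; `HashRing` and the server names are hypothetical.

```python
# Hypothetical sketch: consistent-hash routing of document IDs to WebSocket servers.
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping document IDs to WebSocket servers."""

    def __init__(self, servers, vnodes=100):
        # Each server appears `vnodes` times to spread load evenly.
        points = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in points]
        self._servers = [s for _, s in points]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

    def server_for(self, doc_id: str) -> str:
        # First ring point clockwise from the document's hash, wrapping around.
        i = bisect.bisect(self._keys, self._hash(doc_id)) % len(self._keys)
        return self._servers[i]


ring = HashRing(["ws-1", "ws-2", "ws-3"])
print(ring.server_for("doc-42"))  # the same document always lands on the same server
```

Routing every collaborator of a document to the same server lets that server act as the single ordering point for the document's operations, which is exactly what server-centric OT needs.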
Hints to consider:
- Use a connection manager service that maps document IDs to active WebSocket servers and routes messages efficiently
- Implement exponential backoff reconnection logic and queue operations during brief disconnections
- Consider pub/sub patterns (Redis Pub/Sub or Kafka topics) for broadcasting changes to every server that holds connections for the same document
- Plan for WebSocket server restarts, sticky sessions, and state transfer between servers during rebalancing
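The reconnection hint can be sketched as two pieces: capped exponential backoff with "full jitter" (each delay drawn uniformly between zero and the exponential bound, so thousands of clients dropped by one server restart do not reconnect in lockstep), and a bounded queue that holds local operations while the socket is down and flushes them in order afterwards. The base delay, cap, and buffer size are assumed values, and `backoff_delays` / `PendingOps` are hypothetical names.

```python
# Hypothetical sketch: jittered reconnect backoff plus an offline operation queue.
import random
from collections import deque


def backoff_delays(attempts, base=0.5, cap=30.0, rng=random.random):
    """Full-jitter backoff: delay n is uniform in [0, min(cap, base * 2**n)) seconds."""
    return [rng() * min(cap, base * 2 ** n) for n in range(attempts)]


class PendingOps:
    """Buffers local edits while disconnected and replays them in order on reconnect."""

    def __init__(self, max_len=10_000):
        self._queue = deque()
        self._max_len = max_len

    def enqueue(self, op):
        if len(self._queue) >= self._max_len:
            # Too far behind to replay cheaply; the client should resync the full document.
            raise OverflowError("offline buffer full")
        self._queue.append(op)

    def flush(self, send):
        # Pop only after `send` returns, so a failure mid-flush keeps the op queued.
        while self._queue:
            send(self._queue[0])
            self._queue.popleft()

    def __len__(self):
        return len(self._queue)


pending = PendingOps()
pending.enqueue({"pos": 0, "text": "h"})
pending.enqueue({"pos": 1, "text": "i"})
sent = []
pending.flush(sent.append)  # after reconnecting
```

Before flushing, queued operations still have to be transformed against (OT) or merged with (CRDT) whatever the server accepted while the client was offline.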
3. Storage Strategy for Documents and History
Documents have different access patterns than traditional database records: frequent small updates, occasional full reads, and a need for efficient historical queries. Your storage layer must serve both real-time updates and complete version history.
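A minimal sketch of an append-only operation log with periodic snapshots: every operation is appended, a full snapshot is taken every `snapshot_every` versions, and any historical version is rebuilt from the nearest earlier snapshot plus at most `snapshot_every - 1` replayed operations. Operations are assumed to be `(pos, delete_count, insert_text)` tuples on plain text and everything lives in memory; `DocumentLog` is a hypothetical name.

```python
# Hypothetical sketch: in-memory op log with periodic snapshots for version reconstruction.
def apply_op(text, op):
    pos, delete_count, insert_text = op
    return text[:pos] + insert_text + text[pos + delete_count:]


class DocumentLog:
    def __init__(self, snapshot_every=100):
        self.ops = []              # append-only; ops[i] produces version i + 1
        self.snapshots = {0: ""}   # version -> full document text
        self.snapshot_every = snapshot_every

    def append(self, op):
        self.ops.append(op)
        version = len(self.ops)
        if version % self.snapshot_every == 0:
            self.snapshots[version] = self.state_at(version)
        return version

    def state_at(self, version):
        # Start from the newest snapshot at or before `version`, then replay the tail.
        base = max(v for v in self.snapshots if v <= version)
        text = self.snapshots[base]
        for op in self.ops[base:version]:
            text = apply_op(text, op)
        return text


log = DocumentLog(snapshot_every=2)
log.append((0, 0, "hello"))
log.append((5, 0, " world"))   # version 2 triggers a snapshot
log.append((0, 1, "H"))        # replace "h" with "H"
print(log.state_at(3), "|", log.state_at(1))  # Hello world | hello
```

The snapshot interval is the knob that trades storage for bounded reconstruction time; in a real deployment the recent log would sit in the hot store and older snapshots in cold storage.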
Hints to consider:
- Separate hot data (current document state, recent operations) from cold data (historical snapshots)
- Store incremental operations in append-only logs for replay, with periodic snapshots to bound reconstruction time
- Use columnar or object storage for versioned snapshots, blob storage for embedded media
- Consider sharding by document ID and partitioning history by time ranges for efficient access
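The sharding and partitioning hints might translate into key functions like these, assuming a fixed count of 64 shards and monthly history partitions (both hypothetical choices, as are the function names). A fixed modulo is the simplest scheme, but changing the shard count moves most documents, which is why a consistent-hash ring is often preferred once shard counts change.

```python
# Hypothetical sketch: shard key for live document data and time-ranged history partitions.
import hashlib
from datetime import datetime, timezone

NUM_SHARDS = 64  # assumed shard count


def shard_for(doc_id: str) -> int:
    """Stable shard for a document, so its live state and recent ops live together."""
    # Use a real digest: Python's built-in hash() of str is randomized per process.
    digest = hashlib.sha256(doc_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def history_partition(doc_id: str, ts: datetime) -> str:
    """Partition key for historical snapshots: one partition per document per month (UTC)."""
    return f"{doc_id}/{ts.astimezone(timezone.utc):%Y-%m}"


now = datetime(2024, 3, 5, 12, 0, tzinfo=timezone.utc)
print(shard_for("doc-42"), history_partition("doc-42", now))
```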
4. Handling Hot Documents and Rate Limiting
Popular documents (company-wide announcements, viral templates) can attract thousands of simultaneous editors, creating extreme load on specific servers. Interviewers want to see scalability thinking beyond the average case.
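Batching can be as simple as coalescing consecutive inserts from the same user before each broadcast tick: if a keystroke continues exactly where the previous one ended, the run becomes one operation. This sketch assumes a stream of already-ordered insert operations and a caller that invokes `coalesce` once per flush interval (for example every few tens of milliseconds); `Ins` and `coalesce` are hypothetical names.

```python
# Hypothetical sketch: coalesce contiguous same-user inserts before broadcasting.
from dataclasses import dataclass


@dataclass(frozen=True)
class Ins:
    user: str
    pos: int
    text: str


def coalesce(ops):
    """Merge runs of contiguous inserts by the same user; ops must be in broadcast order."""
    out = []
    for op in ops:
        prev = out[-1] if out else None
        # Only merge with the immediately preceding op, so other users' edits
        # in between are never reordered.
        if prev and prev.user == op.user and op.pos == prev.pos + len(prev.text):
            out[-1] = Ins(prev.user, prev.pos, prev.text + op.text)
        else:
            out.append(op)
    return out


keystrokes = [Ins("ana", 0, "h"), Ins("ana", 1, "i"), Ins("ana", 2, "!"), Ins("bo", 10, "x")]
batched = coalesce(keystrokes)  # 4 ops become 2
```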
Hints to consider:
- Implement read replicas or cache layers for view-only participants to reduce load on the primary collaboration server
- Use batching to group rapid successive edits from the same user before broadcasting
- Apply rate limiting per user and per document to prevent abuse and DoS scenarios
- Consider splitting very large documents into smaller chunks that can be edited independently
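Per-user and per-document rate limits are commonly implemented as token buckets: tokens refill at a steady rate up to a burst capacity, and each edit spends one. The sketch keeps one bucket per key in memory with an injectable clock for testing; the rates and the `TokenBucket` / `allow_edit` names are assumptions, and a multi-server deployment would need the counters in shared storage (for example Redis) instead.

```python
# Hypothetical sketch: in-memory token buckets keyed per user and per document.
import time
from collections import defaultdict


class TokenBucket:
    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Assumed limits: 20 edits/s per user (burst 40), 2000 edits/s per document (burst 4000).
user_buckets = defaultdict(lambda: TokenBucket(rate=20, capacity=40))
doc_buckets = defaultdict(lambda: TokenBucket(rate=2000, capacity=4000))


def allow_edit(user_id: str, doc_id: str) -> bool:
    """Admit an edit only if both budgets allow it (the user check runs first)."""
    return user_buckets[user_id].allow() and doc_buckets[doc_id].allow()
```

Because `and` short-circuits, a user who is already throttled never drains the document's shared budget.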
5. Security, Privacy and Permission Propagation
Access control must be enforced consistently across all system layers, and permission changes need to propagate to active sessions immediately. Interviewers probe how you prevent unauthorized access during collaboration sessions.
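Assuming each access level includes the ones below it (edit implies comment implies view), a single ordered enum keeps the check identical at every layer that enforces it. In this sketch the ACL is a plain dict keyed by `(doc_id, user_id)` and the action names are hypothetical; a gateway would load the ACL from a permissions service and cache it briefly.

```python
# Hypothetical sketch: nested access levels checked the same way at every layer.
from enum import IntEnum


class Role(IntEnum):
    VIEW = 1
    COMMENT = 2
    EDIT = 3


# Minimum role needed for each message type arriving on a session.
REQUIRED_ROLE = {"read": Role.VIEW, "comment": Role.COMMENT, "edit": Role.EDIT}


def authorize(acl: dict, doc_id: str, user_id: str, action: str) -> bool:
    role = acl.get((doc_id, user_id))
    # Unknown actions raise KeyError rather than silently passing.
    return role is not None and role >= REQUIRED_ROLE[action]


acl = {("doc-42", "ana"): Role.EDIT, ("doc-42", "bo"): Role.VIEW}
```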
Hints to consider:
- Validate permissions at the WebSocket gateway layer and re-check periodically during long sessions
- Propagate permission revocations as high-priority messages that immediately disconnect affected users
- Encrypt documents at rest and consider end-to-end encryption for sensitive use cases
- Implement audit logs tracking who accessed or modified each document and when
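Revocation propagation and audit logging can share one code path: when an ACL entry is removed, every live session for that user on that document is closed and the change is recorded. The registry below is an in-memory, single-server sketch with hypothetical names; across servers, `revoke` would be triggered by the high-priority revocation message described above, and the audit log would be an append-only store rather than a list.

```python
# Hypothetical sketch: per-server session registry with revocation and audit records.
import time
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    user_id: str
    doc_id: str
    open: bool = True
    close_reason: str = ""

    def close(self, reason: str) -> None:
        # A real server would send a WebSocket close frame here.
        self.open = False
        self.close_reason = reason


class SessionRegistry:
    def __init__(self):
        self.sessions = defaultdict(list)   # doc_id -> sessions on this server
        self.audit_log = []                 # append-only record of who did what, when

    def register(self, session: Session) -> None:
        self.sessions[session.doc_id].append(session)
        self._audit(session.user_id, session.doc_id, "open")

    def revoke(self, doc_id: str, user_id: str, actor: str) -> int:
        """Close every open session of `user_id` on `doc_id`; returns how many were closed."""
        closed = 0
        for s in self.sessions[doc_id]:
            if s.user_id == user_id and s.open:
                s.close("permission revoked")
                closed += 1
        self.sessions[doc_id] = [s for s in self.sessions[doc_id] if s.open]
        self._audit(actor, doc_id, f"revoke:{user_id}")
        return closed

    def _audit(self, actor: str, doc_id: str, event: str) -> None:
        self.audit_log.append({"ts": time.time(), "actor": actor, "doc": doc_id, "event": event})


registry = SessionRegistry()
tab1, tab2 = Session("bo", "doc-42"), Session("bo", "doc-42")
registry.register(tab1)
registry.register(tab2)
closed = registry.revoke("doc-42", "bo", actor="ana")
```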