Design WhatsApp
Problem Statement
Design a real-time workplace communication platform similar to Slack that allows teams to collaborate through channels, direct messages, and threaded conversations. The system should support organizations of varying sizes, from small startups with 10 users to large enterprises with 100,000+ employees. Users should be able to send text messages, share files, search message history, receive notifications, and integrate with third-party services. The platform must handle high message volumes during peak business hours while maintaining message ordering and ensuring reliable delivery across web, mobile, and desktop clients.
Key challenges include managing real-time message delivery at scale, organizing conversations across thousands of channels, implementing efficient search across message history, and maintaining state synchronization across multiple devices per user.
Key Requirements
Functional
- Message delivery -- Support real-time text messaging in channels (public/private) and direct messages (1-on-1 and group)
- Threading -- Allow users to reply to specific messages, creating threaded conversations within channels
- Presence and status -- Show online/offline/away status for users and typing indicators during active conversations
- File sharing -- Enable users to upload and share documents, images, and other files with preview capabilities
- Search functionality -- Provide full-text search across all messages, files, and users that the searching user has access to
- Notifications -- Deliver push notifications for mentions, direct messages, and configurable channel activity
- Workspaces -- Support multiple isolated workspaces (organizations) with separate channels and members
Non-Functional
- Scalability -- Handle 100,000 concurrent users per workspace and 10,000 messages per second across the platform
- Reliability -- Target 99.9% uptime; once the server acknowledges a message, it must not be lost and must eventually reach every intended recipient
- Latency -- Deliver messages to recipients within 100ms under normal conditions
- Consistency -- Maintain strong consistency for message ordering within channels while accepting eventual consistency for presence information
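A quick back-of-envelope pass helps turn these targets into numbers. The sketch below uses the 10,000 messages per second and 99.9% uptime figures from the requirements; the average message size and replication factor are assumptions chosen for illustration, not figures from the problem statement.

```python
SECONDS_PER_DAY = 86_400
HOURS_PER_YEAR = 365 * 24

MESSAGES_PER_SECOND = 10_000  # from the scalability requirement
AVG_MESSAGE_BYTES = 1_000     # assumption: ~1 KB per message including metadata
REPLICATION_FACTOR = 3        # assumption: three copies of each message


def daily_estimate(rate=MESSAGES_PER_SECOND,
                   avg_bytes=AVG_MESSAGE_BYTES,
                   replication=REPLICATION_FACTOR):
    """Estimate daily message count and storage if the peak rate were sustained all day."""
    messages = rate * SECONDS_PER_DAY
    raw_bytes = messages * avg_bytes
    return {
        "messages_per_day": messages,
        "raw_gb_per_day": raw_bytes / 1e9,
        "replicated_gb_per_day": raw_bytes * replication / 1e9,
    }


def allowed_downtime_hours_per_year(uptime):
    """Hours per year the service may be unavailable at the given uptime fraction."""
    return (1.0 - uptime) * HOURS_PER_YEAR


estimate = daily_estimate()
downtime = allowed_downtime_hours_per_year(0.999)
```

Under these assumptions the platform ingests 864 million messages and roughly 864 GB of raw message data per day (an upper bound, since the rate is treated as sustained rather than peak), and 99.9% uptime leaves about 8.76 hours of downtime per year.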
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Real-Time Message Delivery Architecture
How you design bidirectional communication between clients and servers to enable instant message delivery without constant polling. This reveals your understanding of the WebSocket protocol, connection management, and fallback mechanisms.
Hints to consider:
- Consider using WebSocket connections for persistent, bidirectional communication, with heartbeats to detect dead connections
- Design a gateway layer that maintains user connections and routes messages efficiently
- Plan for connection recovery and message queuing when clients reconnect after network failures
- Discuss tradeoffs between long polling, Server-Sent Events, and WebSockets for different client types
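The connection-management hints above can be sketched without any networking code. The following is a minimal in-memory model, assuming a single gateway node; the `Gateway` class, its method names, and the 30-second timeout are hypothetical and only illustrate heartbeat tracking, timeout sweeps, and replay of queued messages on reconnect.

```python
import time
from collections import defaultdict, deque


class Gateway:
    """In-memory stand-in for one WebSocket gateway node (hypothetical sketch)."""

    def __init__(self, heartbeat_timeout=30.0, clock=time.monotonic):
        self.heartbeat_timeout = heartbeat_timeout
        self.clock = clock
        self.last_seen = {}                 # user_id -> time of last heartbeat
        self.delivered = defaultdict(list)  # user_id -> messages pushed to the client
        self.pending = defaultdict(deque)   # user_id -> messages queued while offline

    def connect(self, user_id):
        """Register a connection and replay anything queued while the user was offline."""
        self.last_seen[user_id] = self.clock()
        replayed = list(self.pending.pop(user_id, ()))
        self.delivered[user_id].extend(replayed)
        return replayed

    def heartbeat(self, user_id):
        """Refresh the liveness timestamp for a connected user."""
        if user_id in self.last_seen:
            self.last_seen[user_id] = self.clock()

    def sweep(self):
        """Drop connections whose last heartbeat is older than the timeout."""
        now = self.clock()
        dead = [u for u, t in self.last_seen.items()
                if now - t > self.heartbeat_timeout]
        for user_id in dead:
            del self.last_seen[user_id]
        return dead

    def send(self, user_id, message):
        """Push immediately if connected, otherwise queue for replay on reconnect."""
        if user_id in self.last_seen:
            self.delivered[user_id].append(message)
            return "pushed"
        self.pending[user_id].append(message)
        return "queued"
```

A real gateway would hold socket handles instead of a `delivered` list and would keep the pending queue in a durable store, so that a gateway crash does not lose undelivered messages. Injecting the clock makes the timeout logic easy to exercise in tests.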
2. Message Storage and Retrieval Patterns
How you structure data storage to support both real-time access and historical search while managing data growth. This tests your knowledge of database selection, partitioning strategies, and caching layers.
Hints to consider:
- Partition messages by channel ID or time ranges to enable horizontal scaling
- Consider a write-optimized database like Cassandra or ScyllaDB for high message ingestion rates
- Implement a multi-tier storage strategy where recent messages stay in hot storage and older messages archive to cold storage
- Design indexing strategies that support both message ordering queries and full-text search requirements
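One way to combine the channel-ID and time-range partitioning hints is a composite partition key of channel plus time bucket, which keeps any single partition bounded in size. The sketch below models that layout in memory; the 10-day bucket width, the `partition_key` helper, and the `MessageStore` class are illustrative assumptions rather than a schema for any particular database.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone

BUCKET_DAYS = 10  # assumed bucket width; tune so partitions stay a manageable size


def partition_key(channel_id, ts):
    """Return (channel_id, bucket), where bucket counts BUCKET_DAYS-wide windows since the epoch."""
    bucket = int(ts.timestamp()) // (BUCKET_DAYS * 86_400)
    return (channel_id, bucket)


class MessageStore:
    """In-memory model of a store partitioned by (channel, time bucket)."""

    def __init__(self):
        # partition key -> rows kept sorted by (timestamp, message_id)
        self.partitions = defaultdict(list)

    def write(self, channel_id, message_id, ts, text):
        """Insert a message into its partition, preserving timestamp order."""
        key = partition_key(channel_id, ts)
        bisect.insort(self.partitions[key], (ts, message_id, text))

    def read_bucket(self, channel_id, ts, limit=50):
        """Return a newest-first page from the single partition that covers ts."""
        rows = self.partitions.get(partition_key(channel_id, ts), [])
        return rows[::-1][:limit]


store = MessageStore()
jan1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
jan2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
store.write("c1", "m1", jan1, "hello")
store.write("c1", "m2", jan2, "world")
latest = store.read_bucket("c1", jan2, limit=1)  # newest message in that bucket
```

Loading recent history then touches one partition per page, and older buckets can be moved to cold storage as whole units.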
3. Channel Fan-Out and Notification Distribution
How you efficiently deliver a single message to thousands of channel members without creating bottlenecks. This examines your ability to design scalable fan-out systems and prioritize delivery based on user engagement.
Hints to consider:
- Use a message queue system like Kafka to decouple message ingestion from delivery
- Implement a fan-out-on-write pattern for small channels and fan-out-on-read for large channels (hybrid approach)
- Design separate delivery pipelines for online users (immediate push) versus offline users (batch processing)
- Consider rate limiting and priority queues to handle notification storms in large channels
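The hybrid fan-out hint can be made concrete with a size threshold: small channels copy each message into every member's inbox at write time, while large channels append once to a shared log that members merge in at read time. This is a minimal sketch under that assumption; the `FanoutService` class and the 1,000-member cutoff are hypothetical.

```python
from collections import defaultdict

FANOUT_THRESHOLD = 1000  # assumed cutoff between "small" and "large" channels


class FanoutService:
    """Hybrid fan-out: write-time copies for small channels, read-time merge for large ones."""

    def __init__(self, threshold=FANOUT_THRESHOLD):
        self.threshold = threshold
        self.members = defaultdict(set)       # channel_id -> user_ids
        self.inboxes = defaultdict(list)      # user_id -> (channel_id, message) copies
        self.channel_log = defaultdict(list)  # large channel_id -> messages stored once

    def join(self, channel_id, user_id):
        self.members[channel_id].add(user_id)

    def publish(self, channel_id, message):
        """Route a message down the write-time or read-time path based on channel size."""
        recipients = self.members[channel_id]
        if len(recipients) <= self.threshold:
            for user_id in recipients:
                self.inboxes[user_id].append((channel_id, message))
            return "fanout_on_write"
        self.channel_log[channel_id].append(message)
        return "fanout_on_read"

    def read(self, user_id):
        """Merge the user's inbox with the logs of large channels they belong to."""
        out = list(self.inboxes[user_id])
        for channel_id, log in self.channel_log.items():
            if user_id in self.members[channel_id]:
                out.extend((channel_id, m) for m in log)
        return out
```

In a production design the `publish` step would typically sit behind the message queue, with separate consumers for online push and offline notification batches. The sketch also ignores channels that cross the threshold over time, which would need a migration rule.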
4. Search Infrastructure at Scale
How you build a search system that provides fast, relevant results across millions of messages while respecting channel permissions. This evaluates your understanding of search engines, ranking algorithms, and security models.
Hints to consider:
- Use Elasticsearch or a similar distributed search engine, with indexes sharded by workspace or time
- Implement incremental indexing to keep search indexes synchronized with message writes
- Design a permissions layer that filters search results based on user channel memberships
- Consider search result ranking that weighs recency, relevance, and user interaction patterns
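The permission-filtering and ranking hints can be illustrated with a toy inverted index. The sketch below is not how Elasticsearch works internally; it is a minimal model that assumes whitespace tokenization, a term-count relevance score, and an exponential recency decay with a hypothetical 7-day half-life. The `SearchIndex` class and its parameters are illustrative only.

```python
from collections import defaultdict


class SearchIndex:
    """Toy inverted index with channel-permission filtering and recency-weighted ranking."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> message_ids containing it
        self.docs = {}                    # message_id -> (channel_id, ts, terms)

    def index(self, message_id, channel_id, ts, text):
        """Incrementally index one message; ts is seconds since an arbitrary epoch."""
        terms = text.lower().split()
        self.docs[message_id] = (channel_id, ts, terms)
        for term in terms:
            self.postings[term].add(message_id)

    def search(self, query, user_channels, now, half_life=7 * 86_400.0, limit=10):
        """Return message IDs visible to the user, best score first."""
        q_terms = query.lower().split()
        candidates = set()
        for term in q_terms:
            candidates |= self.postings.get(term, set())

        scored = []
        for mid in candidates:
            channel_id, ts, terms = self.docs[mid]
            if channel_id not in user_channels:
                continue  # permission filter: skip channels the user cannot see
            matches = sum(terms.count(t) for t in q_terms)
            recency = 0.5 ** ((now - ts) / half_life)  # halves every half_life seconds
            scored.append((matches * recency, mid))

        # Highest score first; ties broken by message ID for stable output.
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [mid for _, mid in scored[:limit]]
```

Filtering inside the scoring loop means messages from channels the user cannot see never reach the result list. With a real search engine the same effect is usually achieved by adding the user's channel IDs as a filter clause on the query, rather than filtering after retrieval.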