Design a real-time messaging platform like WhatsApp or Meta Messenger that supports one-to-one text conversations, message delivery status tracking (sent, delivered, read), and user presence indicators (online/last seen). Users expect their conversations to sync seamlessly across multiple devices -- phone, tablet, and desktop -- and to be able to compose messages while offline with automatic delivery when connectivity returns.
The system must handle billions of users exchanging tens of billions of messages daily with strict latency targets (under 200ms end-to-end when both parties are online). The core engineering challenges are maintaining persistent bidirectional connections at massive scale, guaranteeing at-least-once delivery with deduplication, preserving message ordering within each conversation, synchronizing state across multiple devices per user, and gracefully handling the reality of unreliable mobile networks where users constantly disconnect and reconnect.
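Before diving into design, it helps to translate "tens of billions of messages daily" into per-second throughput. A quick back-of-envelope check (the 50B/day figure and 3x peak-to-average ratio are illustrative assumptions, not numbers from the prompt):

```python
# Back-of-envelope throughput estimate (illustrative numbers).
MESSAGES_PER_DAY = 50e9        # "tens of billions" -> assume 50B/day
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3                # assumed peak-to-average ratio

avg_msgs_per_sec = MESSAGES_PER_DAY / SECONDS_PER_DAY
peak_msgs_per_sec = avg_msgs_per_sec * PEAK_FACTOR

print(f"average: ~{avg_msgs_per_sec:,.0f} msgs/sec")   # ~578,704
print(f"peak:    ~{peak_msgs_per_sec:,.0f} msgs/sec")  # ~1,736,111
```

Numbers at this magnitude justify the later choices: partitioned logs for ingestion and a write-optimized store for persistence.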
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers want to see how messages flow from sender to recipient with guaranteed ordering and without duplicates. This tests your understanding of distributed systems fundamentals, idempotency, and partitioned processing.
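Duplicate suppression usually rests on a client-generated idempotency key. A minimal sketch of the server-side check (in production the seen-key set would be a TTL'd Redis set or a unique-key write, not an in-process set):

```python
import uuid

class DedupStore:
    """Tracks idempotency keys already processed. Stands in for a
    TTL'd Redis set or a conditional write to durable storage."""
    def __init__(self):
        self._seen = set()

    def accept(self, idempotency_key: str) -> bool:
        """Return True the first time a key is seen, False on replays."""
        if idempotency_key in self._seen:
            return False
        self._seen.add(idempotency_key)
        return True

store = DedupStore()
key = str(uuid.uuid4())            # client-generated idempotency key
assert store.accept(key) is True   # first delivery: process
assert store.accept(key) is False  # network retry: drop the duplicate
```

Because the client retries the same key until it gets an acknowledgment, the system achieves at-least-once delivery with exactly-once visible effect.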
Maintaining billions of persistent bidirectional connections requires careful architecture. Interviewers probe how you distribute connection state, handle reconnections efficiently, and route messages to the correct gateway server.
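One way to route pushes is a shared table mapping each user's devices to the gateway holding their live WebSocket. A sketch under the assumption that a Redis hash backs this in production (a plain dict stands in here; all names are illustrative):

```python
# Gateway routing: user -> {device_id: gateway_id}, updated on
# connect/disconnect so delivery workers can find each live socket.
routing_table: dict[str, dict[str, str]] = {}

def on_connect(user_id: str, device_id: str, gateway_id: str) -> None:
    routing_table.setdefault(user_id, {})[device_id] = gateway_id

def on_disconnect(user_id: str, device_id: str) -> None:
    routing_table.get(user_id, {}).pop(device_id, None)

def gateways_for(user_id: str) -> set[str]:
    """Gateways that must receive a push for this user right now."""
    return set(routing_table.get(user_id, {}).values())

on_connect("alice", "phone", "gw-12")
on_connect("alice", "desktop", "gw-07")
print(gateways_for("alice"))   # {'gw-12', 'gw-07'} (set order may vary)
```

Entries should carry a TTL refreshed by gateway heartbeats so a crashed gateway's stale routes expire instead of black-holing pushes.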
When a user has a phone, tablet, and desktop all active, every device must show the same conversation state. Interviewers look for explicit sync strategies and cursor management.
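The usual mechanism is a per-device cursor: each device remembers the last sequence number it applied, and on reconnect it asks the server for everything newer before resuming live push. A minimal sketch (the log and cursor values are illustrative):

```python
# Per-device sync cursors over a per-conversation sequence-numbered log.
conversation_log = [(seq, f"msg-{seq}") for seq in range(1, 8)]  # seqs 1..7

def fetch_gap(cursor: int) -> list[tuple[int, str]]:
    """Messages the device has not yet seen (seq > its cursor)."""
    return [m for m in conversation_log if m[0] > cursor]

phone_cursor, tablet_cursor = 7, 4     # the tablet was offline for a while
assert fetch_gap(phone_cursor) == []   # phone is already up to date
assert [seq for seq, _ in fetch_gap(tablet_cursor)] == [5, 6, 7]
```

Because cursors are per device rather than per user, a long-offline tablet catches up independently without disturbing an up-to-date phone.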
Storing and querying billions of conversations efficiently requires thoughtful data modeling. Interviewers evaluate your storage choices, partitioning strategy, and ability to serve both recent chat windows and deep history.
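A common wide-row layout partitions by conversation ID plus a time bucket, clustering rows by sequence number so the recent chat window is one cheap ordered read. A sketch of the access pattern (a dict simulates the store; the daily bucket size is an assumption for illustration):

```python
from collections import defaultdict

# Partition key: (conversation_id, day_bucket) so one hot conversation
# cannot grow a single partition without bound. Rows within a partition
# are ordered by sequence number, newest first for recent-window reads.
table = defaultdict(list)   # (conv_id, bucket) -> [(seq, body), ...]

def write(conv_id: str, day_bucket: str, seq: int, body: str) -> None:
    table[(conv_id, day_bucket)].append((seq, body))

def recent(conv_id: str, day_bucket: str, limit: int):
    rows = sorted(table[(conv_id, day_bucket)], reverse=True)
    return rows[:limit]

write("c1", "2024-06-01", 1, "hi")
write("c1", "2024-06-01", 2, "hello")
write("c1", "2024-06-01", 3, "how are you?")
print(recent("c1", "2024-06-01", 2))  # [(3, 'how are you?'), (2, 'hello')]
```

Deep history scans then walk backward bucket by bucket, while the hot path touches only the current bucket.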
Presence updates generate enormous write volume (every online/offline transition for billions of users) but have relaxed consistency requirements. Interviewers want to see how you optimize this high-frequency signal without overloading the system.
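Since presence tolerates staleness, transitions can be coalesced: keep only the latest state per user and publish on a fixed interval. A sketch of that write-coalescing idea (the flush interval and names are illustrative choices):

```python
# Presence write coalescing: a flapping mobile connection can emit many
# online/offline transitions per minute; publish only the latest state
# for each user once per flush interval instead of every transition.
pending: dict[str, str] = {}   # user_id -> latest state since last flush

def record(user_id: str, state: str) -> None:
    pending[user_id] = state   # later transitions overwrite earlier ones

def flush() -> dict[str, str]:
    """Publish one batched update per user, then reset the buffer."""
    batch = dict(pending)
    pending.clear()
    return batch

record("bob", "online")
record("bob", "offline")
record("bob", "online")   # three transitions in one window ...
print(flush())            # {'bob': 'online'} -> one published update
```

Three rapid transitions collapse into a single fan-out to subscribed contacts, cutting the dominant write amplification of the presence path.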
Begin by confirming the scope. Ask whether the system needs group chats or only one-to-one conversations. Verify scale expectations: daily active users, message volume, and peak-to-average ratio. Clarify whether multimedia messages (images, videos) are in scope or text-only. Confirm whether end-to-end encryption is required, as this significantly impacts delivery tracking and server-side processing. Establish latency targets for different operations and whether global multi-region deployment is needed.
Sketch the core components: client applications (mobile, web), an API gateway, a fleet of WebSocket gateway servers for persistent connections, a message ingestion service, Kafka for durable ordered message logs partitioned by conversation ID, message delivery workers that consume from Kafka and push to recipients, a conversation storage layer (Cassandra for its write-optimized wide-row model), a Redis cache for recent messages and connection routing, and a separate presence service. Draw the message flow: sender posts to gateway, gateway writes to Kafka, delivery worker consumes and pushes to recipient's gateway, acknowledgments flow back through the system.
Walk through the end-to-end flow. The sender generates a UUID idempotency key and sends the message through the WebSocket connection. The gateway server validates the request and publishes to the Kafka partition for this conversation (using conversation ID as the partition key to preserve ordering). A delivery worker consumes from the partition, assigns a server-side sequence number, writes the message to Cassandra, and looks up the recipient's connected devices in the Redis routing table. For each connected device, the worker pushes the message through the appropriate gateway server. The recipient's client acknowledges receipt; the worker updates the delivery status. If the recipient is offline, the message remains in storage; when the device reconnects and provides its cursor, the gateway fetches and delivers all missed messages.
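The delivery-worker steps above can be sketched for a single partition; storage and the gateway push are stubbed, and all names are illustrative:

```python
# Delivery-worker sketch for one Kafka partition (one conversation shard):
# assign a server-side sequence number, persist, push to connected devices,
# and track delivery status until the recipient acknowledges.
sequences: dict[str, int] = {}   # conversation_id -> last assigned seq
stored: list[dict] = []          # stands in for the Cassandra write
status: dict[str, str] = {}      # message_id -> "sent" | "delivered"

def next_seq(conv_id: str) -> int:
    sequences[conv_id] = sequences.get(conv_id, 0) + 1
    return sequences[conv_id]

def handle(msg: dict, connected_devices: list[str]) -> None:
    msg["seq"] = next_seq(msg["conv_id"])
    stored.append(msg)                 # durable write before any push
    status[msg["id"]] = "sent"
    for device in connected_devices:   # push via the device's gateway
        pass                           # e.g. gateway.push(device, msg)

def on_ack(message_id: str) -> None:
    status[message_id] = "delivered"   # recipient's client acknowledged

handle({"id": "m1", "conv_id": "c1", "body": "hi"}, ["phone"])
on_ack("m1")
print(stored[0]["seq"], status["m1"])   # 1 delivered
```

Because one worker owns each partition, sequence assignment needs no cross-node coordination, which is what makes per-conversation ordering cheap.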
Discuss multi-device sync: each device tracks its own cursor, requests gaps on reconnection, and the server streams missing messages before switching to live push. Cover read receipt handling: when any device opens the conversation, a read event is published; the authoritative read watermark is the maximum across all devices. Address presence: a lightweight service maintains an in-memory map updated by gateway heartbeats, with batched publication to subscribed contacts. Discuss offline handling: the client queues messages locally and retries with exponential backoff on reconnection. Cover monitoring: track message delivery latency percentiles, connection churn rate, Kafka consumer lag, and storage hot partitions. Address horizontal scaling: shard WebSocket gateways by user hash, partition Kafka by conversation ID, and use Cassandra's native sharding.
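Two of the points above reduce to one-liners worth having ready: the read watermark as a max over per-device read cursors, and the capped exponential backoff schedule for offline retries (cursor values, base, and cap are illustrative):

```python
# Authoritative read watermark: the highest sequence number any of the
# user's devices has reported reading marks everything below it as read.
def read_watermark(cursors: dict[str, int]) -> int:
    return max(cursors.values(), default=0)

print(read_watermark({"phone": 42, "tablet": 37, "desktop": 45}))  # 45

# Capped exponential backoff delays for offline-send retries.
def backoff_schedule(base_s: float = 1.0, cap_s: float = 60.0,
                     attempts: int = 6) -> list[float]:
    return [min(base_s * 2 ** i, cap_s) for i in range(attempts)]

print(backoff_schedule())   # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```

Taking the max means reading on any one device marks the conversation read everywhere, matching user expectations across phone, tablet, and desktop.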