Design a real-time messaging platform similar to WhatsApp or Facebook Messenger that enables users to exchange one-to-one text messages with near-instant delivery. The system must support billions of registered users, track message delivery states (sent, delivered, read), and synchronize conversations across a user's phone, tablet, and desktop. Users expect that messages composed while offline are queued locally and delivered automatically once connectivity returns.
Your design should sustain tens of billions of messages per day, deliver messages end-to-end in under 200 milliseconds for online recipients, and tolerate datacenter outages without losing a single message. You will need to reason carefully about persistent connection management at massive scale, message ordering within conversations, idempotent delivery semantics, and efficient presence tracking for online/offline status.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers want to understand how you ensure every message arrives exactly once from the user's perspective, in the correct order, even when the network retries requests, multiple devices are active, and the infrastructure is distributed. This tests your grasp of distributed systems fundamentals and idempotency design.
Maintaining persistent bidirectional connections for billions of concurrent users while routing messages efficiently is a defining challenge. Interviewers test whether you can distribute connection state, handle reconnections gracefully, and avoid single points of failure.
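One common way to shard persistent connections without a single point of failure is a consistent-hash ring over the gateway fleet: each user hashes to a gateway, and losing a gateway only remaps the users that were on it. A minimal sketch (gateway names and the virtual-node count are illustrative, not part of the problem statement):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Stable hash so every server computes the same ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class GatewayRing:
    """Consistent-hash ring mapping users to WebSocket gateway hosts.

    Virtual nodes smooth out load imbalance; removing a failed gateway
    reassigns only the users that were connected to it, not the fleet.
    """
    def __init__(self, gateways, vnodes=100):
        self._ring = []  # sorted list of (hash, gateway)
        for gw in gateways:
            for i in range(vnodes):
                self._ring.append((_hash(f"{gw}#{i}"), gw))
        self._ring.sort()

    def gateway_for(self, user_id: str) -> str:
        """First virtual node clockwise from the user's hash."""
        h = _hash(user_id)
        idx = bisect.bisect(self._ring, (h,)) % len(self._ring)
        return self._ring[idx][1]
```

In practice the routing table (user to gateway) is also mirrored in Redis, as described later, so delivery workers can look up where to push without recomputing the ring.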
When a user owns a phone, a tablet, and a desktop, every device must converge to the same conversation state. Interviewers look for explicit strategies to synchronize message history, read receipts, and typing indicators without creating race conditions.
Storing and querying message history at enormous scale demands careful data modeling. Interviewers evaluate whether you choose appropriate storage engines, partition data logically, and handle hot conversations without degrading performance for other users.
Presence generates high write volume because every user heartbeat triggers an update, yet the data has relaxed consistency requirements. Interviewers want to see how you optimize this frequent, low-priority signal without overloading the rest of the system.
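A common pattern for this is heartbeats with a TTL rather than explicit offline writes: each heartbeat refreshes the user's entry, and a missed heartbeat simply lets it lapse to "offline" with no extra write. A minimal in-memory sketch (in production the map would typically live in Redis with key expiry; the TTL value and names are illustrative):

```python
import time

class PresenceTracker:
    """TTL-based presence: a user is online while heartbeats keep arriving."""

    def __init__(self, ttl_seconds=30, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen = {}  # user_id -> timestamp of last heartbeat

    def heartbeat(self, user_id):
        """Called every few seconds by a connected client; a cheap overwrite."""
        self._last_seen[user_id] = self._clock()

    def is_online(self, user_id):
        """Online iff the last heartbeat is fresher than the TTL.
        No explicit 'went offline' write is ever needed."""
        ts = self._last_seen.get(user_id)
        return ts is not None and self._clock() - ts < self._ttl
```

Because the data is relaxed-consistency by design, it can live entirely in a cache tier, isolated from the message path.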
Begin by confirming scope with the interviewer. Ask whether group chats are in scope or only one-to-one conversations. Verify the scale: how many daily active users, expected messages per user per day, and the read-to-write ratio. Clarify whether multimedia (images, video) is required or if text-only is sufficient. Confirm latency targets for sending, receiving, and syncing. Ask whether end-to-end encryption is in scope, as it changes how delivery tracking works. Determine if full-text search over message history is expected.
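Those scale numbers are worth sanity-checking on the spot. A quick back-of-envelope sketch, assuming 50 billion messages/day, a 3x peak-to-average factor, and ~200-byte messages (all illustrative assumptions, not figures from the prompt beyond "tens of billions per day"):

```python
# Back-of-envelope throughput and storage estimate.
MESSAGES_PER_DAY = 50_000_000_000   # assumed: 50B messages/day
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3                     # assumed peak-to-average ratio
AVG_MSG_BYTES = 200                 # assumed average message size

avg_qps = MESSAGES_PER_DAY / SECONDS_PER_DAY
peak_qps = avg_qps * PEAK_FACTOR
daily_storage_gb = MESSAGES_PER_DAY * AVG_MSG_BYTES / 1e9

print(f"average: ~{avg_qps:,.0f} msgs/sec")         # ~578,704
print(f"peak:    ~{peak_qps:,.0f} msgs/sec")        # ~1,736,111
print(f"storage: ~{daily_storage_gb:,.0f} GB/day")  # ~10,000 GB/day
```

Roughly 600K writes/sec average and ~10 TB/day of raw message data before replication; numbers like these justify the partitioned log and wide-column store discussed next.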
Sketch the main components: client applications (mobile and web), an API gateway layer, a fleet of WebSocket gateway servers for persistent connections, a message ingestion service, a durable message log (Kafka partitioned by conversation ID), delivery worker processes, conversation storage (Cassandra or a similar wide-column store), a cache layer (Redis), and a standalone presence service. Trace the message flow: the sender posts a message to the ingestion API, the message is written to the Kafka partition for the conversation, a delivery worker reads from that partition (preserving order), looks up the recipient's active connections in Redis, and pushes the message to each connected device via WebSocket. Show how the acknowledgment and delivery receipt flow back through the system. Illustrate presence updates flowing through a lighter-weight path to the presence service.
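The per-conversation ordering guarantee in this flow rests on deterministic partition assignment: every message in a conversation hashes to the same Kafka partition, so a single consumer sees them in write order. A minimal sketch (the partition count is an assumed example; ordering is per-conversation, not global):

```python
import hashlib

NUM_PARTITIONS = 64  # assumed partition count for the message topic

def partition_for(conversation_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map a conversation to one partition.

    All messages in a conversation land on the same partition, so the
    delivery worker consuming it preserves their order.
    """
    digest = hashlib.sha256(conversation_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

The trade-off to mention: a very hot conversation is bounded by one partition's throughput, which is why hot-partition metrics appear in the monitoring section below.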
Discuss multi-device synchronization: each device stores a watermark of the last message it received, and on reconnection it requests all messages after that watermark. Explain read receipt handling: track per-device cursors and compute the user-level read state as the maximum across devices. Cover offline handling: the client persists outgoing messages locally and retries with exponential backoff once connectivity returns. Address fault tolerance: Kafka replication factor of three ensures durability, Cassandra's tunable consistency guarantees availability during partial outages, and Redis Sentinel or Cluster handles gateway routing failover. Mention monitoring: track message delivery latency percentiles, WebSocket connection churn rate, Kafka consumer lag, and storage hot-partition metrics. Briefly note that horizontal scaling comes from sharding WebSocket gateways by user hash, Kafka partitions by conversation ID, and Cassandra's native token-ring sharding.
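The watermark and read-cursor rules above can be sketched in a few lines (field and function names are illustrative):

```python
def user_read_cursor(device_cursors: dict[str, int]) -> int:
    """User-level read state = the furthest message read on any device."""
    return max(device_cursors.values(), default=0)

def messages_since(log: list[dict], watermark: int) -> list[dict]:
    """On reconnect, a device sends its last-seen sequence number
    (its watermark) and receives everything after it."""
    return [m for m in log if m["seq"] > watermark]
```

Taking the max across devices means reading a conversation on the desktop marks it read everywhere, without any cross-device coordination at write time.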
Walk through a single message lifecycle in detail. The sender generates a UUID idempotency key and a local sequence number, then posts the message to the ingestion API. The API validates the request, writes the message to the Kafka partition for this conversation (the conversation hash determines the partition), and returns a "sent" acknowledgment to the sender. A delivery worker consumes from that partition in order, writes the message to Cassandra with a composite key of (conversation_id, timestamp), looks up recipient connections in the Redis routing table, and pushes the message to each online device via WebSocket. When the recipient device acknowledges receipt, the worker updates the message state to "delivered" and pushes a delivery receipt back to the sender. When the recipient opens the conversation, a read receipt is generated and propagated along the same path. Explain how the idempotency key prevents duplicates during retries and how sequence numbers let devices detect and request gaps after reconnection.
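A minimal sketch of the recipient-side bookkeeping this walkthrough implies: deduplicate on the idempotency key, then use sequence numbers to find gaps to re-request (class and field names are illustrative, assuming per-conversation sequence numbers starting at 1):

```python
class ConversationInbox:
    """Per-device view of one conversation."""

    def __init__(self):
        self._seen_keys = set()   # idempotency keys already applied
        self._messages = {}       # seq -> message body

    def deliver(self, seq: int, idempotency_key: str, body: str) -> bool:
        """Apply a message exactly once; redelivered retries are no-ops.

        Returns True if the message was new, False if it was a duplicate.
        """
        if idempotency_key in self._seen_keys:
            return False
        self._seen_keys.add(idempotency_key)
        self._messages[seq] = body
        return True

    def missing_seqs(self, latest_seq: int) -> list[int]:
        """Sequence numbers the device should re-request after reconnecting,
        given the latest sequence number the server reports."""
        return [s for s in range(1, latest_seq + 1) if s not in self._messages]
```

The same dedupe check runs server-side at the ingestion API, so a client retrying a timed-out POST never produces a second copy in the Kafka log.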