Design a messaging application similar to WhatsApp or Facebook Messenger that supports real-time one-to-one conversations. Users should be able to send text messages instantly, see delivery and read receipts, check whether contacts are online, and access their full conversation history across multiple devices. The system must handle billions of messages daily while keeping end-to-end delivery latency under 200 milliseconds for online recipients.
The central engineering challenge is building a low-latency, highly available messaging backbone that guarantees at-least-once delivery, maintains per-conversation ordering, synchronizes state across phones, tablets, and desktops, and stores years of chat history without degrading performance. You will need to reason about persistent connection management, message sequencing, multi-device conflict resolution, and presence propagation at internet scale.
Based on real interview experiences at Addepar, Meta, Anthropic, and other companies that ask this question, these are the areas interviewers probe most deeply:
Interviewers want to understand how you guarantee that every message arrives exactly once (from the user's perspective), in the correct order, even under retries, multi-device access, and distributed infrastructure. This tests core distributed-systems reasoning around idempotency and sequencing.
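One way to make the idempotency-plus-sequencing argument concrete is a toy per-conversation log that deduplicates on a client-generated UUID while the server assigns the authoritative order. This is a minimal in-memory sketch; the class and method names are illustrative, not from any real system.

```python
import itertools

class ConversationLog:
    """Toy model of per-conversation dedup + server-side sequencing."""

    def __init__(self):
        self._seq = itertools.count(1)   # server-assigned order within the conversation
        self._by_client_id = {}          # client UUID -> previously assigned seq

    def append(self, client_msg_id: str, body: str) -> int:
        # Idempotency: a retry carrying the same client-generated UUID
        # returns the originally assigned sequence number instead of
        # creating a duplicate entry.
        if client_msg_id in self._by_client_id:
            return self._by_client_id[client_msg_id]
        seq = next(self._seq)
        self._by_client_id[client_msg_id] = seq
        return seq
```

Because sequencing is scoped to a single conversation, no cross-conversation coordination is needed, which is the property the interviewer is usually fishing for.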
Maintaining persistent bidirectional connections for hundreds of millions of concurrent users and routing each message to the correct gateway is a hallmark challenge of this design. Interviewers probe how you distribute connection state, detect stale connections, and avoid single points of failure.
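The routing-state discussion can be grounded with a small sketch of a connection registry that tracks which gateways hold a user's sockets and expires entries whose heartbeats have gone stale. In production this state would typically live in Redis with key TTLs; the in-memory version and its names are assumptions for illustration.

```python
class ConnectionRegistry:
    """Sketch of user -> gateway routing state with heartbeat-based expiry."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._conns = {}   # user_id -> {gateway_id: last_heartbeat_time}

    def heartbeat(self, user_id: str, gateway_id: str, now: float) -> None:
        self._conns.setdefault(user_id, {})[gateway_id] = now

    def gateways_for(self, user_id: str, now: float) -> list:
        # Drop entries whose last heartbeat exceeds the TTL, so a crashed
        # gateway cannot leave phantom routes behind indefinitely.
        live = {g: t for g, t in self._conns.get(user_id, {}).items()
                if now - t <= self.ttl}
        self._conns[user_id] = live
        return sorted(live)
```

A delivery worker would call `gateways_for` to find every gateway currently holding a socket for the recipient, then push the message to each.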
When a user has several active devices, each must converge on the same conversation state. Interviewers look for concrete mechanisms to sync message history, delivery receipts, and read cursors without race conditions.
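A concrete race-free mechanism worth naming here is a monotonic merge of the per-conversation read cursor: devices may report positions out of order, but the shared cursor only ever moves forward. This is an assumed model for illustration, not a specific product's API.

```python
class ReadCursor:
    """Merge-by-max read cursor shared across a user's devices."""

    def __init__(self):
        self.position = 0   # highest message seq the user has read anywhere

    def advance(self, device_seq: int) -> int:
        # Monotonic merge: a stale device reporting an old cursor can never
        # move the shared read position backwards, so out-of-order reports
        # from racing devices converge to the same final state.
        self.position = max(self.position, device_seq)
        return self.position
```

The same merge-by-max idea extends to delivery receipts and per-device sync watermarks, which is why it is a useful primitive to mention explicitly.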
Efficiently storing and retrieving years of chat history for billions of users demands careful data modeling. Interviewers evaluate your storage technology choices, partitioning scheme, and caching strategy.
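One partitioning idea interviewers often want spelled out is time-bucketing the partition key, so a years-long, very active conversation never grows a single unbounded partition. The helper below sketches a composite key; the monthly bucket granularity is an assumption, and in a wide-column store like Cassandra the bucket would form part of the partition key with messages clustered by sequence or timestamp within it.

```python
from datetime import datetime, timezone

def partition_key(conversation_id: str, sent_at: datetime) -> tuple:
    """Composite partition key: (conversation, month bucket).

    Bucketing caps the size of any one partition while keeping a single
    conversation's recent history co-located for cheap range reads.
    """
    bucket = sent_at.strftime("%Y-%m")
    return (conversation_id, bucket)
```

Reading "the last N messages" then means scanning the current bucket and stepping backwards one bucket at a time until N messages are collected.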
Presence generates enormous write volume (every heartbeat, app foreground/background event) but tolerates relaxed freshness. Interviewers want to see how you optimize this high-frequency signal without overloading your infrastructure.
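A standard way to exploit that relaxed freshness is to coalesce raw heartbeats and fan out only genuine status transitions. The sketch below is an assumed design: later heartbeats overwrite earlier ones between flushes, and steady-state "still online" beats produce no fan-out at all.

```python
class PresenceBatcher:
    """Coalesce heartbeats; emit only status transitions on flush."""

    def __init__(self):
        self._pending = {}     # user_id -> latest raw status since last flush
        self._published = {}   # user_id -> last status fanned out to subscribers

    def heartbeat(self, user_id: str, status: str) -> None:
        # Later heartbeats overwrite earlier ones, so N beats per interval
        # collapse to at most one candidate update.
        self._pending[user_id] = status

    def flush(self) -> dict:
        # Fan out only real transitions (offline -> online, etc.); a user
        # who stayed online the whole interval generates zero downstream work.
        changes = {u: s for u, s in self._pending.items()
                   if self._published.get(u) != s}
        self._published.update(changes)
        self._pending.clear()
        return changes
```

Running `flush` on a timer (say every few seconds) trades a small amount of staleness for a large reduction in fan-out volume, which is exactly the trade presence tolerates.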
Confirm whether the scope includes group chats or only one-to-one conversations. Verify scale expectations: daily active users, messages per day, read-to-write ratio. Ask about multimedia support (images, video, voice) versus text-only. Clarify whether end-to-end encryption is required, since it changes how the server handles delivery receipts. Establish latency targets for send, receive, and sync operations. Determine if message search is in scope.
Sketch the main components: mobile and web clients, an API gateway, a fleet of WebSocket gateway servers for persistent connections, a message ingestion service, Kafka partitioned by conversation ID as the message log, delivery workers that consume from Kafka and push to recipients, Cassandra for durable conversation storage, Redis for caching recent messages and connection routing, and a separate presence service. Draw the message flow: sender posts to API, message lands in Kafka, delivery worker looks up recipient connections in Redis and pushes via WebSocket, acknowledgments flow back. Show presence updates flowing through a lightweight, separate path.
Walk through an end-to-end message send. The sender's client generates a UUID and local sequence number, then posts the message to the ingestion API. The server validates, deduplicates using the UUID, and writes to the Kafka partition for that conversation. A delivery worker consumes the event, looks up the recipient's active WebSocket gateways in Redis, and pushes the message to each connected device. On receipt, each device sends a delivery acknowledgment back through its gateway, which updates the message state in Cassandra. The sender's client receives a delivery receipt event. Read receipts follow the same path when the recipient opens the conversation. Explain how the UUID prevents duplicates on retry, and how per-conversation Kafka partitioning preserves ordering without cross-conversation coordination.
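The ordering argument in that walkthrough rests on one small piece of machinery: a stable partitioner that maps every message of a conversation to the same Kafka partition, so Kafka's per-partition ordering yields per-conversation ordering for free. A minimal sketch, with the partition count as an assumed parameter:

```python
import hashlib

def kafka_partition(conversation_id: str, num_partitions: int = 64) -> int:
    """Stable hash partitioner keyed by conversation ID.

    All messages in a conversation land on one partition, so a single
    consumer sees them in order without any cross-conversation coordination.
    """
    digest = hashlib.sha256(conversation_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Note the trade-off worth raising in the interview: keying by conversation concentrates a very hot conversation onto one partition, which is acceptable for one-to-one chat but would need revisiting for huge group threads.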
Discuss multi-device sync: each device maintains a watermark of the last message it received; on reconnection it requests all messages after that watermark. Explain offline handling: the client queues messages locally and retries with exponential backoff once connectivity is restored. Cover presence: maintain an in-memory user status map, batch heartbeats, and fan out only to subscribed contacts. Mention monitoring: track message delivery latency percentiles, WebSocket connection churn, Kafka consumer lag, and Cassandra hot partitions. Discuss horizontal scaling: shard WebSocket gateways by user hash, partition Kafka by conversation ID, and rely on Cassandra's native ring-based sharding.