For a full example answer with detailed architecture diagrams and deep dives, see our Design Slack guide. The Slack guide covers real-time messaging architecture, WebSocket connection management, and presence tracking that directly apply to a Messenger-style 1:1 chat system.
Also review the Message Queues, Caching, and Databases building blocks for background on durable message ingestion, session routing, and conversation storage.
Design a messaging system like WhatsApp or Facebook Messenger that supports real-time one-on-one text conversations between users, with message delivery status tracking and user presence features at scale. Users expect instant delivery, correct message ordering, accurate status indicators (sent, delivered, read), and seamless multi-device continuity even on unreliable mobile networks.
The system must handle billions of users exchanging tens of billions of messages daily. The core technical challenges are maintaining persistent bidirectional connections for real-time delivery, ensuring at-least-once delivery semantics with idempotent writes and per-conversation ordering, synchronizing state across multiple devices per user, and building a presence system that scales without overwhelming write capacity. You need to translate ambiguous requirements into a robust architecture with principled tradeoffs between latency, availability, and consistency.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers want to see how you ensure messages arrive exactly once (from the user's perspective) in the correct order, even with network retries, multiple devices, and distributed infrastructure.
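One common approach here is a client-generated idempotency key checked server-side before the write. The sketch below (an illustration, not the article's reference implementation) uses an in-memory dict standing in for a Redis dedup store; a retry with the same key returns the original message ID instead of creating a duplicate:

```python
import uuid


class IngestionService:
    """Idempotent message ingestion sketch.

    The in-memory dict stands in for a shared store such as Redis;
    in production the key would also carry a TTL.
    """

    def __init__(self):
        self._seen = {}  # idempotency_key -> message_id already assigned

    def send(self, idempotency_key, conversation_id, body):
        # A network retry resends the same idempotency key, so we return
        # the original result instead of writing the message twice.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key], False  # duplicate suppressed
        message_id = str(uuid.uuid4())
        # ... append to the conversation log here ...
        self._seen[idempotency_key] = message_id
        return message_id, True
```

This gives "exactly once from the user's perspective" on top of at-least-once transport: the write may be attempted many times, but only the first attempt mutates state.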
Maintaining persistent bidirectional connections for billions of concurrent users while routing messages efficiently is a fundamental challenge. Interviewers probe how you distribute connection state and handle reconnections.
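A standard answer is consistent hashing: users map onto a ring of gateway servers, so adding or removing a gateway remaps only a small fraction of connections. The sketch below is one possible illustration (real deployments typically pair this with a per-device routing table in Redis, as described later in this guide):

```python
import hashlib
from bisect import bisect_right


class GatewayRing:
    """Consistent-hash ring assigning users to WebSocket gateway servers.

    Virtual nodes (vnodes) smooth out the load distribution across gateways.
    """

    def __init__(self, gateways, vnodes=64):
        self._ring = sorted(
            (self._hash(f"{g}#{i}"), g) for g in gateways for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def gateway_for(self, user_id):
        # Walk clockwise from the user's hash to the next vnode on the ring.
        h = self._hash(user_id)
        keys = [k for k, _ in self._ring]
        idx = bisect_right(keys, h) % len(self._ring)
        return self._ring[idx][1]
```

On reconnection the client simply re-resolves its gateway and re-registers its connection in the routing table.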
When a user has multiple active devices, all must show the same conversation state. Interviewers look for explicit strategies to sync message history and read receipts without creating race conditions.
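One way to avoid read-receipt races is to make the merged state monotonic: each device reports the highest message sequence it has displayed, and the user-level receipt is the maximum across devices. A minimal sketch, assuming per-conversation sequence numbers:

```python
class ReadState:
    """Per-conversation read receipts merged across a user's devices.

    Updates only move forward, so a stale report from a lagging device
    can never move the user-level read receipt backwards.
    """

    def __init__(self):
        self._by_device = {}  # device_id -> highest sequence read

    def report(self, device_id, seq):
        current = self._by_device.get(device_id, 0)
        self._by_device[device_id] = max(current, seq)

    def user_level(self):
        # The user has "read" everything up to the furthest of their devices.
        return max(self._by_device.values(), default=0)
```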
Storing and retrieving message history at scale requires careful data modeling. Interviewers evaluate your choice of storage system, partitioning scheme, and handling of hot conversations.
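A common mitigation for hot conversations is bucketing the partition key by time, so one very active conversation is spread across several partitions instead of growing a single unbounded row. A hypothetical sketch of such a key function (the seven-day bucket width is an assumption, not a prescription):

```python
from datetime import datetime, timezone


def partition_key(conversation_id, sent_at, bucket_days=7):
    """Cassandra-style partition key: (conversation_id, coarse time bucket).

    Messages within a bucket are then ordered by a clustering column
    (e.g. timestamp), keeping range scans over recent history cheap.
    """
    epoch_days = int(sent_at.replace(tzinfo=timezone.utc).timestamp() // 86400)
    bucket = epoch_days // bucket_days
    return (conversation_id, bucket)
```

Reads for "latest messages" then scan the newest bucket first and step backwards only if the page is not full.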
Presence generates enormous write volume but has relaxed consistency requirements. Interviewers want to see how you optimize this high-frequency signal without overloading your system.
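The usual optimization is coalescing: within a flush window, only the latest status per user survives, collapsing a flood of transitions into one update. A sketch of the batching core (a real service would drive `flush` from a periodic timer and fan out only to subscribed contacts):

```python
class PresenceBatcher:
    """Coalesces high-frequency presence updates before fan-out.

    Because presence tolerates staleness, dropping intermediate
    transitions within a window loses nothing users care about.
    """

    def __init__(self):
        self._pending = {}  # user_id -> latest status in this window

    def update(self, user_id, status):
        # Later updates in the same window simply overwrite earlier ones.
        self._pending[user_id] = status

    def flush(self):
        # Hand off the accumulated batch and start a fresh window.
        batch, self._pending = self._pending, {}
        return batch
```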
Begin by confirming scope and constraints. Ask whether the system needs to support group chats or only 1:1 conversations. Verify scale expectations: daily active users, message volume, and read-to-write ratio. Clarify expectations around multimedia (images, videos) versus text-only. Confirm whether end-to-end encryption is required, as this impacts delivery tracking design. Establish latency targets for different operations (send, receive, sync). Determine if search functionality for message history is in scope.
Sketch the core components: client applications (mobile, web), an API gateway layer, WebSocket gateway servers for persistent connections, a message ingestion service, a message queue (Kafka partitioned by conversation ID), message delivery workers, conversation storage (Cassandra for write throughput and range scans), a cache layer (Redis for session routing and recent messages), and a presence service. Draw the message flow: the sender's device posts the message to the ingestion API, which writes to the appropriate Kafka partition; a delivery worker consumes from the partition, looks up the recipient's connected gateway in Redis, and pushes the message via WebSocket; acknowledgments flow back through the same path. Presence updates travel through a separate lightweight path.
Walk through how a message flows from sender to recipient. The sender generates a UUID idempotency key and a local sequence number, then posts to the ingestion API, which validates the request and writes to a Kafka partition keyed by a hash of the conversation ID. A delivery worker consumes from the partition (preserving per-conversation order), persists the message in Cassandra with conversation_id as the partition key and the message timestamp as the clustering column so a conversation's history can be range-scanned in order, looks up the recipient's device connections in the Redis routing table, and pushes the message to each connected device via WebSocket. Retries reuse the idempotency key to prevent duplicates, and sequence numbers let devices detect gaps and request missing messages during reconnection. Discuss how this pipeline handles an offline recipient: the message is stored and a push notification is sent; when the recipient comes online, their device fetches all messages after its last watermark.
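The delivery-worker step above can be sketched as a small function. All the callables here (`push_fn`, `store_fn`, `notify_fn`) are hypothetical stand-ins for the real gateway, Cassandra, and push-notification services; persisting before pushing is assumed so a worker crash cannot lose a message it has already started delivering:

```python
def deliver(message, routing_table, push_fn, store_fn, notify_fn):
    """Delivery-worker sketch for one message.

    routing_table maps user_id -> list of (gateway, device_id) entries
    for currently connected devices (the Redis lookup in the text).
    Returns the number of devices pushed to.
    """
    # Persist first: reconnecting devices can always catch up from storage.
    store_fn(message)
    devices = routing_table.get(message["to"], [])
    if devices:
        for gateway, device_id in devices:
            push_fn(gateway, device_id, message)
    else:
        # Recipient offline: fall back to a mobile push notification.
        notify_fn(message["to"], message)
    return len(devices)
```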
Discuss multi-device sync: each device maintains a watermark of the last message it received; on reconnection, it requests all messages after that watermark. Explain read-receipt handling: track read status per device and use the maximum across devices as the user-level read state. Cover presence: keep an in-memory map of user-to-status in the presence service, batch updates every 5 seconds, and fan out only to subscribed contacts. Address offline handling: the client queues outgoing messages locally and retries with exponential backoff once the connection is restored. Mention monitoring: track message delivery latency percentiles, connection churn rate, Kafka consumer lag, and storage hot partitions. Discuss horizontal scaling: shard WebSocket gateways by user hash, partition Kafka by conversation ID, and rely on Cassandra's partition-key-based data distribution.
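The watermark catch-up described above reduces to a simple range query, assuming per-conversation sequence numbers as introduced earlier. A minimal sketch over an in-memory log (standing in for the Cassandra range scan):

```python
def missing_messages(server_log, device_watermark):
    """Return every message a reconnecting device has not yet seen.

    server_log is a per-conversation list of messages ordered by
    sequence number; the device sends the last sequence it received.
    """
    return [m for m in server_log if m["seq"] > device_watermark]
```

After applying the result, the device advances its watermark to the highest sequence returned, making the sync step idempotent under retries.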