For a full example answer with detailed architecture diagrams and deep dives, see our Design Slack guide. The Slack guide covers real-time messaging architecture, WebSocket connection management, and presence tracking that directly apply to a Messenger-style chat application.
Also review the Message Queues, Caching, and Databases building blocks for background on durable message ingestion, session routing, and conversation storage.
Design a messaging system like WhatsApp or Facebook Messenger that supports real-time 1:1 chat, message delivery status tracking, and user presence features at scale. Users expect instant delivery, correct ordering, accurate status indicators (sent, delivered, read), and seamless multi-device continuity, even with unreliable mobile networks.
The system must handle billions of users exchanging tens of billions of messages daily with sub-200ms delivery latency. The core technical challenges are designing a low-latency, highly available message delivery pipeline with at-least-once semantics, handling fan-in and fan-out patterns for connection management, synchronizing conversation state across multiple devices per user, and building a scalable presence system. Interviewers probe your ability to translate ambiguous requirements into a robust architecture with principled tradeoffs and pragmatic scaling strategies.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers want to see how you ensure messages arrive in the correct order without duplicates, even when the network is unreliable and the sender has multiple devices.
Maintaining persistent connections for billions of concurrent users while routing messages to the correct gateway server is a fundamental challenge.
All of a user's devices must show the same conversation state. Interviewers look for concrete mechanisms rather than vague "sync" descriptions.
Message storage must handle enormous write throughput with efficient range reads for history. Interviewers evaluate your data modeling and partitioning decisions.
Presence tracking generates high write volume with relaxed consistency needs. Interviewers want to see how you avoid overloading the system with heartbeat writes.
Confirm scope and constraints. Ask whether group chats are in scope or only 1:1 conversations. Verify scale expectations: daily active users, message volume, and peak-to-average ratio. Clarify whether multimedia support (images, videos) is required or text-only. Confirm whether end-to-end encryption is needed, as it impacts delivery tracking. Establish latency targets for send, receive, and sync operations. Determine if message search is in scope.
Sketch the core components: client applications (mobile, web), an API gateway, WebSocket gateway servers for persistent connections, a message ingestion service, Kafka (partitioned by conversation ID), message delivery workers, conversation storage (Cassandra), a cache layer (Redis for session routing and recent messages), and a presence service. Then draw the message flow: the sender posts a message to the ingestion API; the message is written to the appropriate Kafka partition; a delivery worker consumes it and pushes it to the recipient's WebSocket connection(s) via the routing table in Redis; acknowledgments flow back along the same path. Presence updates travel a separate, lightweight path through the dedicated presence service.
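The routing step above, finding which gateway holds each of a recipient's connections, can be sketched with an in-memory stand-in for the Redis routing table. Function names like `register_connection` are illustrative, not part of any real API:

```python
# Minimal sketch of session routing: a map from (user_id, device_id) to the
# gateway server holding that device's WebSocket. In production this table
# would live in Redis; a plain dict stands in here.

routing_table: dict[tuple[str, str], str] = {}

def register_connection(user_id: str, device_id: str, gateway: str) -> None:
    """Called by a gateway when a device opens a WebSocket."""
    routing_table[(user_id, device_id)] = gateway

def unregister_connection(user_id: str, device_id: str) -> None:
    """Called on disconnect so delivery workers stop using stale routes."""
    routing_table.pop((user_id, device_id), None)

def gateways_for(user_id: str) -> dict[str, str]:
    """Return device_id -> gateway for every connected device of a user."""
    return {dev: gw for (uid, dev), gw in routing_table.items() if uid == user_id}

register_connection("alice", "phone", "gw-3")
register_connection("alice", "laptop", "gw-7")
print(gateways_for("alice"))  # two devices, possibly on different gateways
```

A delivery worker would look up `gateways_for(recipient)` and push one copy of the message to each gateway in the result, which is why multi-device users naturally produce per-message fan-out.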
Walk through the lifecycle of a single message. The sender generates a UUID idempotency key and posts to the ingestion API, which validates the message and writes it to a Kafka partition keyed by a hash of the conversation ID. A delivery worker consumes from the partition (preserving per-conversation order), looks up the recipient's device connections in Redis, and pushes the message to each connected device over its WebSocket. The message is stored in Cassandra with conversation_id as the partition key and a timestamp (or time-ordered message ID) as the clustering column, enabling efficient history range reads. If the recipient is offline, the message is stored and a push notification is triggered; on reconnection, the device fetches all messages after its watermark. Retries reuse the idempotency key so the server can deduplicate.
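The ingestion steps above can be sketched as follows, with in-memory structures standing in for Kafka and the dedupe store; the partition count, function names, and record shape are assumptions for illustration:

```python
# Sketch of the ingestion path: idempotency-key dedupe, partition selection
# by conversation hash, and append to a per-partition ordered log.
import hashlib
import uuid

NUM_PARTITIONS = 16
partitions: dict[int, list[dict]] = {p: [] for p in range(NUM_PARTITIONS)}
seen_keys: set[str] = set()  # idempotency keys already accepted

def partition_for(conversation_id: str) -> int:
    """Hash the conversation ID so all messages in one conversation land on
    the same partition, preserving their relative order."""
    digest = hashlib.sha256(conversation_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

def ingest(conversation_id: str, sender: str, text: str, idempotency_key: str) -> bool:
    """Accept a message exactly once; retries with the same key are no-ops."""
    if idempotency_key in seen_keys:
        return False  # duplicate retry: already accepted, nothing appended
    seen_keys.add(idempotency_key)
    partitions[partition_for(conversation_id)].append(
        {"conversation_id": conversation_id, "sender": sender,
         "text": text, "key": idempotency_key}
    )
    return True

key = str(uuid.uuid4())
print(ingest("conv-42", "alice", "hi", key))  # True: accepted
print(ingest("conv-42", "alice", "hi", key))  # False: retry deduplicated
```

Because the client keeps the same key across retries, a network timeout after a successful write cannot produce a duplicate: the resend is recognized and dropped, which is how at-least-once transport yields exactly-once appearance to the user.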
Cover multi-device sync: each device tracks its watermark and fetches missing messages on reconnect. Discuss read receipts: track per-device, aggregate to per-user using the maximum cursor. Address presence: batch updates every 5 seconds, maintain in-memory state, persist periodically. Cover monitoring: track message delivery latency percentiles, WebSocket connection churn, Kafka consumer lag, and Cassandra write latency. Discuss scaling: shard WebSocket gateways by user hash, partition Kafka by conversation ID, leverage Cassandra's built-in sharding, and use Redis cluster mode for session routing at scale.
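The read-receipt aggregation and watermark catch-up described above can be sketched directly; cursors here are per-conversation message sequence numbers, and all names are illustrative:

```python
# Sketch of per-device read cursors aggregated into a per-user read receipt,
# plus watermark-based catch-up on reconnect.

# device_id -> last message sequence number that device has displayed
read_cursors = {"phone": 41, "laptop": 38}

def user_read_cursor(cursors: dict[str, int]) -> int:
    """A message counts as read once ANY of the user's devices has read it,
    so the per-user receipt is the maximum device cursor."""
    return max(cursors.values(), default=0)

def fetch_since(log: list[dict], watermark: int) -> list[dict]:
    """On reconnect, a device asks for everything after its watermark."""
    return [m for m in log if m["seq"] > watermark]

conversation_log = [{"seq": s, "text": f"msg {s}"} for s in range(40, 45)]
print(user_read_cursor(read_cursors))                         # 41
print([m["seq"] for m in fetch_since(conversation_log, 41)])  # [42, 43, 44]
```

Note the laptop's lower cursor (38) does not drag the receipt down: the sender sees "read up to 41", while the laptop independently catches up from its own watermark when it reconnects.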
"Design text messaging app -- 1:1 communication, text only, no E2E encryption. Support multiple clients, offline delivery, 5 years of history. Scale: 100M DAU, 100 messages/day/user."
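A quick back-of-envelope pass on the stated scale helps anchor the design; the peak-to-average factor and average message size below are assumptions, not givens from the prompt:

```python
# Back-of-envelope capacity math for the prompt above.
DAU = 100_000_000
msgs_per_user_per_day = 100
seconds_per_day = 86_400

total_msgs_per_day = DAU * msgs_per_user_per_day           # 10 billion/day
avg_msgs_per_sec = total_msgs_per_day // seconds_per_day   # ~116K msgs/s
peak_msgs_per_sec = avg_msgs_per_sec * 2                   # assumed 2x peak

avg_msg_bytes = 100  # assumed average text message size, metadata included
storage_per_day_tb = total_msgs_per_day * avg_msg_bytes / 1e12  # 1 TB/day
storage_5y_pb = storage_per_day_tb * 365 * 5 / 1000

print(avg_msgs_per_sec, peak_msgs_per_sec)  # 115740 231480
print(storage_5y_pb)                        # ~1.8 PB raw, before replication
```

Roughly 230K messages per second at peak and a couple of petabytes over five years (before replication) justify the partitioned-log ingestion path and Cassandra's write-optimized, horizontally sharded storage in the architecture above.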