For a full example answer with detailed architecture diagrams and deep dives, see our Design Slack guide. The Slack guide covers real-time messaging architecture, WebSocket connection management, and presence tracking that directly apply to a Messenger-style chat application.
Also review the Message Queues, Caching, and Databases building blocks for background on durable message ingestion, session routing, and conversation storage.
Design a messaging system like WhatsApp or Facebook Messenger that supports real-time 1:1 chat, message delivery status tracking, and user presence features at scale. Users expect instant delivery, correct ordering, accurate status indicators (sent, delivered, read), and seamless multi-device continuity, even with unreliable mobile networks.
The system must handle billions of users exchanging tens of billions of messages daily with sub-200ms delivery latency. The core technical challenges are designing a low-latency, highly available message delivery pipeline with at-least-once semantics, handling fan-in and fan-out patterns for connection management, synchronizing conversation state across multiple devices per user, and building a scalable presence system. Interviewers probe your ability to translate ambiguous requirements into a robust architecture with principled tradeoffs and pragmatic scaling strategies.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers want to see how you ensure messages arrive in the correct order without duplicates, even when the network is unreliable and the sender has multiple devices.
Maintaining persistent connections for billions of concurrent users while routing messages to the correct gateway server is a fundamental challenge.
All of a user's devices must show the same conversation state. Interviewers look for concrete mechanisms rather than vague "sync" descriptions.
Message storage must handle enormous write throughput with efficient range reads for history. Interviewers evaluate your data modeling and partitioning decisions.
Presence tracking generates high write volume with relaxed consistency needs. Interviewers want to see how you avoid overloading the system with heartbeat writes.
Confirm scope and constraints. Ask whether group chats are in scope or only 1:1 conversations. Verify scale expectations: daily active users, message volume, and peak-to-average ratio. Clarify whether multimedia support (images, videos) is required or text-only. Confirm whether end-to-end encryption is needed, as it impacts delivery tracking. Establish latency targets for send, receive, and sync operations. Determine if message search is in scope.
Sketch the core components: client applications (mobile, web), an API gateway, WebSocket gateway servers for persistent connections, a message ingestion service, Kafka (partitioned by conversation ID), message delivery workers, conversation storage (Cassandra), a cache layer (Redis for session routing and recent messages), and a presence service. Then draw the message flow: the sender posts a message to the ingestion API; the message is written to the appropriate Kafka partition; a delivery worker consumes it and pushes it to the recipient's WebSocket connection(s) via the routing table in Redis; acknowledgments flow back along the same path. Presence updates travel a separate, lightweight path through the dedicated presence service.
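The routing step above, finding which gateway holds each of a recipient's connections, can be sketched with an in-memory stand-in for the Redis routing table. Function names like `register_connection` are illustrative, not part of any real API:

```python
# Minimal sketch of session routing: a map from (user_id, device_id) to the
# gateway server holding that device's WebSocket. In production this table
# would live in Redis; a plain dict stands in here.

routing_table: dict[tuple[str, str], str] = {}

def register_connection(user_id: str, device_id: str, gateway: str) -> None:
    """Called by a gateway when a device opens a WebSocket."""
    routing_table[(user_id, device_id)] = gateway

def unregister_connection(user_id: str, device_id: str) -> None:
    """Called on disconnect so delivery workers stop using stale routes."""
    routing_table.pop((user_id, device_id), None)

def gateways_for(user_id: str) -> dict[str, str]:
    """Return device_id -> gateway for every connected device of a user."""
    return {dev: gw for (uid, dev), gw in routing_table.items() if uid == user_id}

register_connection("alice", "phone", "gw-3")
register_connection("alice", "laptop", "gw-7")
print(gateways_for("alice"))  # two devices, possibly on different gateways
```

A delivery worker would look up `gateways_for(recipient)` and push one copy of the message to each gateway in the result, which is why multi-device users naturally produce per-message fan-out.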
Walk through the lifecycle of a single message. The sender generates a UUID idempotency key and posts to the ingestion API, which validates the message and writes it to a Kafka partition keyed by a hash of the conversation ID. A delivery worker consumes from the partition (preserving per-conversation order), looks up the recipient's device connections in Redis, and pushes the message to each connected device over its WebSocket. The message is stored in Cassandra with conversation_id as the partition key and a timestamp (or time-ordered message ID) as the clustering column, enabling efficient history range reads. If the recipient is offline, the message is stored and a push notification is triggered; on reconnection, the device fetches all messages after its watermark. Retries reuse the idempotency key so the server can deduplicate.
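The ingestion steps above can be sketched as follows, with in-memory structures standing in for Kafka and the dedupe store; the partition count, function names, and record shape are assumptions for illustration:

```python
# Sketch of the ingestion path: idempotency-key dedupe, partition selection
# by conversation hash, and append to a per-partition ordered log.
import hashlib
import uuid

NUM_PARTITIONS = 16
partitions: dict[int, list[dict]] = {p: [] for p in range(NUM_PARTITIONS)}
seen_keys: set[str] = set()  # idempotency keys already accepted

def partition_for(conversation_id: str) -> int:
    """Hash the conversation ID so all messages in one conversation land on
    the same partition, preserving their relative order."""
    digest = hashlib.sha256(conversation_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

def ingest(conversation_id: str, sender: str, text: str, idempotency_key: str) -> bool:
    """Accept a message exactly once; retries with the same key are no-ops."""
    if idempotency_key in seen_keys:
        return False  # duplicate retry: already accepted, nothing appended
    seen_keys.add(idempotency_key)
    partitions[partition_for(conversation_id)].append(
        {"conversation_id": conversation_id, "sender": sender,
         "text": text, "key": idempotency_key}
    )
    return True

key = str(uuid.uuid4())
print(ingest("conv-42", "alice", "hi", key))  # True: accepted
print(ingest("conv-42", "alice", "hi", key))  # False: retry deduplicated
```

Because the client keeps the same key across retries, a network timeout after a successful write cannot produce a duplicate: the resend is recognized and dropped, which is how at-least-once transport yields exactly-once appearance to the user.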
Cover multi-device sync: each device tracks its watermark and fetches missing messages on reconnect. Discuss read receipts: track per-device, aggregate to per-user using the maximum cursor. Address presence: batch updates every 5 seconds, maintain in-memory state, persist periodically. Cover monitoring: track message delivery latency percentiles, WebSocket connection churn, Kafka consumer lag, and Cassandra write latency. Discuss scaling: shard WebSocket gateways by user hash, partition Kafka by conversation ID, leverage Cassandra's built-in sharding, and use Redis cluster mode for session routing at scale.
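The read-receipt aggregation and watermark catch-up described above can be sketched directly; cursors here are per-conversation message sequence numbers, and all names are illustrative:

```python
# Sketch of per-device read cursors aggregated into a per-user read receipt,
# plus watermark-based catch-up on reconnect.

# device_id -> last message sequence number that device has displayed
read_cursors = {"phone": 41, "laptop": 38}

def user_read_cursor(cursors: dict[str, int]) -> int:
    """A message counts as read once ANY of the user's devices has read it,
    so the per-user receipt is the maximum device cursor."""
    return max(cursors.values(), default=0)

def fetch_since(log: list[dict], watermark: int) -> list[dict]:
    """On reconnect, a device asks for everything after its watermark."""
    return [m for m in log if m["seq"] > watermark]

conversation_log = [{"seq": s, "text": f"msg {s}"} for s in range(40, 45)]
print(user_read_cursor(read_cursors))                         # 41
print([m["seq"] for m in fetch_since(conversation_log, 41)])  # [42, 43, 44]
```

Note the laptop's lower cursor (38) does not drag the receipt down: the sender sees "read up to 41", while the laptop independently catches up from its own watermark when it reconnects.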
"Design text messaging app -- 1:1 communication, text only, no E2E encryption. Support multiple clients, offline delivery, 5 years of history. Scale: 100M DAU, 100 messages/day/user."
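A quick back-of-envelope pass on the stated scale helps anchor the design; the peak-to-average factor and average message size below are assumptions, not givens from the prompt:

```python
# Back-of-envelope capacity math for the prompt above.
DAU = 100_000_000
msgs_per_user_per_day = 100
seconds_per_day = 86_400

total_msgs_per_day = DAU * msgs_per_user_per_day           # 10 billion/day
avg_msgs_per_sec = total_msgs_per_day // seconds_per_day   # ~116K msgs/s
peak_msgs_per_sec = avg_msgs_per_sec * 2                   # assumed 2x peak

avg_msg_bytes = 100  # assumed average text message size, metadata included
storage_per_day_tb = total_msgs_per_day * avg_msg_bytes / 1e12  # 1 TB/day
storage_5y_pb = storage_per_day_tb * 365 * 5 / 1000

print(avg_msgs_per_sec, peak_msgs_per_sec)  # 115740 231480
print(storage_5y_pb)                        # ~1.8 PB raw, before replication
```

Roughly 230K messages per second at peak and a couple of petabytes over five years (before replication) justify the partitioned-log ingestion path and Cassandra's write-optimized, horizontally sharded storage in the architecture above.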