Design a real-time messaging platform like WhatsApp or Meta Messenger that supports one-to-one text conversations, message delivery status tracking (sent, delivered, read), and user presence indicators (online/last seen). Users expect their conversations to sync seamlessly across multiple devices -- phone, tablet, and desktop -- and to be able to compose messages while offline with automatic delivery when connectivity returns.
The system must handle billions of users exchanging tens of billions of messages daily with strict latency targets (under 200ms end-to-end when both parties are online). The core engineering challenges are maintaining persistent bidirectional connections at massive scale, guaranteeing at-least-once delivery with deduplication, preserving message ordering within each conversation, synchronizing state across multiple devices per user, and gracefully handling the reality of unreliable mobile networks where users constantly disconnect and reconnect.
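Before diving into design, it helps to translate "tens of billions of messages daily" into per-second throughput. A quick back-of-envelope check (the 50B/day figure and 3x peak-to-average ratio are illustrative assumptions, not numbers from the prompt):

```python
# Back-of-envelope throughput estimate (illustrative numbers).
MESSAGES_PER_DAY = 50e9        # "tens of billions" -> assume 50B/day
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3                # assumed peak-to-average ratio

avg_msgs_per_sec = MESSAGES_PER_DAY / SECONDS_PER_DAY
peak_msgs_per_sec = avg_msgs_per_sec * PEAK_FACTOR

print(f"average: ~{avg_msgs_per_sec:,.0f} msgs/sec")   # ~578,704
print(f"peak:    ~{peak_msgs_per_sec:,.0f} msgs/sec")  # ~1,736,111
```

Numbers at this magnitude justify the later choices: partitioned logs for ingestion and a write-optimized store for persistence.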
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers want to see how messages flow from sender to recipient with guaranteed ordering and without duplicates. This tests your understanding of distributed systems fundamentals, idempotency, and partitioned processing.
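Duplicate suppression usually rests on a client-generated idempotency key. A minimal sketch of the server-side check (in production the seen-key set would be a TTL'd Redis set or a unique-key write, not an in-process set):

```python
import uuid

class DedupStore:
    """Tracks idempotency keys already processed. Stands in for a
    TTL'd Redis set or a conditional write to durable storage."""
    def __init__(self):
        self._seen = set()

    def accept(self, idempotency_key: str) -> bool:
        """Return True the first time a key is seen, False on replays."""
        if idempotency_key in self._seen:
            return False
        self._seen.add(idempotency_key)
        return True

store = DedupStore()
key = str(uuid.uuid4())            # client-generated idempotency key
assert store.accept(key) is True   # first delivery: process
assert store.accept(key) is False  # network retry: drop the duplicate
```

Because the client retries the same key until it gets an acknowledgment, the system achieves at-least-once delivery with exactly-once visible effect.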
Maintaining billions of persistent bidirectional connections requires careful architecture. Interviewers probe how you distribute connection state, handle reconnections efficiently, and route messages to the correct gateway server.
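One way to route pushes is a shared table mapping each user's devices to the gateway holding their live WebSocket. A sketch under the assumption that a Redis hash backs this in production (a plain dict stands in here; all names are illustrative):

```python
# Gateway routing: user -> {device_id: gateway_id}, updated on
# connect/disconnect so delivery workers can find each live socket.
routing_table: dict[str, dict[str, str]] = {}

def on_connect(user_id: str, device_id: str, gateway_id: str) -> None:
    routing_table.setdefault(user_id, {})[device_id] = gateway_id

def on_disconnect(user_id: str, device_id: str) -> None:
    routing_table.get(user_id, {}).pop(device_id, None)

def gateways_for(user_id: str) -> set[str]:
    """Gateways that must receive a push for this user right now."""
    return set(routing_table.get(user_id, {}).values())

on_connect("alice", "phone", "gw-12")
on_connect("alice", "desktop", "gw-07")
print(gateways_for("alice"))   # {'gw-12', 'gw-07'} (set order may vary)
```

Entries should carry a TTL refreshed by gateway heartbeats so a crashed gateway's stale routes expire instead of black-holing pushes.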
When a user has a phone, tablet, and desktop all active, every device must show the same conversation state. Interviewers look for explicit sync strategies and cursor management.
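The usual mechanism is a per-device cursor: each device remembers the last sequence number it applied, and on reconnect it asks the server for everything newer before resuming live push. A minimal sketch (the log and cursor values are illustrative):

```python
# Per-device sync cursors over a per-conversation sequence-numbered log.
conversation_log = [(seq, f"msg-{seq}") for seq in range(1, 8)]  # seqs 1..7

def fetch_gap(cursor: int) -> list[tuple[int, str]]:
    """Messages the device has not yet seen (seq > its cursor)."""
    return [m for m in conversation_log if m[0] > cursor]

phone_cursor, tablet_cursor = 7, 4     # the tablet was offline for a while
assert fetch_gap(phone_cursor) == []   # phone is already up to date
assert [seq for seq, _ in fetch_gap(tablet_cursor)] == [5, 6, 7]
```

Because cursors are per device rather than per user, a long-offline tablet catches up independently without disturbing an up-to-date phone.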
Storing and querying billions of conversations efficiently requires thoughtful data modeling. Interviewers evaluate your storage choices, partitioning strategy, and ability to serve both recent chat windows and deep history.
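A common wide-row layout partitions by conversation ID plus a time bucket, clustering rows by sequence number so the recent chat window is one cheap ordered read. A sketch of the access pattern (a dict simulates the store; the daily bucket size is an assumption for illustration):

```python
from collections import defaultdict

# Partition key: (conversation_id, day_bucket) so one hot conversation
# cannot grow a single partition without bound. Rows within a partition
# are ordered by sequence number, newest first for recent-window reads.
table = defaultdict(list)   # (conv_id, bucket) -> [(seq, body), ...]

def write(conv_id: str, day_bucket: str, seq: int, body: str) -> None:
    table[(conv_id, day_bucket)].append((seq, body))

def recent(conv_id: str, day_bucket: str, limit: int):
    rows = sorted(table[(conv_id, day_bucket)], reverse=True)
    return rows[:limit]

write("c1", "2024-06-01", 1, "hi")
write("c1", "2024-06-01", 2, "hello")
write("c1", "2024-06-01", 3, "how are you?")
print(recent("c1", "2024-06-01", 2))  # [(3, 'how are you?'), (2, 'hello')]
```

Deep history scans then walk backward bucket by bucket, while the hot path touches only the current bucket.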
Presence updates generate enormous write volume (every online/offline transition for billions of users) but have relaxed consistency requirements. Interviewers want to see how you optimize this high-frequency signal without overloading the system.
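Since presence tolerates staleness, transitions can be coalesced: keep only the latest state per user and publish on a fixed interval. A sketch of that write-coalescing idea (the flush interval and names are illustrative choices):

```python
# Presence write coalescing: a flapping mobile connection can emit many
# online/offline transitions per minute; publish only the latest state
# for each user once per flush interval instead of every transition.
pending: dict[str, str] = {}   # user_id -> latest state since last flush

def record(user_id: str, state: str) -> None:
    pending[user_id] = state   # later transitions overwrite earlier ones

def flush() -> dict[str, str]:
    """Publish one batched update per user, then reset the buffer."""
    batch = dict(pending)
    pending.clear()
    return batch

record("bob", "online")
record("bob", "offline")
record("bob", "online")   # three transitions in one window ...
print(flush())            # {'bob': 'online'} -> one published update
```

Three rapid transitions collapse into a single fan-out to subscribed contacts, cutting the dominant write amplification of the presence path.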
Begin by confirming the scope. Ask whether the system needs group chats or only one-to-one conversations. Verify scale expectations: daily active users, message volume, and peak-to-average ratio. Clarify whether multimedia messages (images, videos) are in scope or text-only. Confirm whether end-to-end encryption is required, as this significantly impacts delivery tracking and server-side processing. Establish latency targets for different operations and whether global multi-region deployment is needed.
Sketch the core components: client applications (mobile, web), an API gateway, a fleet of WebSocket gateway servers for persistent connections, a message ingestion service, Kafka for durable ordered message logs partitioned by conversation ID, message delivery workers that consume from Kafka and push to recipients, a conversation storage layer (Cassandra for its write-optimized wide-row model), a Redis cache for recent messages and connection routing, and a separate presence service. Draw the message flow: sender posts to gateway, gateway writes to Kafka, delivery worker consumes and pushes to recipient's gateway, acknowledgments flow back through the system.
Walk through the end-to-end flow. The sender generates a UUID idempotency key and sends the message through the WebSocket connection. The gateway server validates the request and publishes to the Kafka partition for this conversation (using conversation ID as the partition key to preserve ordering). A delivery worker consumes from the partition, assigns a server-side sequence number, writes the message to Cassandra, and looks up the recipient's connected devices in the Redis routing table. For each connected device, the worker pushes the message through the appropriate gateway server. The recipient's client acknowledges receipt; the worker updates the delivery status. If the recipient is offline, the message remains in storage; when the device reconnects and provides its cursor, the gateway fetches and delivers all missed messages.
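The delivery-worker steps above can be sketched for a single partition; storage and the gateway push are stubbed, and all names are illustrative:

```python
# Delivery-worker sketch for one Kafka partition (one conversation shard):
# assign a server-side sequence number, persist, push to connected devices,
# and track delivery status until the recipient acknowledges.
sequences: dict[str, int] = {}   # conversation_id -> last assigned seq
stored: list[dict] = []          # stands in for the Cassandra write
status: dict[str, str] = {}      # message_id -> "sent" | "delivered"

def next_seq(conv_id: str) -> int:
    sequences[conv_id] = sequences.get(conv_id, 0) + 1
    return sequences[conv_id]

def handle(msg: dict, connected_devices: list[str]) -> None:
    msg["seq"] = next_seq(msg["conv_id"])
    stored.append(msg)                 # durable write before any push
    status[msg["id"]] = "sent"
    for device in connected_devices:   # push via the device's gateway
        pass                           # e.g. gateway.push(device, msg)

def on_ack(message_id: str) -> None:
    status[message_id] = "delivered"   # recipient's client acknowledged

handle({"id": "m1", "conv_id": "c1", "body": "hi"}, ["phone"])
on_ack("m1")
print(stored[0]["seq"], status["m1"])   # 1 delivered
```

Because one worker owns each partition, sequence assignment needs no cross-node coordination, which is what makes per-conversation ordering cheap.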
Discuss multi-device sync: each device tracks its own cursor, requests gaps on reconnection, and the server streams missing messages before switching to live push. Cover read receipt handling: when any device opens the conversation, a read event is published; the authoritative read watermark is the maximum across all devices. Address presence: a lightweight service maintains an in-memory map updated by gateway heartbeats, with batched publication to subscribed contacts. Discuss offline handling: the client queues messages locally and retries with exponential backoff on reconnection. Cover monitoring: track message delivery latency percentiles, connection churn rate, Kafka consumer lag, and storage hot partitions. Address horizontal scaling: shard WebSocket gateways by user hash, partition Kafka by conversation ID, and use Cassandra's native sharding.
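Two of the points above reduce to one-liners worth having ready: the read watermark as a max over per-device read cursors, and the capped exponential backoff schedule for offline retries (cursor values, base, and cap are illustrative):

```python
# Authoritative read watermark: the highest sequence number any of the
# user's devices has reported reading marks everything below it as read.
def read_watermark(cursors: dict[str, int]) -> int:
    return max(cursors.values(), default=0)

print(read_watermark({"phone": 42, "tablet": 37, "desktop": 45}))  # 45

# Capped exponential backoff delays for offline-send retries.
def backoff_schedule(base_s: float = 1.0, cap_s: float = 60.0,
                     attempts: int = 6) -> list[float]:
    return [min(base_s * 2 ** i, cap_s) for i in range(attempts)]

print(backoff_schedule())   # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```

Taking the max means reading on any one device marks the conversation read everywhere, matching user expectations across phone, tablet, and desktop.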