Design a messaging application similar to WhatsApp or Facebook Messenger that supports real-time one-to-one conversations. Users should be able to send text messages instantly, see delivery and read receipts, check whether contacts are online, and access their full conversation history across multiple devices. The system must handle billions of messages daily while keeping end-to-end delivery latency under 200 milliseconds for online recipients.
The central engineering challenge is building a low-latency, highly available messaging backbone that guarantees at-least-once delivery, maintains per-conversation ordering, synchronizes state across phones, tablets, and desktops, and stores years of chat history without degrading performance. You will need to reason about persistent connection management, message sequencing, multi-device conflict resolution, and presence propagation at internet scale.
Based on real interview experiences at Addepar, Meta, Anthropic, and other companies that ask this question, these are the areas interviewers probe most deeply:
Interviewers want to understand how you guarantee that every message arrives exactly once (from the user's perspective), in the correct order, even under retries, multi-device access, and distributed infrastructure. This tests core distributed-systems reasoning around idempotency and sequencing.
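One way to make the idempotency-plus-sequencing argument concrete is a toy per-conversation log that deduplicates on a client-generated UUID while the server assigns the authoritative order. This is a minimal in-memory sketch; the class and method names are illustrative, not from any real system.

```python
import itertools

class ConversationLog:
    """Toy model of per-conversation dedup + server-side sequencing."""

    def __init__(self):
        self._seq = itertools.count(1)   # server-assigned order within the conversation
        self._by_client_id = {}          # client UUID -> previously assigned seq

    def append(self, client_msg_id: str, body: str) -> int:
        # Idempotency: a retry carrying the same client-generated UUID
        # returns the originally assigned sequence number instead of
        # creating a duplicate entry.
        if client_msg_id in self._by_client_id:
            return self._by_client_id[client_msg_id]
        seq = next(self._seq)
        self._by_client_id[client_msg_id] = seq
        return seq
```

Because sequencing is scoped to a single conversation, no cross-conversation coordination is needed, which is the property the interviewer is usually fishing for.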
Maintaining persistent bidirectional connections for hundreds of millions of concurrent users and routing each message to the correct gateway is a hallmark challenge of this design. Interviewers probe how you distribute connection state, detect stale connections, and avoid single points of failure.
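The routing-state discussion can be grounded with a small sketch of a connection registry that tracks which gateways hold a user's sockets and expires entries whose heartbeats have gone stale. In production this state would typically live in Redis with key TTLs; the in-memory version and its names are assumptions for illustration.

```python
class ConnectionRegistry:
    """Sketch of user -> gateway routing state with heartbeat-based expiry."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._conns = {}   # user_id -> {gateway_id: last_heartbeat_time}

    def heartbeat(self, user_id: str, gateway_id: str, now: float) -> None:
        self._conns.setdefault(user_id, {})[gateway_id] = now

    def gateways_for(self, user_id: str, now: float) -> list:
        # Drop entries whose last heartbeat exceeds the TTL, so a crashed
        # gateway cannot leave phantom routes behind indefinitely.
        live = {g: t for g, t in self._conns.get(user_id, {}).items()
                if now - t <= self.ttl}
        self._conns[user_id] = live
        return sorted(live)
```

A delivery worker would call `gateways_for` to find every gateway currently holding a socket for the recipient, then push the message to each.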
When a user has several active devices, each must converge on the same conversation state. Interviewers look for concrete mechanisms to sync message history, delivery receipts, and read cursors without race conditions.
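A concrete race-free mechanism worth naming here is a monotonic merge of the per-conversation read cursor: devices may report positions out of order, but the shared cursor only ever moves forward. This is an assumed model for illustration, not a specific product's API.

```python
class ReadCursor:
    """Merge-by-max read cursor shared across a user's devices."""

    def __init__(self):
        self.position = 0   # highest message seq the user has read anywhere

    def advance(self, device_seq: int) -> int:
        # Monotonic merge: a stale device reporting an old cursor can never
        # move the shared read position backwards, so out-of-order reports
        # from racing devices converge to the same final state.
        self.position = max(self.position, device_seq)
        return self.position
```

The same merge-by-max idea extends to delivery receipts and per-device sync watermarks, which is why it is a useful primitive to mention explicitly.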
Efficiently storing and retrieving years of chat history for billions of users demands careful data modeling. Interviewers evaluate your storage technology choices, partitioning scheme, and caching strategy.
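One partitioning idea interviewers often want spelled out is time-bucketing the partition key, so a years-long, very active conversation never grows a single unbounded partition. The helper below sketches a composite key; the monthly bucket granularity is an assumption, and in a wide-column store like Cassandra the bucket would form part of the partition key with messages clustered by sequence or timestamp within it.

```python
from datetime import datetime, timezone

def partition_key(conversation_id: str, sent_at: datetime) -> tuple:
    """Composite partition key: (conversation, month bucket).

    Bucketing caps the size of any one partition while keeping a single
    conversation's recent history co-located for cheap range reads.
    """
    bucket = sent_at.strftime("%Y-%m")
    return (conversation_id, bucket)
```

Reading "the last N messages" then means scanning the current bucket and stepping backwards one bucket at a time until N messages are collected.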
Presence generates enormous write volume (every heartbeat, app foreground/background event) but tolerates relaxed freshness. Interviewers want to see how you optimize this high-frequency signal without overloading your infrastructure.
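A standard way to exploit that relaxed freshness is to coalesce raw heartbeats and fan out only genuine status transitions. The sketch below is an assumed design: later heartbeats overwrite earlier ones between flushes, and steady-state "still online" beats produce no fan-out at all.

```python
class PresenceBatcher:
    """Coalesce heartbeats; emit only status transitions on flush."""

    def __init__(self):
        self._pending = {}     # user_id -> latest raw status since last flush
        self._published = {}   # user_id -> last status fanned out to subscribers

    def heartbeat(self, user_id: str, status: str) -> None:
        # Later heartbeats overwrite earlier ones, so N beats per interval
        # collapse to at most one candidate update.
        self._pending[user_id] = status

    def flush(self) -> dict:
        # Fan out only real transitions (offline -> online, etc.); a user
        # who stayed online the whole interval generates zero downstream work.
        changes = {u: s for u, s in self._pending.items()
                   if self._published.get(u) != s}
        self._published.update(changes)
        self._pending.clear()
        return changes
```

Running `flush` on a timer (say every few seconds) trades a small amount of staleness for a large reduction in fan-out volume, which is exactly the trade presence tolerates.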
Confirm whether the scope includes group chats or only one-to-one conversations. Verify scale expectations: daily active users, messages per day, read-to-write ratio. Ask about multimedia support (images, video, voice) versus text-only. Clarify whether end-to-end encryption is required, since it changes how the server handles delivery receipts. Establish latency targets for send, receive, and sync operations. Determine if message search is in scope.
Sketch the main components: mobile and web clients, an API gateway, a fleet of WebSocket gateway servers for persistent connections, a message ingestion service, Kafka partitioned by conversation ID as the message log, delivery workers that consume from Kafka and push to recipients, Cassandra for durable conversation storage, Redis for caching recent messages and connection routing, and a separate presence service. Draw the message flow: sender posts to API, message lands in Kafka, delivery worker looks up recipient connections in Redis and pushes via WebSocket, acknowledgments flow back. Show presence updates flowing through a lightweight, separate path.
Walk through an end-to-end message send. The sender's client generates a UUID and local sequence number, then posts the message to the ingestion API. The server validates, deduplicates using the UUID, and writes to the Kafka partition for that conversation. A delivery worker consumes the event, looks up the recipient's active WebSocket gateways in Redis, and pushes the message to each connected device. On receipt, each device sends a delivery acknowledgment back through its gateway, which updates the message state in Cassandra. The sender's client receives a delivery receipt event. Read receipts follow the same path when the recipient opens the conversation. Explain how the UUID prevents duplicates on retry, and how per-conversation Kafka partitioning preserves ordering without cross-conversation coordination.
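The ordering argument in that walkthrough rests on one small piece of machinery: a stable partitioner that maps every message of a conversation to the same Kafka partition, so Kafka's per-partition ordering yields per-conversation ordering for free. A minimal sketch, with the partition count as an assumed parameter:

```python
import hashlib

def kafka_partition(conversation_id: str, num_partitions: int = 64) -> int:
    """Stable hash partitioner keyed by conversation ID.

    All messages in a conversation land on one partition, so a single
    consumer sees them in order without any cross-conversation coordination.
    """
    digest = hashlib.sha256(conversation_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Note the trade-off worth raising in the interview: keying by conversation concentrates a very hot conversation onto one partition, which is acceptable for one-to-one chat but would need revisiting for huge group threads.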
Discuss multi-device sync: each device maintains a watermark of the last message it received; on reconnection it requests all messages after that watermark. Explain offline handling: the client queues messages locally and retries with exponential backoff once connectivity is restored. Cover presence: maintain an in-memory user status map, batch heartbeats, and fan out only to subscribed contacts. Mention monitoring: track message delivery latency percentiles, WebSocket connection churn, Kafka consumer lag, and Cassandra hot partitions. Discuss horizontal scaling: shard WebSocket gateways by user hash, partition Kafka by conversation ID, and rely on Cassandra's native ring-based sharding.