Design Messenger/Chat Application — Anthropic

Problem Statement

You are tasked with designing a mobile and web messaging platform similar to WhatsApp, Telegram, or Meta Messenger. The system must support real-time one-on-one text conversations between users, allowing them to send and receive messages instantly, track delivery and read status, and see when contacts are online. Users expect their conversation history to sync seamlessly across multiple devices (phone, tablet, desktop) and to be able to send messages even when temporarily offline.

Your design must handle billions of users exchanging on the order of 100 billion messages daily, with strict latency requirements (messages delivered in under 200ms when both parties are online) and high availability expectations (99.95% uptime). The system should gracefully handle network partitions, device failures, and bursty traffic patterns while maintaining message ordering and preventing duplicates.

Key Requirements

Functional

One-to-one messaging -- users can send text messages to any contact in their network with real-time delivery
Delivery tracking -- clear indication of message states: sent (left device), delivered (reached recipient device), and read (opened by recipient)
Multi-device support -- conversations and message states remain consistent across all of a user's active devices
Offline messaging -- users can compose and queue messages when disconnected; messages send automatically when connectivity resumes
Presence indicators -- display whether contacts are currently online or show their last-seen timestamp

Non-Functional

Scalability -- support 2 billion monthly active users sending 100 billion messages per day with peak traffic 3x average
Reliability -- guarantee at-least-once delivery with no message loss; tolerate datacenter failures and network partitions
Latency -- deliver messages end-to-end in under 200ms for online users; presence updates propagate within 500ms
Consistency -- maintain strict message ordering within each conversation; eventual consistency acceptable for presence and read receipts
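
The scalability numbers above translate into a per-second rate that is worth stating early in the interview. A minimal back-of-envelope sketch, using only the figures given in the requirements (the function name `message_rates` is just an illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def message_rates(messages_per_day: float, peak_factor: float) -> tuple[float, float]:
    """Return (average, peak) messages per second."""
    average = messages_per_day / SECONDS_PER_DAY
    return average, average * peak_factor


# 100 billion messages per day, peak traffic 3x average (from the requirements)
avg_rate, peak_rate = message_rates(100e9, 3.0)
print(f"average: {avg_rate:,.0f} msg/s, peak: {peak_rate:,.0f} msg/s")
```

That works out to roughly 1.16 million messages per second on average and about 3.5 million at peak, which is the load the ingestion path and message log need to absorb.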

What Interviewers Focus On

Based on real interview experiences, these are the areas interviewers probe most deeply:

1. Message Delivery Architecture and Ordering Guarantees

Interviewers want to see how you ensure messages arrive exactly once (from the user's perspective) in the correct order, even with network retries, multiple devices, and distributed infrastructure. This tests your understanding of distributed systems fundamentals and idempotency.

Hints to consider:

Assign monotonically increasing sequence numbers per conversation to establish total ordering
Use client-generated idempotency keys (UUIDs) to deduplicate retries at the server
Partition message queues by conversation ID to preserve ordering guarantees
Implement acknowledgment protocols where sender confirms receipt from infrastructure, and infrastructure confirms delivery to recipient
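
The first two hints can be sketched together. The snippet below is a minimal in-memory illustration, not a production design: `ConversationLog` is a hypothetical stand-in for whatever durable store holds one conversation, and it assumes a single writer per conversation (which partitioning by conversation ID is meant to provide).

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ConversationLog:
    """In-memory stand-in for one conversation's ordered message log."""
    next_seq: int = 1
    seq_by_key: dict[str, int] = field(default_factory=dict)
    messages: list[tuple[int, str]] = field(default_factory=list)

    def append(self, idempotency_key: str, body: str) -> int:
        # A retry carries the same key, so return the sequence number
        # that was already assigned instead of storing a duplicate.
        existing = self.seq_by_key.get(idempotency_key)
        if existing is not None:
            return existing
        seq = self.next_seq
        self.next_seq += 1
        self.seq_by_key[idempotency_key] = seq
        self.messages.append((seq, body))
        return seq


log = ConversationLog()
key = str(uuid.uuid4())            # generated once on the client, reused on retries
first = log.append(key, "hello")
retry = log.append(key, "hello")   # network retry of the same message
print(first, retry, len(log.messages))  # prints: 1 1 1
```

Returning the original sequence number on a retry lets the sender treat the retry's acknowledgment exactly like the first one.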

2. Real-Time Connection Management and Scaling WebSocket Infrastructure

A core challenge is maintaining persistent bidirectional connections for every online user (potentially hundreds of millions at once, given 2 billion monthly active users) while routing messages efficiently. Interviewers probe how you distribute connection state, handle reconnections, and avoid single points of failure.

Hints to consider:

Deploy a fleet of stateful WebSocket gateway servers with consistent hashing to distribute connections
Store user-to-gateway routing information in a fast lookup service (Redis) with TTL-based cleanup
Implement heartbeat protocols to detect dead connections and graceful reconnection with sequence-number-based catch-up
Use a pub/sub layer (Kafka, Redis Streams) to decouple message ingestion from connection fan-out
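
To make the routing-table hint concrete, here is a small sketch of a user-to-gateway lookup with TTL-based cleanup. It is an in-memory stand-in rather than real Redis code; the class name `RoutingTable`, the gateway names, and the 30-second TTL are all assumptions for illustration. Each heartbeat refreshes the entry, so a device that stops sending heartbeats simply ages out.

```python
import time
from typing import Optional


class RoutingTable:
    """In-memory stand-in for a user -> gateway lookup with TTL expiry."""

    def __init__(self, ttl_seconds: float = 60.0) -> None:
        self.ttl = ttl_seconds
        # user_id -> device_id -> (gateway, expires_at)
        self._routes: dict[str, dict[str, tuple[str, float]]] = {}

    def heartbeat(self, user_id: str, device_id: str, gateway: str,
                  now: Optional[float] = None) -> None:
        """Register or refresh a device's connection."""
        now = time.monotonic() if now is None else now
        self._routes.setdefault(user_id, {})[device_id] = (gateway, now + self.ttl)

    def gateways_for(self, user_id: str, now: Optional[float] = None) -> dict[str, str]:
        """Return device_id -> gateway for connections that are still live."""
        now = time.monotonic() if now is None else now
        devices = self._routes.get(user_id, {})
        # Lazily drop entries whose heartbeat has lapsed.
        for device_id in [d for d, (_, exp) in devices.items() if exp <= now]:
            del devices[device_id]
        if not devices:
            self._routes.pop(user_id, None)
        return {d: gw for d, (gw, _) in devices.items()}


table = RoutingTable(ttl_seconds=30)
table.heartbeat("alice", "phone", "gw-1", now=0)
table.heartbeat("alice", "laptop", "gw-2", now=20)
print(table.gateways_for("alice", now=35))  # prints: {'laptop': 'gw-2'}
```

The `now` parameter is injected only so the example is deterministic; a real service would rely on the store's own expiry mechanism.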

3. Multi-Device Synchronization and Conflict Resolution

When a user has multiple active devices, all must show the same conversation state. Interviewers look for explicit strategies to sync message history, read receipts, and typing indicators without creating race conditions or inconsistencies.

Hints to consider:

Maintain per-device read cursors (last seen message ID) in the database to track individual device state
Use the maximum read cursor across all devices as the authoritative "read-by-user" position
Implement a sync protocol where devices fetch missing messages based on their last known sequence number
Handle delivery receipts at the conversation level while tracking read receipts per device
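
The read-cursor hints reduce to a small rule: cursors only move forward, and the user-level read position is the maximum over devices. A minimal sketch, assuming cursors are per-conversation sequence numbers and using hypothetical helper names:

```python
def user_read_cursor(device_cursors: dict[str, int]) -> int:
    """User-level read position: the furthest any device has read (0 if none)."""
    return max(device_cursors.values(), default=0)


def advance_cursor(device_cursors: dict[str, int], device_id: str, seq: int) -> int:
    """Record that a device has read up to `seq` and return the user-level cursor.

    Cursors are monotonic, so a late or out-of-order update is ignored.
    """
    device_cursors[device_id] = max(device_cursors.get(device_id, 0), seq)
    return user_read_cursor(device_cursors)


cursors: dict[str, int] = {}
advance_cursor(cursors, "phone", 42)
advance_cursor(cursors, "laptop", 40)
print(advance_cursor(cursors, "phone", 41))  # stale update; prints: 42
```

Because taking a maximum is commutative and idempotent, updates from different devices can arrive in any order without creating a conflict.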

4. Storage Strategy for Conversation History

Storing and retrieving message history efficiently at scale requires careful data modeling. Interviewers evaluate your ability to choose appropriate storage systems, partition data, and handle hot conversations without performance degradation.

Hints to consider:

Model conversations as append-only logs with message ID, timestamp, sender, and content
Partition by conversation ID and use time-based bucketing (monthly/yearly) for archival
Cache recent conversation windows (last 100 messages) in faster storage (Redis) with TTL eviction
Implement pagination cursors for history retrieval rather than offset-based queries
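
As an illustration of cursor-based pagination, the sketch below pages backwards through one conversation using the sequence number as the cursor. The list stands in for a partition sorted by sequence number; `fetch_page` and its parameters are hypothetical names, not an API from any particular database.

```python
from bisect import bisect_left
from typing import Optional

Message = tuple[int, str]  # (sequence number, body)


def fetch_page(messages: list[Message], before_seq: Optional[int],
               limit: int) -> tuple[list[Message], Optional[int]]:
    """Return up to `limit` messages older than `before_seq`, newest first.

    `messages` must be sorted ascending by sequence number. The second
    return value is the cursor for the next (older) page, or None when
    history is exhausted.
    """
    seqs = [seq for seq, _ in messages]
    end = len(messages) if before_seq is None else bisect_left(seqs, before_seq)
    start = max(0, end - limit)
    page = messages[start:end][::-1]
    next_cursor = page[-1][0] if start > 0 else None
    return page, next_cursor


history = [(i, f"m{i}") for i in range(1, 8)]
page, cursor = fetch_page(history, before_seq=None, limit=3)
print([seq for seq, _ in page], cursor)  # prints: [7, 6, 5] 5
```

Unlike an offset, the cursor stays valid when new messages are appended, so a client scrolling back through history never sees a message twice or skips one.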

5. Presence System and Last-Seen Tracking

Presence (online/offline status) generates enormous write volume but has relaxed consistency requirements. Interviewers want to see how you optimize this high-frequency, low-value signal without overloading your system.

Hints to consider:

Batch presence updates with a time window (5-10 seconds) to reduce write amplification
Use a separate presence service with in-memory state and periodic persistence
Implement subscription-based presence: only fan-out status to users who have the person in their contact list
Accept eventual consistency for presence (delays of seconds are tolerable)
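
The batching hint can be sketched as a small coalescing buffer: within a window, only the latest status per user is kept, and the whole batch is flushed once the window elapses. `PresenceBatcher` is a hypothetical name, and timestamps are passed in explicitly to keep the example deterministic.

```python
from typing import Optional


class PresenceBatcher:
    """Coalesce presence updates so each user is written at most once per window."""

    def __init__(self, window_seconds: float = 5.0) -> None:
        self.window = window_seconds
        self._pending: dict[str, bool] = {}
        self._window_start: Optional[float] = None

    def record(self, user_id: str, online: bool, now: float) -> None:
        if self._window_start is None:
            self._window_start = now
        # A later update for the same user overwrites the earlier one.
        self._pending[user_id] = online

    def flush_if_due(self, now: float) -> dict[str, bool]:
        """Return the batch to fan out if the window has elapsed, else an empty dict."""
        if self._window_start is None or now - self._window_start < self.window:
            return {}
        batch, self._pending = self._pending, {}
        self._window_start = None
        return batch


batcher = PresenceBatcher(window_seconds=5)
batcher.record("alice", True, now=0)
batcher.record("alice", False, now=1)   # flapping connection
batcher.record("bob", True, now=2)
print(batcher.flush_if_due(now=5))  # prints: {'alice': False, 'bob': True}
```

Three incoming updates become two outgoing writes here; for users whose connections flap, the reduction is much larger.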

Suggested Approach

Step 1: Clarify Requirements

Begin by confirming the scope and constraints. Ask whether the system needs to support group chats or only one-to-one conversations. Verify the scale expectations: how many daily active users, what message volume, and what read-to-write ratio. Clarify expectations around multimedia (images, videos) versus text-only. Confirm whether end-to-end encryption is required, as this impacts delivery tracking. Establish latency targets for different operations (send, receive, sync). Determine if search functionality for message history is in scope.

Step 2: High-Level Architecture

Sketch the core components: client applications (mobile, web), an API gateway layer, WebSocket gateway servers for persistent connections, a message ingestion service, a message queue/log (Kafka), message delivery workers, conversation storage (Cassandra or similar), a cache layer (Redis), and a presence service. Draw the message flow: the sender's device posts the message to the API, the ingestion service writes it to Kafka partitioned by conversation ID, a delivery worker reads it from Kafka and pushes it to the recipient's WebSocket connection(s), and acknowledgments flow back through the system. Show how presence updates flow through a separate lightweight path.
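
Partitioning by conversation ID only preserves ordering if every producer maps a given conversation to the same partition. A minimal sketch of such a mapping follows; the function name, the choice of SHA-256, and the partition count are assumptions for illustration, and this is not the partitioner Kafka itself uses.

```python
import hashlib


def partition_for(conversation_id: str, num_partitions: int) -> int:
    """Map a conversation to a partition, stable across processes and restarts.

    Python's built-in hash() is salted per process for strings, so a
    cryptographic digest is used here to get a deterministic value.
    """
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Every message in the same conversation lands on the same partition.
p = partition_for("conv:alice:bob", 64)
print(p == partition_for("conv:alice:bob", 64))  # prints: True
```

Note that changing `num_partitions` remaps conversations, so the partition count is usually fixed up front or changed with a planned migration.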

Step 3: Deep Dive on Key Area

Focus on the message delivery pipeline and ordering guarantees. Walk through how a message flows from sender to recipient. The sender generates a UUID idempotency key and a local sequence number, then posts to the ingestion API, which validates the message and writes it to a Kafka partition chosen by hashing the conversation ID. A delivery worker consumes from that partition (which preserves order), looks up the recipient's device connections in the Redis routing table, pushes the message to each connected device over WebSocket, and stores it in Cassandra with (conversation_id, timestamp) as the composite key. If two messages in a conversation can share a timestamp, add the sequence number or message ID to the key as a tiebreaker so that neither write overwrites the other. Explain how retries reuse the idempotency key so the server can drop duplicates, and how sequence numbers let devices detect and request missing messages during reconnection.
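
The device-side half of this walk-through (detecting gaps and duplicates from sequence numbers) might look like the sketch below. `DeviceInbox` is a hypothetical client-side structure; it assumes per-conversation sequence numbers start at 1 and have no intentional gaps.

```python
class DeviceInbox:
    """Client-side reordering buffer for one conversation."""

    def __init__(self, watermark: int = 0) -> None:
        self.watermark = watermark            # highest contiguous seq shown to the user
        self._buffer: dict[int, str] = {}     # messages that arrived ahead of a gap

    def receive(self, seq: int, body: str) -> list[str]:
        """Accept a pushed message; return the messages now deliverable in order."""
        if seq <= self.watermark or seq in self._buffer:
            return []                         # duplicate from a retry or re-push
        self._buffer[seq] = body
        ready = []
        while self.watermark + 1 in self._buffer:
            self.watermark += 1
            ready.append(self._buffer.pop(self.watermark))
        return ready

    def missing(self) -> list[int]:
        """Sequence numbers to request from the server to close the gap."""
        if not self._buffer:
            return []
        return [s for s in range(self.watermark + 1, max(self._buffer))
                if s not in self._buffer]


inbox = DeviceInbox()
inbox.receive(1, "a")
inbox.receive(3, "c")          # seq 2 has not arrived yet
print(inbox.missing())         # prints: [2]
print(inbox.receive(2, "b"))   # prints: ['b', 'c']
```

On reconnection the device would send its watermark to the server, and the same `receive` path handles the catch-up messages and any live pushes that overlap with them.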

Step 4: Address Secondary Concerns

Discuss multi-device sync: each device maintains a watermark of the last message it received, and on reconnection it requests all messages after that watermark. Explain read receipt handling: track read status per device and use the maximum across devices as the user-level read state. Cover presence: maintain an in-memory map of user-to-status in the presence service, batch updates every 5 seconds, and fan out only to subscribed contacts. Address offline handling: the client queues messages locally and retries with exponential backoff once the connection is restored. Mention monitoring: track message delivery latency percentiles, connection churn rate, queue lag, and hot storage partitions. Discuss horizontal scaling: shard WebSocket gateways by user hash, partition Kafka by conversation ID, and rely on Cassandra's native sharding.
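
For the offline-handling point, a client-side sketch might combine a local outbox with exponential backoff. The version below adds random jitter, a common refinement that the text above does not require; the function names, the 0.5-second base, and the 60-second cap are assumptions chosen for illustration.

```python
import random
from collections import deque
from typing import Callable


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based), in seconds.

    The ceiling doubles with each attempt up to `cap`; the actual delay
    is drawn uniformly below it so that clients do not retry in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)


def flush_outbox(outbox: deque, send: Callable[[str], bool]) -> int:
    """Send queued messages oldest first; stop at the first failure.

    Stopping (rather than skipping ahead) keeps the sender's messages in
    order. Returns the number of messages sent.
    """
    sent = 0
    while outbox:
        if not send(outbox[0]):
            break
        outbox.popleft()
        sent += 1
    return sent


outbox = deque(["msg-1", "msg-2", "msg-3"])
delivered = flush_outbox(outbox, send=lambda m: m != "msg-3")  # third send fails
print(delivered, list(outbox))  # prints: 2 ['msg-3']
```

A real client would sleep for `backoff_delay(attempt)` between flush attempts and reuse each message's idempotency key on every retry.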
