Design a real-time messaging platform similar to WhatsApp or Facebook Messenger that enables users to exchange one-to-one text messages with near-instant delivery. The system must support billions of registered users, track message delivery states (sent, delivered, read), and synchronize conversations across a user's phone, tablet, and desktop. Users expect that messages composed while offline are queued locally and delivered automatically once connectivity returns.
Your design should sustain tens of billions of messages per day, deliver messages end-to-end in under 200 milliseconds for online recipients, and tolerate datacenter outages without losing a single message. You will need to reason carefully about persistent connection management at massive scale, message ordering within conversations, idempotent delivery semantics, and efficient presence tracking for online/offline status.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers want to understand how you ensure every message arrives exactly once from the user's perspective, in the correct order, even when the network retries requests, multiple devices are active, and the infrastructure is distributed. This tests your grasp of distributed systems fundamentals and idempotency design.
Maintaining persistent bidirectional connections for billions of concurrent users while routing messages efficiently is a defining challenge. Interviewers test whether you can distribute connection state, handle reconnections gracefully, and avoid single points of failure.
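One common way to shard persistent connections without a single point of failure is a consistent-hash ring over the gateway fleet: each user hashes to a gateway, and losing a gateway only remaps the users that were on it. A minimal sketch (gateway names and the virtual-node count are illustrative, not part of the problem statement):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Stable hash so every server computes the same ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class GatewayRing:
    """Consistent-hash ring mapping users to WebSocket gateway hosts.

    Virtual nodes smooth out load imbalance; removing a failed gateway
    reassigns only the users that were connected to it, not the fleet.
    """
    def __init__(self, gateways, vnodes=100):
        self._ring = []  # sorted list of (hash, gateway)
        for gw in gateways:
            for i in range(vnodes):
                self._ring.append((_hash(f"{gw}#{i}"), gw))
        self._ring.sort()

    def gateway_for(self, user_id: str) -> str:
        """First virtual node clockwise from the user's hash."""
        h = _hash(user_id)
        idx = bisect.bisect(self._ring, (h,)) % len(self._ring)
        return self._ring[idx][1]
```

In practice the routing table (user to gateway) is also mirrored in Redis, as described later, so delivery workers can look up where to push without recomputing the ring.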
When a user owns a phone, a tablet, and a desktop, every device must converge to the same conversation state. Interviewers look for explicit strategies to synchronize message history, read receipts, and typing indicators without creating race conditions.
Storing and querying message history at enormous scale demands careful data modeling. Interviewers evaluate whether you choose appropriate storage engines, partition data logically, and handle hot conversations without degrading performance for other users.
Presence generates high write volume because every user heartbeat triggers an update, yet the data has relaxed consistency requirements. Interviewers want to see how you optimize this frequent, low-priority signal without overloading the rest of the system.
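A common pattern for this is heartbeats with a TTL rather than explicit offline writes: each heartbeat refreshes the user's entry, and a missed heartbeat simply lets it lapse to "offline" with no extra write. A minimal in-memory sketch (in production the map would typically live in Redis with key expiry; the TTL value and names are illustrative):

```python
import time

class PresenceTracker:
    """TTL-based presence: a user is online while heartbeats keep arriving."""

    def __init__(self, ttl_seconds=30, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen = {}  # user_id -> timestamp of last heartbeat

    def heartbeat(self, user_id):
        """Called every few seconds by a connected client; a cheap overwrite."""
        self._last_seen[user_id] = self._clock()

    def is_online(self, user_id):
        """Online iff the last heartbeat is fresher than the TTL.
        No explicit 'went offline' write is ever needed."""
        ts = self._last_seen.get(user_id)
        return ts is not None and self._clock() - ts < self._ttl
```

Because the data is relaxed-consistency by design, it can live entirely in a cache tier, isolated from the message path.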
Begin by confirming scope with the interviewer. Ask whether group chats are in scope or only one-to-one conversations. Verify the scale: how many daily active users, expected messages per user per day, and the read-to-write ratio. Clarify whether multimedia (images, video) is required or if text-only is sufficient. Confirm latency targets for sending, receiving, and syncing. Ask whether end-to-end encryption is in scope, as it changes how delivery tracking works. Determine if full-text search over message history is expected.
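Those scale numbers are worth sanity-checking on the spot. A quick back-of-envelope sketch, assuming 50 billion messages/day, a 3x peak-to-average factor, and ~200-byte messages (all illustrative assumptions, not figures from the prompt beyond "tens of billions per day"):

```python
# Back-of-envelope throughput and storage estimate.
MESSAGES_PER_DAY = 50_000_000_000   # assumed: 50B messages/day
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3                     # assumed peak-to-average ratio
AVG_MSG_BYTES = 200                 # assumed average message size

avg_qps = MESSAGES_PER_DAY / SECONDS_PER_DAY
peak_qps = avg_qps * PEAK_FACTOR
daily_storage_gb = MESSAGES_PER_DAY * AVG_MSG_BYTES / 1e9

print(f"average: ~{avg_qps:,.0f} msgs/sec")         # ~578,704
print(f"peak:    ~{peak_qps:,.0f} msgs/sec")        # ~1,736,111
print(f"storage: ~{daily_storage_gb:,.0f} GB/day")  # ~10,000 GB/day
```

Roughly 600K writes/sec average and ~10 TB/day of raw message data before replication; numbers like these justify the partitioned log and wide-column store discussed next.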
Sketch the main components: client applications (mobile and web), an API gateway layer, a fleet of WebSocket gateway servers for persistent connections, a message ingestion service, a durable message log (Kafka partitioned by conversation ID), delivery worker processes, conversation storage (Cassandra or a similar wide-column store), a cache layer (Redis), and a standalone presence service. Trace the message flow: the sender posts a message to the ingestion API, the message is written to the Kafka partition for the conversation, a delivery worker reads from that partition (preserving order), looks up the recipient's active connections in Redis, and pushes the message to each connected device via WebSocket. Show how the acknowledgment and delivery receipt flow back through the system. Illustrate presence updates flowing through a lighter-weight path to the presence service.
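The per-conversation ordering guarantee in this flow rests on deterministic partition assignment: every message in a conversation hashes to the same Kafka partition, so a single consumer sees them in write order. A minimal sketch (the partition count is an assumed example; ordering is per-conversation, not global):

```python
import hashlib

NUM_PARTITIONS = 64  # assumed partition count for the message topic

def partition_for(conversation_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map a conversation to one partition.

    All messages in a conversation land on the same partition, so the
    delivery worker consuming it preserves their order.
    """
    digest = hashlib.sha256(conversation_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

The trade-off to mention: a very hot conversation is bounded by one partition's throughput, which is why hot-partition metrics appear in the monitoring section below.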
Discuss multi-device synchronization: each device stores a watermark of the last message it received, and on reconnection it requests all messages after that watermark. Explain read receipt handling: track per-device cursors and compute the user-level read state as the maximum across devices. Cover offline handling: the client persists outgoing messages locally and retries with exponential backoff once connectivity returns. Address fault tolerance: Kafka replication factor of three ensures durability, Cassandra's tunable consistency guarantees availability during partial outages, and Redis Sentinel or Cluster handles gateway routing failover. Mention monitoring: track message delivery latency percentiles, WebSocket connection churn rate, Kafka consumer lag, and storage hot-partition metrics. Briefly note that horizontal scaling comes from sharding WebSocket gateways by user hash, Kafka partitions by conversation ID, and Cassandra's native token-ring sharding.
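The watermark and read-cursor rules above can be sketched in a few lines (field and function names are illustrative):

```python
def user_read_cursor(device_cursors: dict[str, int]) -> int:
    """User-level read state = the furthest message read on any device."""
    return max(device_cursors.values(), default=0)

def messages_since(log: list[dict], watermark: int) -> list[dict]:
    """On reconnect, a device sends its last-seen sequence number
    (its watermark) and receives everything after it."""
    return [m for m in log if m["seq"] > watermark]
```

Taking the max across devices means reading a conversation on the desktop marks it read everywhere, without any cross-device coordination at write time.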
Walk through a single message lifecycle in detail. The sender generates a UUID idempotency key and a local sequence number, then posts the message to the ingestion API. The API validates the request, writes the message to the Kafka partition for this conversation (the conversation hash determines the partition), and returns a "sent" acknowledgment to the sender. A delivery worker consumes from that partition in order, writes the message to Cassandra with a composite key of (conversation_id, timestamp), looks up recipient connections in the Redis routing table, and pushes the message to each online device via WebSocket. When the recipient device acknowledges receipt, the worker updates the message state to "delivered" and pushes a delivery receipt back to the sender. When the recipient opens the conversation, a read receipt is generated and propagated along the same path. Explain how the idempotency key prevents duplicates during retries and how sequence numbers let devices detect and request gaps after reconnection.
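A minimal sketch of the recipient-side bookkeeping this walkthrough implies: deduplicate on the idempotency key, then use sequence numbers to find gaps to re-request (class and field names are illustrative, assuming per-conversation sequence numbers starting at 1):

```python
class ConversationInbox:
    """Per-device view of one conversation."""

    def __init__(self):
        self._seen_keys = set()   # idempotency keys already applied
        self._messages = {}       # seq -> message body

    def deliver(self, seq: int, idempotency_key: str, body: str) -> bool:
        """Apply a message exactly once; redelivered retries are no-ops.

        Returns True if the message was new, False if it was a duplicate.
        """
        if idempotency_key in self._seen_keys:
            return False
        self._seen_keys.add(idempotency_key)
        self._messages[seq] = body
        return True

    def missing_seqs(self, latest_seq: int) -> list[int]:
        """Sequence numbers the device should re-request after reconnecting,
        given the latest sequence number the server reports."""
        return [s for s in range(1, latest_seq + 1) if s not in self._messages]
```

The same dedupe check runs server-side at the ingestion API, so a client retrying a timed-out POST never produces a second copy in the Kafka log.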