For a full example answer with detailed architecture diagrams and deep dives, see our Design Slack guide. The Slack guide covers real-time messaging architecture, WebSocket connection management, and presence tracking that directly apply to a Messenger-style 1:1 chat system.
Also review the Message Queues, Caching, and Databases building blocks for background on durable message ingestion, session routing, and conversation storage.
Design a messaging system like WhatsApp or Facebook Messenger that supports real-time one-on-one text conversations between users, with message delivery status tracking and user presence features at scale. Users expect instant delivery, correct message ordering, accurate status indicators (sent, delivered, read), and seamless multi-device continuity even on unreliable mobile networks.
The system must handle billions of users exchanging tens of billions of messages daily. The core technical challenges are maintaining persistent bidirectional connections for real-time delivery, ensuring at-least-once delivery semantics with idempotent writes and per-conversation ordering, synchronizing state across multiple devices per user, and building a presence system that scales without overwhelming write capacity. You need to translate ambiguous requirements into a robust architecture with principled tradeoffs between latency, availability, and consistency.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers want to see how you ensure messages arrive exactly once (from the user's perspective) in the correct order, even with network retries, multiple devices, and distributed infrastructure.
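One common approach here is a client-generated idempotency key checked server-side before the write. The sketch below (an illustration, not the article's reference implementation) uses an in-memory dict standing in for a Redis dedup store; a retry with the same key returns the original message ID instead of creating a duplicate:

```python
import uuid


class IngestionService:
    """Idempotent message ingestion sketch.

    The in-memory dict stands in for a shared store such as Redis;
    in production the key would also carry a TTL.
    """

    def __init__(self):
        self._seen = {}  # idempotency_key -> message_id already assigned

    def send(self, idempotency_key, conversation_id, body):
        # A network retry resends the same idempotency key, so we return
        # the original result instead of writing the message twice.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key], False  # duplicate suppressed
        message_id = str(uuid.uuid4())
        # ... append to the conversation log here ...
        self._seen[idempotency_key] = message_id
        return message_id, True
```

This gives "exactly once from the user's perspective" on top of at-least-once transport: the write may be attempted many times, but only the first attempt mutates state.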
Maintaining persistent bidirectional connections for billions of concurrent users while routing messages efficiently is a fundamental challenge. Interviewers probe how you distribute connection state and handle reconnections.
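A standard answer is consistent hashing: users map onto a ring of gateway servers, so adding or removing a gateway remaps only a small fraction of connections. The sketch below is one possible illustration (real deployments typically pair this with a per-device routing table in Redis, as described later in this guide):

```python
import hashlib
from bisect import bisect_right


class GatewayRing:
    """Consistent-hash ring assigning users to WebSocket gateway servers.

    Virtual nodes (vnodes) smooth out the load distribution across gateways.
    """

    def __init__(self, gateways, vnodes=64):
        self._ring = sorted(
            (self._hash(f"{g}#{i}"), g) for g in gateways for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def gateway_for(self, user_id):
        # Walk clockwise from the user's hash to the next vnode on the ring.
        h = self._hash(user_id)
        keys = [k for k, _ in self._ring]
        idx = bisect_right(keys, h) % len(self._ring)
        return self._ring[idx][1]
```

On reconnection the client simply re-resolves its gateway and re-registers its connection in the routing table.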
When a user has multiple active devices, all must show the same conversation state. Interviewers look for explicit strategies to sync message history and read receipts without creating race conditions.
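One way to avoid read-receipt races is to make the merged state monotonic: each device reports the highest message sequence it has displayed, and the user-level receipt is the maximum across devices. A minimal sketch, assuming per-conversation sequence numbers:

```python
class ReadState:
    """Per-conversation read receipts merged across a user's devices.

    Updates only move forward, so a stale report from a lagging device
    can never move the user-level read receipt backwards.
    """

    def __init__(self):
        self._by_device = {}  # device_id -> highest sequence read

    def report(self, device_id, seq):
        current = self._by_device.get(device_id, 0)
        self._by_device[device_id] = max(current, seq)

    def user_level(self):
        # The user has "read" everything up to the furthest of their devices.
        return max(self._by_device.values(), default=0)
```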
Storing and retrieving message history at scale requires careful data modeling. Interviewers evaluate your choice of storage system, partitioning scheme, and handling of hot conversations.
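A common mitigation for hot conversations is bucketing the partition key by time, so one very active conversation is spread across several partitions instead of growing a single unbounded row. A hypothetical sketch of such a key function (the seven-day bucket width is an assumption, not a prescription):

```python
from datetime import datetime, timezone


def partition_key(conversation_id, sent_at, bucket_days=7):
    """Cassandra-style partition key: (conversation_id, coarse time bucket).

    Messages within a bucket are then ordered by a clustering column
    (e.g. timestamp), keeping range scans over recent history cheap.
    """
    epoch_days = int(sent_at.replace(tzinfo=timezone.utc).timestamp() // 86400)
    bucket = epoch_days // bucket_days
    return (conversation_id, bucket)
```

Reads for "latest messages" then scan the newest bucket first and step backwards only if the page is not full.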
Presence generates enormous write volume but has relaxed consistency requirements. Interviewers want to see how you optimize this high-frequency signal without overloading your system.
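The usual optimization is coalescing: within a flush window, only the latest status per user survives, collapsing a flood of transitions into one update. A sketch of the batching core (a real service would drive `flush` from a periodic timer and fan out only to subscribed contacts):

```python
class PresenceBatcher:
    """Coalesces high-frequency presence updates before fan-out.

    Because presence tolerates staleness, dropping intermediate
    transitions within a window loses nothing users care about.
    """

    def __init__(self):
        self._pending = {}  # user_id -> latest status in this window

    def update(self, user_id, status):
        # Later updates in the same window simply overwrite earlier ones.
        self._pending[user_id] = status

    def flush(self):
        # Hand off the accumulated batch and start a fresh window.
        batch, self._pending = self._pending, {}
        return batch
```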
Begin by confirming scope and constraints. Ask whether the system needs to support group chats or only 1:1 conversations. Verify scale expectations: daily active users, message volume, and read-to-write ratio. Clarify expectations around multimedia (images, videos) versus text-only. Confirm whether end-to-end encryption is required, as this impacts delivery tracking design. Establish latency targets for different operations (send, receive, sync). Determine if search functionality for message history is in scope.
Sketch the core components: client applications (mobile, web), an API gateway layer, WebSocket gateway servers for persistent connections, a message ingestion service, a message queue (Kafka partitioned by conversation ID), message delivery workers, conversation storage (Cassandra for write throughput and range scans), a cache layer (Redis for session routing and recent messages), and a presence service. Draw the message flow: the sender's device posts the message to the ingestion API, which writes to the appropriate Kafka partition; a delivery worker consumes from the partition, looks up the recipient's connected gateway in Redis, and pushes the message via WebSocket; acknowledgments flow back through the same path. Presence updates travel through a separate lightweight path.
Walk through how a message flows from sender to recipient. The sender generates a UUID idempotency key and a local sequence number, then posts to the ingestion API, which validates the request and writes to a Kafka partition keyed by a hash of the conversation ID. A delivery worker consumes from the partition (preserving per-conversation order), persists the message in Cassandra with conversation_id as the partition key and the message timestamp as the clustering column so a conversation's history can be range-scanned in order, looks up the recipient's device connections in the Redis routing table, and pushes the message to each connected device via WebSocket. Retries reuse the idempotency key to prevent duplicates, and sequence numbers let devices detect gaps and request missing messages during reconnection. Discuss how this pipeline handles an offline recipient: the message is stored and a push notification is sent; when the recipient comes online, their device fetches all messages after its last watermark.
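The delivery-worker step above can be sketched as a small function. All the callables here (`push_fn`, `store_fn`, `notify_fn`) are hypothetical stand-ins for the real gateway, Cassandra, and push-notification services; persisting before pushing is assumed so a worker crash cannot lose a message it has already started delivering:

```python
def deliver(message, routing_table, push_fn, store_fn, notify_fn):
    """Delivery-worker sketch for one message.

    routing_table maps user_id -> list of (gateway, device_id) entries
    for currently connected devices (the Redis lookup in the text).
    Returns the number of devices pushed to.
    """
    # Persist first: reconnecting devices can always catch up from storage.
    store_fn(message)
    devices = routing_table.get(message["to"], [])
    if devices:
        for gateway, device_id in devices:
            push_fn(gateway, device_id, message)
    else:
        # Recipient offline: fall back to a mobile push notification.
        notify_fn(message["to"], message)
    return len(devices)
```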
Discuss multi-device sync: each device maintains a watermark of the last message it received; on reconnection, it requests all messages after that watermark. Explain read-receipt handling: track read status per device and use the maximum across devices as the user-level read state. Cover presence: keep an in-memory map of user-to-status in the presence service, batch updates every 5 seconds, and fan out only to subscribed contacts. Address offline handling: the client queues outgoing messages locally and retries with exponential backoff once the connection is restored. Mention monitoring: track message delivery latency percentiles, connection churn rate, Kafka consumer lag, and storage hot partitions. Discuss horizontal scaling: shard WebSocket gateways by user hash, partition Kafka by conversation ID, and rely on Cassandra's partition-key-based data distribution.
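The watermark catch-up described above reduces to a simple range query, assuming per-conversation sequence numbers as introduced earlier. A minimal sketch over an in-memory log (standing in for the Cassandra range scan):

```python
def missing_messages(server_log, device_watermark):
    """Return every message a reconnecting device has not yet seen.

    server_log is a per-conversation list of messages ordered by
    sequence number; the device sends the last sequence it received.
    """
    return [m for m in server_log if m["seq"] > device_watermark]
```

After applying the result, the device advances its watermark to the highest sequence returned, making the sync step idempotent under retries.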