Design a messaging system like WhatsApp or Meta Messenger that supports real-time 1:1 chat, message delivery and read status tracking, user presence indicators, and seamless multi-device synchronization. Users expect to send a text message and have it appear on the recipient's screen within a fraction of a second, with clear visual feedback showing whether the message was sent, delivered, and read. The platform must also support offline messaging, where messages composed without connectivity are queued locally and transmitted once the network is restored.
At MongoDB scale, the interviewer wants to see how you model conversation data for high write throughput and efficient retrieval, how you route messages through a fleet of WebSocket servers without creating single points of failure, and how you keep multiple devices per user in sync without duplicating or losing messages. You should be prepared to discuss partitioning strategies that keep per-conversation ordering intact, caching layers for recent messages, and fan-out mechanics for delivery acknowledgments.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers want to understand how you move a message from sender to recipient reliably and in order, even under retries, network failures, and multi-device scenarios. This tests your grasp of distributed delivery semantics and idempotency.
Maintaining persistent bidirectional connections for hundreds of millions of concurrent users is a defining challenge. Interviewers probe how you distribute connections, handle failover, and route messages to the correct gateway.
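One common approach to routing is a connection registry keyed by user, where each gateway heartbeats its entries so that connections on a crashed gateway age out. The sketch below is an in-memory stand-in for what would typically live in Redis with per-entry TTLs; the class and method names are illustrative, not from any specific system.

```python
import time

# Hypothetical in-memory stand-in for a Redis-backed connection registry:
# maps user_id -> {device_id: (gateway_id, expires_at)}. In production the
# TTL would be refreshed by gateway heartbeats so stale entries expire.
class ConnectionRegistry:
    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._conns = {}  # user_id -> {device_id: (gateway_id, expires_at)}

    def register(self, user_id, device_id, gateway_id, now=None):
        now = now if now is not None else time.time()
        self._conns.setdefault(user_id, {})[device_id] = (gateway_id, now + self.ttl)

    def unregister(self, user_id, device_id):
        self._conns.get(user_id, {}).pop(device_id, None)

    def gateways_for(self, user_id, now=None):
        """Return live {device_id: gateway_id}, dropping expired entries."""
        now = now if now is not None else time.time()
        return {d: g for d, (g, exp) in self._conns.get(user_id, {}).items()
                if exp > now}

reg = ConnectionRegistry(ttl_seconds=60)
reg.register("alice", "phone", "gw-3", now=0)
reg.register("alice", "laptop", "gw-7", now=0)
print(reg.gateways_for("alice", now=30))   # both devices still live
print(reg.gateways_for("alice", now=120))  # heartbeats lapsed, registry empty
```

The TTL-based expiry is what makes gateway failover graceful: when a gateway dies, its entries simply stop being refreshed, and delivery workers stop routing to it within one TTL window.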
When a user operates multiple devices simultaneously, all must converge on the same conversation state. Interviewers look for explicit per-device cursors, sync protocols, and deterministic conflict handling.
Storing and querying billions of messages efficiently requires careful schema choices. Interviewers evaluate your ability to pick the right storage engine, partition effectively, and avoid hot partitions.
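A standard way to avoid hot or unbounded partitions is to bucket each conversation's messages by a coarse time window, so the partition key is (conversation_id, bucket) rather than conversation_id alone. The bucket width below is an illustrative assumption, not a prescribed value:

```python
# Hypothetical partition-key scheme for the message table: bucket each
# conversation by week so no single partition grows without bound
# (the classic "wide partition" problem in Cassandra-style stores).
SECONDS_PER_BUCKET = 7 * 24 * 3600  # one bucket per week; a tuning assumption

def partition_key(conversation_id: str, sent_at_epoch: int) -> tuple:
    bucket = sent_at_epoch // SECONDS_PER_BUCKET
    return (conversation_id, bucket)

# Within a partition, messages cluster by sequence number, so
# "latest N messages" is a single-partition range read.
print(partition_key("conv-42", 1_700_000_000))
```

Reads for recent history hit the newest bucket first and walk backward only if more messages are needed, which keeps the common case to one partition.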
Presence generates enormous write volume (every online/offline transition, every heartbeat) but tolerates relaxed consistency. Interviewers want to see how you optimize this high-frequency signal without overwhelming storage.
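Because presence tolerates relaxed consistency, a common trick is to derive "offline" from missing heartbeats rather than writing an explicit transition, which both absorbs flapping and cuts write volume. A minimal sketch, with an assumed grace period:

```python
# Sketch of heartbeat-based presence: the service records only heartbeats,
# and a user is considered offline once heartbeats stop for GRACE seconds.
# The grace period (30s) is an assumption for illustration.
class PresenceTracker:
    GRACE = 30  # seconds without a heartbeat before a user reads as offline

    def __init__(self):
        self._last_seen = {}  # user_id -> timestamp of last heartbeat

    def heartbeat(self, user_id, now):
        self._last_seen[user_id] = now

    def status(self, user_id, now):
        last = self._last_seen.get(user_id)
        if last is None or now - last > self.GRACE:
            return "offline"  # never seen, or heartbeats lapsed
        return "online"

p = PresenceTracker()
p.heartbeat("bob", now=0)
print(p.status("bob", now=10))  # online
print(p.status("bob", now=60))  # offline
```

Deriving offline state lazily also means a brief disconnect-and-reconnect never generates a visible offline blip for subscribed contacts.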
Confirm the scope with the interviewer. Ask whether group messaging is in scope or only 1:1 conversations. Clarify the expected scale: daily active users, messages per day, and peak-to-average traffic ratio. Determine whether multimedia (images, voice) is required or text-only. Ask about end-to-end encryption implications on delivery tracking. Establish latency SLAs for message delivery and presence. Confirm whether message search or history retention policies are in scope.
Sketch the core components: client apps (mobile, web, desktop), an API gateway for authentication and HTTP endpoints, a fleet of WebSocket gateway servers for persistent connections, a message ingestion service, Kafka partitioned by conversation ID for durable ordered delivery, delivery worker consumers, a conversation store (Cassandra or sharded MongoDB), Redis for connection routing and recent message caching, and a separate presence service. Trace the message flow end-to-end: sender posts message, ingestion service validates and writes to Kafka, delivery worker reads from the partition, looks up recipient gateway in Redis, pushes via WebSocket, and stores the message in the conversation store. Show the acknowledgment flow back to the sender.
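The delivery-worker half of this flow can be sketched in a few lines. The loop below uses tiny in-memory doubles for the Kafka consumer, registry, gateway fleet, and store; all names are illustrative assumptions, not a real client API:

```python
# Minimal sketch of a delivery worker: consume messages in partition order,
# persist each one, look up the recipient's connected gateways, and push
# to every device. Consumer, registry, gateways, and store are stand-ins.
def run_delivery_worker(consumer, registry, gateways, store):
    for msg in consumer:  # one Kafka partition, keyed by conversation_id
        store.append(msg)  # durable write to the conversation store
        for device_id, gw_id in registry.get(msg["to"], {}).items():
            gateways[gw_id].push(device_id, msg)  # WebSocket push

class FakeGateway:
    def __init__(self):
        self.pushed = []
    def push(self, device_id, msg):
        self.pushed.append((device_id, msg["id"]))

store, gw = [], FakeGateway()
registry = {"bob": {"phone": "gw-1"}}
run_delivery_worker([{"id": "m1", "to": "bob"}], registry, {"gw-1": gw}, store)
print(gw.pushed)  # [('phone', 'm1')]
```

Note that the store write happens regardless of whether any gateway push succeeds, so an offline recipient's messages are waiting in the conversation store for the catch-up path.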
Walk through the critical path in detail. The sender generates a UUID idempotency key and includes the last-known sequence number. The ingestion service validates the payload, assigns a server-side sequence number, writes to the Kafka partition keyed by conversation ID, and returns a "sent" acknowledgment to the sender. A delivery worker consumes from the partition, queries Redis for the recipient's connected gateway servers, and pushes the message to each device. When the recipient's device receives the message, it sends a "delivered" acknowledgment back through its gateway, which the delivery worker propagates to the sender. Explain how retries use the idempotency key to prevent duplicates, and how the sequence number lets reconnecting devices request exactly the messages they missed.
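The ingestion step above can be sketched as follows, assuming an in-memory dedupe table for illustration (in practice the idempotency record would live in a shared store with a TTL):

```python
# Sketch of idempotent ingestion: assign a per-conversation, server-side
# sequence number, but return the previously assigned one when a retry
# arrives with the same idempotency key, so duplicates never get new seqs.
class Ingestion:
    def __init__(self):
        self._next_seq = {}  # conversation_id -> last assigned sequence
        self._seen = {}      # idempotency_key -> previously assigned seq

    def ingest(self, conversation_id, idempotency_key, body):
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]  # retry: no new write, same seq
        seq = self._next_seq.get(conversation_id, 0) + 1
        self._next_seq[conversation_id] = seq
        self._seen[idempotency_key] = seq
        # ...here the (conversation_id, seq, body) record would be written
        # to the Kafka partition keyed by conversation_id...
        return seq

ing = Ingestion()
print(ing.ingest("c1", "k-abc", "hello"))  # 1
print(ing.ingest("c1", "k-abc", "hello"))  # 1 -- retried send, deduplicated
print(ing.ingest("c1", "k-def", "world"))  # 2
```

Because sequence numbers are assigned server-side and gaps are impossible within a conversation, a device that holds sequence N can detect a missed message the moment it sees N+2.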
Cover multi-device sync: each device maintains a watermark of the last received sequence number; on reconnection it requests a catch-up batch from the conversation store. Explain read receipt handling: the client sends a "read" event referencing the latest read sequence number, which updates the per-device cursor and fans out to the sender. Discuss presence: maintain an in-memory user-status map in the presence service, persist transitions to Redis, and fan out only to subscribed contacts. Address offline handling: the client queues messages locally and retries with exponential backoff on reconnection, deduplicating via idempotency keys. Mention monitoring: track delivery latency percentiles, Kafka consumer lag, WebSocket connection churn, and storage hot partition detection.
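The per-device watermark catch-up described above reduces to a single range read against the conversation store. A minimal sketch, with the store modeled as a dict of ordered message lists:

```python
# Sketch of reconnect catch-up: each device persists the highest sequence
# number it has applied; on reconnect it requests everything above that
# watermark, in order, capped by a batch limit.
def catch_up(store, conversation_id, after_seq, limit=100):
    """Return up to `limit` messages with seq > after_seq, in seq order."""
    msgs = store.get(conversation_id, [])
    return [m for m in msgs if m["seq"] > after_seq][:limit]

store = {"c1": [{"seq": s, "body": f"m{s}"} for s in range(1, 6)]}
print(catch_up(store, "c1", after_seq=3))  # messages with seq 4 and 5
```

The same primitive serves both a phone that was offline for an hour and a freshly logged-in desktop client (which simply starts from watermark 0, paginating through history).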
Deepen your understanding of the patterns used in this problem: