For a full example answer with detailed architecture diagrams and deep dives, see our Slack guide.
Design a messaging platform similar to WhatsApp or Meta Messenger that enables users to send and receive text messages in real time. The system should support one-on-one conversations with delivery and read receipts, presence indicators showing when contacts are online, and seamless synchronization of conversation history across multiple devices such as phones, tablets, and desktops.
At Lyft, this question probes your ability to architect a system handling billions of persistent connections and tens of billions of daily messages. The interviewer expects you to reason about how messages flow from sender to recipient through distributed infrastructure, how you maintain ordering guarantees in the face of network failures and retries, and how you keep latency under 200ms for online-to-online message delivery.
Beyond the happy path, consider how users experience the system when connectivity is intermittent. Messages queued offline must send automatically upon reconnection, and devices joining a conversation mid-stream need an efficient catch-up mechanism that avoids re-downloading the entire history.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers expect you to walk through the full lifecycle of a message from sender to recipient and explain how you guarantee exactly-once delivery semantics from the user's perspective, even when network retries and infrastructure failures occur.
Maintaining persistent bidirectional connections for billions of concurrent users is a core challenge. Interviewers want to understand how you distribute connection state, route messages to the correct gateway, and handle reconnections gracefully.
When a user has multiple active devices, all must converge on the same conversation state. Interviewers look for explicit strategies to synchronize message history, read receipts, and typing indicators without race conditions.
Efficiently storing and retrieving message history at massive scale requires careful data modeling. Interviewers evaluate your ability to choose the right storage systems and partition data to avoid hot spots.
Hints to consider: key messages by (conversation_id, message_id) in a wide-column store like Cassandra.

Confirm whether the system needs group chats or only one-on-one conversations. Verify scale expectations: daily active users, message volume, and read-to-write ratio. Clarify whether multimedia support (images, videos) is in scope or text-only. Establish latency targets for send, receive, and sync operations. Ask whether end-to-end encryption is required, as it impacts delivery tracking and server-side storage.
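The (conversation_id, message_id) hint above assumes message IDs that sort chronologically within a partition. A minimal snowflake-style sketch of such an ID generator; the field widths and the `node_id` parameter are illustrative assumptions, not any production scheme:

```python
import itertools
import time

_seq = itertools.count()  # per-process sequence to break ties within one millisecond

def make_message_id(node_id: int) -> int:
    """Build a sortable 64-bit ID: 41 bits of epoch milliseconds,
    10 bits identifying the generating node, 12 bits of sequence."""
    ms = int(time.time() * 1000)
    return (ms << 22) | ((node_id & 0x3FF) << 12) | (next(_seq) & 0xFFF)
```

Because the timestamp occupies the high bits, sorting rows by this ID within a conversation partition yields chronological order without relying on wall-clock uniqueness.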
Sketch client applications connecting through an API gateway to WebSocket gateway servers for persistent connections. Behind the gateways, place a message ingestion service writing to Kafka partitioned by conversation ID. Delivery workers consume from Kafka and push messages to recipient connections via the gateway fleet. Include Cassandra for durable message storage, Redis for connection routing and caching recent messages, and a separate lightweight presence service.
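The Redis routing table mentioned above can be modeled as a map from user ID to the set of (gateway, connection) pairs currently serving that user's devices. A minimal in-memory sketch; a real deployment would use Redis sets with expiry, and all names here are illustrative:

```python
from collections import defaultdict

# user_id -> set of (gateway_id, connection_id) for each live device connection
routing_table: defaultdict = defaultdict(set)

def register(user_id: str, gateway_id: str, connection_id: str) -> None:
    """Called by a WebSocket gateway when a device connects."""
    routing_table[user_id].add((gateway_id, connection_id))

def unregister(user_id: str, gateway_id: str, connection_id: str) -> None:
    """Called on disconnect so delivery workers stop pushing to a dead socket."""
    routing_table[user_id].discard((gateway_id, connection_id))

def lookup(user_id: str) -> list:
    """Delivery workers resolve which gateways to push a message to."""
    return sorted(routing_table[user_id])
```

Keeping this table out of the gateways themselves is what lets any delivery worker reach any recipient, regardless of which gateway holds the socket.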
Walk through the complete message flow: the sender generates a UUID idempotency key and posts to the ingestion API, which validates the message and writes it to the appropriate Kafka partition. A delivery worker consumes the message, looks up the recipient's device connections in the Redis routing table, and pushes to each connected device via WebSocket. In parallel, the message is persisted to Cassandra under a composite key of (conversation_id, message_id), where the message ID is time-ordered so rows cluster chronologically without relying on wall-clock timestamps, which can collide. Retries reuse the idempotency key to prevent duplicates. On reconnection, devices request all messages after their last known sequence number to catch up.
Discuss presence management: maintain an in-memory map of online users in the presence service, batch status updates every 5-10 seconds, and fan out only to subscribed contacts. Cover offline handling: the client queues messages locally and retries with exponential backoff on reconnection. Address monitoring: track message delivery latency percentiles, WebSocket connection churn rate, Kafka consumer lag, and Cassandra hot partition detection. Explain horizontal scaling: shard WebSocket gateways by user hash, partition Kafka by conversation ID, and leverage Cassandra's built-in sharding.
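The client-side retry policy above can be made concrete as exponential backoff with full jitter; the base delay, cap, and attempt count below are assumed values, not prescribed ones:

```python
import random

def backoff_delays(base: float = 0.5, cap: float = 30.0, attempts: int = 6) -> list:
    """Delay in seconds before each retry: exponential growth, capped,
    with full jitter so reconnecting clients don't retry in lockstep."""
    return [random.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]
```

The jitter matters at this scale: after a gateway restart, it spreads the reconnect storm across the whole backoff window instead of synchronizing millions of clients onto the same instant.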