Multi-Threaded Chat System
Problem Statement
Design a chat system similar to Slack that supports basic messaging functionality with a critical focus on deletion capabilities. When a user deletes a conversation, it must be removed for all participants -- both those currently online and those offline. This creates interesting synchronization challenges across distributed clients.
What sets this apart from typical chat systems is deletion propagation. In many messaging apps, deleting a conversation only hides it from your own view. Here, deletion is global and affects every user in the conversation, which raises questions about consistency, offline sync, and race conditions.
Key Requirements
Functional
- Basic messaging -- users can send and receive messages in conversations and channels
- Individual message deletion -- users can delete specific messages they sent
- Conversation deletion -- users can delete entire conversations, removing them for all participants
- Global deletion propagation -- when one user deletes a conversation, all users (online and offline) must see it deleted
- Offline user synchronization -- deletion events must sync when offline users reconnect, even hours or days later
Non-Functional
- Latency requirements -- clarify whether real-time delivery is required or if eventual consistency is acceptable
- Consistency guarantees -- determine how quickly all clients must converge on a deletion; the answer decides between pushing events to online users and simpler polling
- Scalability -- handle millions of users with varying online/offline patterns
- Delivery guarantees -- ensure deletion events are not lost, even for users offline for extended periods
- Race condition handling -- gracefully handle scenarios like deleting a conversation while someone is typing
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Deletion Mechanism for Online vs Offline Users (Most Emphasized)
The core challenge is ensuring deletion propagates to both online and offline users. Interviewers want to see separate mechanisms for each case.
Hints to consider:
- Online users: push deletion events through existing WebSocket connections or long-polling channels
- Offline users: maintain a deletion event log that syncs when they reconnect
- Tombstone pattern: don't hard-delete conversations immediately; mark as deleted with metadata (who deleted, when)
- Sync protocol: when a user reconnects, query for deletion events since their last sync timestamp
- Event retention: decide how long to keep tombstone records (what if someone is offline for 6 months?)
- Multiple devices: ensure deletion syncs across all of a user's devices
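The two delivery paths above can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: `DeletionLog`, `DeletionEvent`, `online_inbox`, and `sync_since` are hypothetical names, the list stands in for a durable log, and the per-user inbox stands in for a WebSocket connection. It also uses a monotonically increasing sequence number as the sync cursor instead of the timestamp mentioned in the hints, on the assumption that clock skew between servers makes timestamps a fragile cursor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class DeletionEvent:
    seq: int               # position in the log, used as the sync cursor (assumption)
    conversation_id: str
    deleted_by: str


@dataclass
class DeletionLog:
    """In-memory stand-in for a durable, append-only deletion event log."""
    events: List[DeletionEvent] = field(default_factory=list)
    participants: Dict[str, Set[str]] = field(default_factory=dict)
    online_inbox: Dict[str, List[DeletionEvent]] = field(default_factory=dict)

    def connect(self, user_id: str) -> None:
        # Stand-in for opening a push channel to this user.
        self.online_inbox[user_id] = []

    def disconnect(self, user_id: str) -> None:
        self.online_inbox.pop(user_id, None)

    def delete_conversation(self, conversation_id: str, deleted_by: str) -> DeletionEvent:
        event = DeletionEvent(len(self.events) + 1, conversation_id, deleted_by)
        self.events.append(event)  # record the tombstone before any push
        for user_id in self.participants.get(conversation_id, set()):
            if user_id in self.online_inbox:  # online path: push immediately
                self.online_inbox[user_id].append(event)
        return event

    def sync_since(self, user_id: str, last_seq: int) -> List[DeletionEvent]:
        """Offline path: events after the client's cursor that concern this user."""
        return [
            e for e in self.events
            if e.seq > last_seq
            and user_id in self.participants.get(e.conversation_id, set())
        ]
```

Because the event is appended to the log before it is pushed, a client that misses the push (or a second device belonging to the same user) can still catch up by calling `sync_since` with the last sequence number it has applied.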
2. Understanding the Right Level of Complexity
Interviewers will probe whether you're over-engineering the solution. Don't assume WebSocket or real-time architecture without clarifying latency requirements.
Hints to consider:
- Ask about acceptable latency: milliseconds vs seconds vs minutes changes the architecture drastically
- If the interviewer doesn't care about real-time delivery, a simpler polling system may be more appropriate
- Don't spend too much time on WebSocket implementation details if deletion mechanics are the focus
- Save deep-dive energy for what the interviewer actually wants to explore
- Watch for verbal cues: if they say "that's fine, let's move on," don't keep elaborating
3. Database Schema and Storage Strategy
How you store conversations, messages, and deletion events reveals your understanding of distributed data management.
Hints to consider:
- Conversations table: track participants, creation time, deletion status
- Messages table: link to conversation, support individual message deletion
- Deletion events table: log who deleted what and when, for offline sync
- Consider soft delete vs hard delete: mark conversations as deleted rather than removing immediately
- Indexes for efficient querying: user_id for finding a user's conversations, conversation_id for messages
- Partition strategy: how to distribute data across multiple database nodes
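One possible shape for these tables, sketched with Python's built-in `sqlite3` module so it can be run locally. Every table, column, and function name here is an assumption made for illustration; a real deployment would use a distributed store and would also need the partitioning decision noted above, which this sketch does not address.

```python
import sqlite3

SCHEMA = """
CREATE TABLE conversations (
    conversation_id TEXT PRIMARY KEY,
    created_at      INTEGER NOT NULL,
    deleted_at      INTEGER,          -- NULL while the conversation is live
    deleted_by      TEXT
);
CREATE TABLE participants (
    conversation_id TEXT NOT NULL REFERENCES conversations(conversation_id),
    user_id         TEXT NOT NULL,
    PRIMARY KEY (conversation_id, user_id)
);
CREATE INDEX idx_participants_user ON participants(user_id);
CREATE TABLE messages (
    message_id      INTEGER PRIMARY KEY,
    conversation_id TEXT NOT NULL REFERENCES conversations(conversation_id),
    sender_id       TEXT NOT NULL,
    body            TEXT NOT NULL,
    sent_at         INTEGER NOT NULL,
    deleted_at      INTEGER           -- soft delete for individual messages
);
CREATE INDEX idx_messages_conversation ON messages(conversation_id, sent_at);
CREATE TABLE deletion_events (
    seq             INTEGER PRIMARY KEY AUTOINCREMENT,
    conversation_id TEXT NOT NULL,
    message_id      INTEGER,          -- NULL means the whole conversation
    deleted_by      TEXT NOT NULL,
    deleted_at      INTEGER NOT NULL
);
"""


def soft_delete_conversation(conn, conversation_id, user_id, now):
    """Tombstone the conversation and log the event in one transaction."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE conversations SET deleted_at = ?, deleted_by = ? "
            "WHERE conversation_id = ? AND deleted_at IS NULL",
            (now, user_id, conversation_id),
        )
        if cur.rowcount == 0:  # unknown or already deleted: nothing to log
            return False
        conn.execute(
            "INSERT INTO deletion_events (conversation_id, deleted_by, deleted_at) "
            "VALUES (?, ?, ?)",
            (conversation_id, user_id, now),
        )
    return True


def visible_conversations(conn, user_id):
    """Conversations the user participates in that are not tombstoned."""
    rows = conn.execute(
        "SELECT c.conversation_id FROM conversations c "
        "JOIN participants p ON p.conversation_id = c.conversation_id "
        "WHERE p.user_id = ? AND c.deleted_at IS NULL "
        "ORDER BY c.conversation_id",
        (user_id,),
    )
    return [r[0] for r in rows]
```

Writing the tombstone and the deletion event in the same transaction means the offline sync log cannot disagree with the conversation's deletion status. The `deleted_at IS NULL` guard in the update also makes a second concurrent delete a no-op rather than a duplicate event.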
4. Handling Race Conditions and Edge Cases
Interviewers will ask about concurrent operations that conflict with deletion.
Hints to consider:
- User A deletes conversation while User B is sending a message: should the message go through?
- Network partition: User A deletes locally but User B hasn't received the event yet -- what does each see?
- Multiple devices: a user deletes on their phone, then immediately opens their laptop -- should it be deleted there too?
- Deletion while actively reading: User is viewing a conversation when it gets deleted -- graceful UI handling?
- Undo capability: should deletion be reversible for some time window?
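To make the first and last questions concrete, here is a sketch of one possible policy: sends and deletes are serialized per conversation, a send that arrives after the delete is rejected, and a delete can be undone within a fixed window. The policy itself is an assumption (an interviewer may prefer a different answer), as are the names `Conversation`, `ConversationDeleted`, and `undo_window_s`. A single-process lock stands in for whatever the real system would use, such as a conditional write in the database.

```python
import threading
from typing import List, Optional


class ConversationDeleted(Exception):
    """Raised when a message is sent to a tombstoned conversation."""


class Conversation:
    def __init__(self, undo_window_s: float = 30.0) -> None:
        self._lock = threading.Lock()          # serializes send, delete, undo
        self._messages: List[str] = []
        self._deleted_at: Optional[float] = None
        self._undo_window_s = undo_window_s

    def send(self, body: str) -> None:
        with self._lock:
            if self._deleted_at is not None:
                # The delete won the race: reject instead of resurrecting.
                raise ConversationDeleted("conversation was deleted")
            self._messages.append(body)

    def delete(self, now: float) -> None:
        with self._lock:
            if self._deleted_at is None:       # idempotent: first delete wins
                self._deleted_at = now

    def undo_delete(self, now: float) -> bool:
        with self._lock:
            if self._deleted_at is None:
                return False
            if now - self._deleted_at > self._undo_window_s:
                return False                   # window expired
            self._deleted_at = None
            return True

    def messages(self) -> List[str]:
        with self._lock:
            return list(self._messages)
```

Whichever operation acquires the lock first defines the outcome, so the sender gets an explicit error it can surface in the UI rather than a message that silently disappears.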
5. Audit and Compliance Considerations
For production chat systems, true deletion raises compliance questions.
Hints to consider:
- Legal holds: some conversations may need to be retained for legal reasons despite user deletion
- Audit trail: maintain logs of who deleted what and when, even if the conversation is removed
- GDPR compliance: user right to deletion vs data retention requirements
- Backup and recovery: deleted conversations in database backups need special handling
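The tension between legal holds and the right to deletion can be expressed as a simple gate in front of the job that hard-deletes tombstoned data. This is only a sketch of the decision logic: `Tombstone`, `legal_hold`, and `retention_s` are hypothetical names, and the actual retention period is a policy decision that the problem statement does not specify.

```python
from dataclasses import dataclass


@dataclass
class Tombstone:
    conversation_id: str
    deleted_at: int           # Unix seconds when the user deleted it
    legal_hold: bool = False  # set by compliance tooling, not by end users


def can_hard_delete(t: Tombstone, now: int, retention_s: int) -> bool:
    """True once the retention period has elapsed and no legal hold applies."""
    if t.legal_hold:
        return False          # a hold overrides user deletion until released
    return now - t.deleted_at >= retention_s
```

A purge job would call `can_hard_delete` for each tombstone and write an audit record before removing the underlying rows, so the trail of who deleted what survives the data itself.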
Suggested Approach
Step 1: Clarify Requirements and Scope
Ask about latency expectations (real-time vs eventual consistency), scale (number of users, messages per second), offline user behavior (typical offline duration), and whether deletion should be instant or support undo. Confirm whether the interviewer wants focus on deletion mechanics or the broader chat infrastructure.