Design a mobile-first conversational AI platform that allows users to maintain and switch between multiple independent conversation threads with an AI assistant. Users should be able to start new conversations on different topics, resume previous discussions seamlessly, and organize their chat history effectively. The system needs to handle the complexity of context management across threads while maintaining fast response times and a smooth user experience.
The platform is expected to serve 50 million daily active users, with each user maintaining an average of 10-15 active conversation threads. Peak load occurs during business hours across different time zones, with 200,000 concurrent users engaging in conversations. The system must handle both short-form exchanges and long-running conversations that span multiple days or weeks.
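A quick back-of-envelope estimate makes these numbers concrete. The DAU and concurrency figures come from the requirements above; messages per user per day, message size, and the peak factor are assumptions for illustration:

```python
# Capacity sketch from the stated requirements (50M DAU, 200K concurrent).
# MSGS_PER_USER_PER_DAY, AVG_MSG_BYTES, and the 3x peak factor are assumptions.

DAU = 50_000_000
MSGS_PER_USER_PER_DAY = 20           # assumed: ~20 messages sent per active user
AVG_MSG_BYTES = 1_000                # assumed: ~1 KB per message incl. metadata

daily_messages = DAU * MSGS_PER_USER_PER_DAY              # 1 billion messages/day
avg_write_qps = daily_messages / 86_400                   # ~11.6K writes/sec
peak_write_qps = avg_write_qps * 3                        # assumed 3x peak factor
daily_storage_gb = daily_messages * AVG_MSG_BYTES / 1e9   # ~1 TB of new data/day

print(f"avg write QPS: {avg_write_qps:,.0f}")
print(f"peak write QPS: {peak_write_qps:,.0f}")
print(f"new storage/day: {daily_storage_gb:,.0f} GB")
```

Even rough numbers like these justify later choices such as sharded message storage and a separate hot path for thread metadata.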
Based on real interview experiences, these are the areas interviewers probe most deeply:
Understanding how to maintain separate conversation contexts while enabling efficient switching is critical. Poor context management leads to confusion, slow switching, and excessive memory usage.
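One common way to bound memory while keeping thread switches fast is an LRU cache of contexts keyed by thread ID. This is a sketch under assumed names; the cache size is a tuning knob, and an evicted context would be rebuilt from message history:

```python
from collections import OrderedDict

class ThreadContextCache:
    """Keeps the N most recently used thread contexts in memory;
    evicted contexts are rebuilt from message history on demand."""

    def __init__(self, max_threads: int = 8):  # assumed per-client cap
        self.max_threads = max_threads
        self._cache: "OrderedDict[str, list]" = OrderedDict()

    def get(self, thread_id: str):
        if thread_id in self._cache:
            self._cache.move_to_end(thread_id)   # mark as most recently used
            return self._cache[thread_id]
        return None                              # miss: caller rebuilds from storage

    def put(self, thread_id: str, context: list):
        self._cache[thread_id] = context
        self._cache.move_to_end(thread_id)
        while len(self._cache) > self.max_threads:
            self._cache.popitem(last=False)      # evict least recently used
```

Because each context lives under its own key, switching threads never mixes histories, and memory stays proportional to the cache cap rather than to total thread count.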
The choice of data model significantly impacts query performance, especially when users have hundreds of threads with thousands of messages each.
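A sketch of one such data model (field names are assumptions): thread metadata stays small and hot for the thread list, while messages are keyed by (thread_id, seq) so a thread's history reads as one contiguous range scan instead of scattered lookups:

```python
from dataclasses import dataclass

@dataclass
class ThreadMeta:
    thread_id: str
    user_id: str
    title: str
    last_message_preview: str
    updated_at: int          # epoch millis; the thread list sorts on this

@dataclass
class Message:
    thread_id: str           # partition key: co-locates one thread's messages
    seq: int                 # clustering key: monotonically increasing per thread
    role: str                # "user" or "assistant"
    body: str
    created_at: int

def page_of_messages(messages, thread_id, after_seq, limit=50):
    """Simulate a keyed range scan: one thread's messages after a cursor."""
    rows = [m for m in messages if m.thread_id == thread_id and m.seq > after_seq]
    return sorted(rows, key=lambda m: m.seq)[:limit]
```

Paging by a per-thread sequence number keeps queries cheap even for users with hundreds of threads, because no query ever touches more than one partition.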
Mobile devices face constraints that desktop systems don't: intermittent connectivity, limited local storage, and battery life.
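The standard answer to intermittent connectivity is a client-side outbox: messages composed offline are queued locally and flushed when the network returns. A sketch with assumed names; the transport and persistence layer are placeholders:

```python
import time
import uuid

class Outbox:
    """Client-side outbox: queue messages composed offline, flush on reconnect.
    A sketch; send_fn stands in for the real transport (e.g. an HTTPS POST)."""

    def __init__(self, send_fn):
        self.send_fn = send_fn
        self.pending = []

    def compose(self, thread_id, body):
        msg = {
            "client_msg_id": str(uuid.uuid4()),  # lets the server dedupe retries
            "thread_id": thread_id,
            "body": body,
            "composed_at": time.time(),
        }
        self.pending.append(msg)   # a real client persists this to disk
        return msg

    def flush(self):
        """Attempt delivery; keep anything that fails for the next retry."""
        still_pending = []
        for msg in self.pending:
            try:
                self.send_fn(msg)
            except ConnectionError:
                still_pending.append(msg)
        self.pending = still_pending
```

The client-generated message ID doubles as an idempotency key, so retrying a flush after a dropped connection can never duplicate a message server-side.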
Users expect their conversations to appear instantly across all their devices, which requires careful coordination without overwhelming the system.
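One lightweight coordination scheme is a per-device sync cursor over an ordered event log: each device pulls only events newer than its cursor, so multi-device sync stays cheap. The event shape and names here are assumptions:

```python
# Per-device sync cursor over an ordered event log (a sketch).
# Each device remembers the last sequence number it applied.

def events_since(event_log, cursor):
    """Return events after the device's cursor, plus the advanced cursor."""
    fresh = [e for e in event_log if e["seq"] > cursor]
    new_cursor = fresh[-1]["seq"] if fresh else cursor
    return fresh, new_cursor
```

A push channel (WebSocket or platform push) can then carry only a "something changed" nudge, and the device catches up with a single cursor-based pull instead of the server fanning out full payloads to every device.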
Integrating with AI models that generate responses token-by-token requires special handling to maintain low perceived latency.
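A common technique is to batch tokens into small frames before sending them to the client, so text still paints incrementally but the system avoids a network write per token. A sketch; the flush interval is an assumed tuning knob:

```python
import time

def stream_tokens(token_iter, flush_interval=0.05):
    """Coalesce model tokens into ~50 ms frames for the client.

    The client renders each frame as it arrives, keeping perceived
    latency low without paying one network round trip per token."""
    buffer, last_flush = [], time.monotonic()
    for token in token_iter:
        buffer.append(token)
        now = time.monotonic()
        if now - last_flush >= flush_interval:
            yield "".join(buffer)
            buffer, last_flush = [], now
    if buffer:
        yield "".join(buffer)    # flush whatever remains at end of stream
```

The frames would then ride an existing streaming transport such as Server-Sent Events or a WebSocket; the batching logic is independent of that choice.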
Start by confirming scale assumptions and defining the scope clearly. Ask about typical user behavior: How long do threads remain active? What's the distribution of thread count per user? Are there power users with hundreds of threads? Clarify whether the AI responses are generated by an internal model or external API. Confirm mobile platform requirements (iOS, Android, web) and whether offline support is critical. Understand privacy and data retention policies.
Sketch the major components: Mobile clients with local storage, API gateway for request routing, Thread Management Service for CRUD operations, Message Service for real-time delivery, AI Service wrapper for handling model interactions, and a Storage Layer with both databases and caching. Show how these components interact during common flows like creating a thread, sending a message, and syncing across devices. Include CDN for static assets and consider separate read/write paths.
Focus on thread state management and context handling. Walk through the data structures stored on the client versus the server. Explain how thread metadata (ID, title, last message, timestamp) is kept in fast storage while full message history is loaded on demand. Describe the synchronization protocol: when a user switches threads, the client first loads from the local cache if available, then fetches updates from the server. Detail how the AI context is reconstructed from message history, with truncation strategies for long threads. Address how real-time updates are pushed to all active devices.
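The context-reconstruction step can be sketched as a budget-driven walk backward through history: keep the system prompt, then include messages newest-first until the token budget is spent. Whitespace token counting and the budget value are simplifying assumptions:

```python
def build_context(messages, system_prompt, max_tokens=4000,
                  count_tokens=lambda s: len(s.split())):
    """Rebuild an AI context from message history under a token budget.

    Walks history newest-first and stops when the budget is exhausted,
    so long threads are truncated at their oldest messages. Token
    counting here is a whitespace approximation (an assumption)."""
    budget = max_tokens - count_tokens(system_prompt)
    kept = []
    for msg in reversed(messages):               # newest first
        cost = count_tokens(msg["body"])
        if cost > budget:
            break                                # oldest messages fall off
        kept.append(msg)
        budget -= cost
    return [{"role": "system", "body": system_prompt}] + list(reversed(kept))
```

In an interview it's worth noting refinements to this baseline, such as summarizing the truncated prefix into a single synthetic message rather than dropping it entirely.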
Discuss reliability through message delivery guarantees using idempotency keys and acknowledgment protocols. Cover scalability by explaining database sharding strategies, read replicas for query distribution, and CDN usage. Address monitoring through metrics on thread switching latency, message delivery rates, and AI response times. Mention security considerations like end-to-end encryption options and access control. Touch on potential optimizations like message compression, predictive thread preloading based on usage patterns, and caching frequent AI responses.
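The idempotency-key guarantee mentioned above reduces to a small server-side check: a retried send carrying the same key is acknowledged without being stored twice. A sketch with assumed names; a production system would keep seen keys in a TTL'd shared store rather than in process memory:

```python
class MessageIngest:
    """Server-side dedupe for at-least-once delivery (a sketch).

    The client retries sends with the same idempotency key; the server
    stores each message at most once and re-acks duplicates."""

    def __init__(self):
        self.seen = {}        # idempotency_key -> stored message id
        self.store = []

    def ingest(self, idempotency_key, message):
        if idempotency_key in self.seen:
            return self.seen[idempotency_key]   # duplicate: ack, don't re-store
        self.store.append(message)
        msg_id = len(self.store)
        self.seen[idempotency_key] = msg_id
        return msg_id
```

Combined with client acknowledgments, this turns at-least-once transport retries into exactly-once storage semantics, which is the delivery guarantee interviewers usually want stated explicitly.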