Design an AI tutoring platform with RAG-based Q&A for assignments and coursework
System Design
Problem Statement
Design an AI-powered tutoring platform that uses Retrieval-Augmented Generation (RAG) to help students get answers to questions about their assignments and coursework. The system should retrieve relevant educational content and generate contextual responses to student queries with citations pointing to specific course materials.
The platform needs to support thousands of concurrent users across multiple institutions, with peak traffic occurring around assignment deadlines and exam periods. Each course maintains its own content repository, and strict access control ensures students can only query materials from courses they are enrolled in. Responses must be fast enough to support interactive conversation (under 3 seconds), and the system must prevent AI hallucinations by grounding all answers in actual course content.
Key Requirements
Functional
- Question answering -- students submit natural-language questions and receive contextually relevant answers with citations pointing to specific pages or sections in course materials
- Content ingestion -- instructors upload diverse file types (PDFs, slides, documents, videos with transcripts) which are processed, indexed, and made searchable
- Access control -- course enrollment determines content visibility; students see only their enrolled courses, while instructors manage multiple courses
- Conversation continuity -- chat sessions persist across multiple questions, maintaining context so follow-up queries reference previous exchanges
Non-Functional
- Scalability -- support 100,000+ students across 1,000+ courses with 50,000 concurrent queries during peak exam periods
- Reliability -- maintain 99.9% uptime for query functionality; content ingestion can be eventually consistent with progress tracking
- Latency -- return query responses within 3 seconds at p95, including retrieval, ranking, and generation steps
- Consistency -- strong consistency for enrollment and permissions; eventual consistency for content indexing with clear processing states
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. RAG Pipeline Architecture
This is the heart of the system. Interviewers want to see how you chunk documents, create embeddings, store vectors, and orchestrate the retrieve-then-generate workflow without hallucinating answers.
Hints to consider:
- Chunk documents into overlapping segments (e.g., 500 tokens with 50-token overlap) to preserve context across boundaries
- Use hybrid search combining vector similarity (semantic) with keyword matching (BM25) to catch both conceptual and exact matches
- Implement a reranking step after initial retrieval to improve top-k precision before sending context to the language model
- Design prompts that force the model to cite source chunks and refuse to answer when materials do not contain relevant information
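The chunking and hybrid-search hints above can be sketched in a few lines of Python. `chunk_text` and `reciprocal_rank_fusion` are illustrative helper names, not any particular library's API; a real system would tokenize with the embedding model's tokenizer and fuse rankings produced by a vector index and a BM25 engine.

```python
def chunk_text(tokens, chunk_size=500, overlap=50):
    """Split a token sequence into overlapping chunks so content that
    straddles a boundary appears in both neighboring chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already covers the tail
    return chunks


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked result lists (e.g., vector search and BM25) by summing
    reciprocal-rank scores; documents ranked high in either list rise."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

The fused top-k candidates would then go through the reranking step before being packed into the generation prompt.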
2. Content Ingestion Pipeline
Raw course materials arrive in messy formats (scanned PDFs, complex slide layouts, embedded images). Interviewers expect a robust background processing system.
Hints to consider:
- Build an asynchronous job queue with separate workers for text extraction, OCR, chunking, and embedding generation
- Store ingestion jobs with status tracking (pending, processing, completed, failed) and exponential backoff for retries
- Version document chunks and embeddings so updates do not break in-flight queries
- Apply different processing strategies per content type: extract speaker notes from slides, use OCR for scanned materials
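A minimal sketch of the job-state and retry bookkeeping described above, assuming an `IngestionJob` record persisted by the queue (the class and field names here are hypothetical, not from any specific framework):

```python
import enum
from dataclasses import dataclass


class JobStatus(enum.Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class IngestionJob:
    document_id: str
    status: JobStatus = JobStatus.PENDING
    attempts: int = 0
    max_attempts: int = 5
    base_delay_s: float = 2.0

    def retry_delay(self) -> float:
        """Exponential backoff: 2s after the first failure, then 4s, 8s, ..."""
        return self.base_delay_s * (2 ** (self.attempts - 1))

    def record_failure(self) -> None:
        """Requeue the job with backoff, or mark it permanently failed."""
        self.attempts += 1
        self.status = (JobStatus.FAILED if self.attempts >= self.max_attempts
                       else JobStatus.PENDING)
```

Separate workers (text extraction, OCR, chunking, embedding) would each move a job through these states, so a stuck OCR step never blocks documents that only need plain text extraction.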
3. Multi-Tenancy and Access Control
A single mistake in tenant isolation exposes student data across courses. Strong designs enforce permissions at every layer.
Hints to consider:
- Embed course ID and role metadata directly in search index documents, then filter retrieval queries to enforce enrollment-based access
- Maintain an enrollment service that caches course memberships in Redis to avoid database round-trips on every query
- Use row-level security or database views to prevent accidental cross-course leaks in metadata queries
- Consider institutional hierarchy for multi-institution deployments with isolated indexes per tenant
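The key property in the hints above is that the enrollment filter is applied *before* ranking, so an unauthorized chunk can never reach the language model. A toy sketch, assuming the index is a plain list of chunk dicts (a production system would push this filter down into the vector database's metadata query instead):

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search_with_acl(query_embedding, index, enrolled_course_ids, top_k=5):
    """Restrict candidates to enrolled courses, then rank by similarity."""
    candidates = [doc for doc in index
                  if doc["course_id"] in enrolled_course_ids]
    candidates.sort(key=lambda d: cosine(query_embedding, d["embedding"]),
                    reverse=True)
    return candidates[:top_k]
```

Filtering first also means the `top_k` budget is never wasted on chunks the student cannot see.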
4. Query Performance and Cost Optimization
Language model API calls are expensive and slow. Interviewers want to see caching strategies and techniques to reduce latency and cost.
Hints to consider:
- Cache embeddings for common questions keyed by course_id and question_text to skip re-embedding
- Implement semantic caching that detects paraphrased questions and returns cached responses for similar queries
- Use tiered LLM strategies: fast/cheap models for simple factual lookups, larger models only for complex reasoning
- Apply rate limiting per user and per course to prevent abuse and control API costs during traffic spikes
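The semantic-caching idea above can be illustrated with an in-memory sketch: instead of an exact key match, a lookup succeeds when a cached question's embedding is similar enough to the new one. `SemanticCache` is a hypothetical name; a real deployment would shard entries by `course_id` and store them in Redis with a vector similarity search rather than a linear scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold  # minimum similarity to count as a hit
        self.entries = []           # list of (embedding, cached_response)

    def get(self, embedding):
        """Return the cached response for the most similar stored
        question, or None if nothing clears the threshold."""
        best_response, best_sim = None, 0.0
        for cached_emb, response in self.entries:
            sim = cosine(cached_emb, embedding)
            if sim > best_sim:
                best_response, best_sim = response, sim
        return best_response if best_sim >= self.threshold else None

    def put(self, embedding, response):
        self.entries.append((embedding, response))
```

A cache hit skips both retrieval and the LLM call, which is where most of the latency and cost savings come from during deadline spikes, when many students ask near-identical questions.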