Design an AI tutoring platform with RAG-based Q&A for assignments and coursework
System Design
Problem Statement
Design an AI-powered tutoring platform that uses Retrieval-Augmented Generation (RAG) to help students get answers to questions about their assignments and coursework. The system should retrieve relevant educational content and generate contextual responses to student queries with citations pointing to specific course materials.
The platform needs to support thousands of concurrent users across multiple institutions, with peak traffic occurring around assignment deadlines and exam periods. Each course maintains its own content repository, and strict access control ensures students can only query materials from courses they are enrolled in. Responses must be fast enough to support interactive conversation (under 3 seconds), and the system must prevent AI hallucinations by grounding all answers in actual course content.
Key Requirements
Functional
- Question answering -- students submit natural-language questions and receive contextually relevant answers with citations pointing to specific pages or sections in course materials
- Content ingestion -- instructors upload diverse file types (PDFs, slides, documents, videos with transcripts) which are processed, indexed, and made searchable
- Access control -- course enrollment determines content visibility; students see only their enrolled courses, while instructors manage multiple courses
- Conversation continuity -- chat sessions persist across multiple questions, maintaining context so follow-up queries reference previous exchanges
Non-Functional
- Scalability -- support 100,000+ students across 1,000+ courses with 50,000 concurrent queries during peak exam periods
- Reliability -- maintain 99.9% uptime for query functionality; content ingestion can be eventually consistent with progress tracking
- Latency -- return query responses within 3 seconds at p95, including retrieval, ranking, and generation steps
- Consistency -- strong consistency for enrollment and permissions; eventual consistency for content indexing with clear processing states
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. RAG Pipeline Architecture
This is the heart of the system. Interviewers want to see how you chunk documents, create embeddings, store vectors, and orchestrate the retrieve-then-generate workflow without hallucinating answers.
Hints to consider:
- Chunk documents into overlapping segments (e.g., 500 tokens with 50-token overlap) to preserve context across boundaries
- Use hybrid search combining vector similarity (semantic) with keyword matching (BM25) to catch both conceptual and exact matches
- Implement a reranking step after initial retrieval to improve top-k precision before sending context to the language model
- Design prompts that force the model to cite source chunks and refuse to answer when materials do not contain relevant information
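The chunking and hybrid-search hints above can be sketched in a few lines of Python. `chunk_text` and `reciprocal_rank_fusion` are illustrative helper names, not any particular library's API; a real system would tokenize with the embedding model's tokenizer and fuse rankings produced by a vector index and a BM25 engine.

```python
def chunk_text(tokens, chunk_size=500, overlap=50):
    """Split a token sequence into overlapping chunks so content that
    straddles a boundary appears in both neighboring chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already covers the tail
    return chunks


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked result lists (e.g., vector search and BM25) by summing
    reciprocal-rank scores; documents ranked high in either list rise."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

The fused top-k candidates would then go through the reranking step before being packed into the generation prompt.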
2. Content Ingestion Pipeline
Raw course materials arrive in messy formats (scanned PDFs, complex slide layouts, embedded images). Interviewers expect a robust background processing system.
Hints to consider:
- Build an asynchronous job queue with separate workers for text extraction, OCR, chunking, and embedding generation
- Store ingestion jobs with status tracking (pending, processing, completed, failed) and exponential backoff for retries
- Version document chunks and embeddings so updates do not break in-flight queries
- Apply different processing strategies per content type: extract speaker notes from slides, use OCR for scanned materials
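A minimal sketch of the job-state and retry bookkeeping described above, assuming an `IngestionJob` record persisted by the queue (the class and field names here are hypothetical, not from any specific framework):

```python
import enum
from dataclasses import dataclass


class JobStatus(enum.Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class IngestionJob:
    document_id: str
    status: JobStatus = JobStatus.PENDING
    attempts: int = 0
    max_attempts: int = 5
    base_delay_s: float = 2.0

    def retry_delay(self) -> float:
        """Exponential backoff: 2s after the first failure, then 4s, 8s, ..."""
        return self.base_delay_s * (2 ** (self.attempts - 1))

    def record_failure(self) -> None:
        """Requeue the job with backoff, or mark it permanently failed."""
        self.attempts += 1
        self.status = (JobStatus.FAILED if self.attempts >= self.max_attempts
                       else JobStatus.PENDING)
```

Separate workers (text extraction, OCR, chunking, embedding) would each move a job through these states, so a stuck OCR step never blocks documents that only need plain text extraction.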
3. Multi-Tenancy and Access Control
A single mistake in tenant isolation exposes student data across courses. Strong designs enforce permissions at every layer.
Hints to consider:
- Embed course ID and role metadata directly in search index documents, then filter retrieval queries to enforce enrollment-based access
- Maintain an enrollment service that caches course memberships in Redis to avoid database round-trips on every query
- Use row-level security or database views to prevent accidental cross-course leaks in metadata queries
- Consider institutional hierarchy for multi-institution deployments with isolated indexes per tenant
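The key property in the hints above is that the enrollment filter is applied *before* ranking, so an unauthorized chunk can never reach the language model. A toy sketch, assuming the index is a plain list of chunk dicts (a production system would push this filter down into the vector database's metadata query instead):

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search_with_acl(query_embedding, index, enrolled_course_ids, top_k=5):
    """Restrict candidates to enrolled courses, then rank by similarity."""
    candidates = [doc for doc in index
                  if doc["course_id"] in enrolled_course_ids]
    candidates.sort(key=lambda d: cosine(query_embedding, d["embedding"]),
                    reverse=True)
    return candidates[:top_k]
```

Filtering first also means the `top_k` budget is never wasted on chunks the student cannot see.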
4. Query Performance and Cost Optimization
Language model API calls are expensive and slow. Interviewers want to see caching strategies and techniques to reduce latency and cost.
Hints to consider:
- Cache embeddings for common questions keyed by course_id and question_text to skip re-embedding
- Implement semantic caching that detects paraphrased questions and returns cached responses for similar queries
- Use tiered LLM strategies: fast/cheap models for simple factual lookups, larger models only for complex reasoning
- Apply rate limiting per user and per course to prevent abuse and control API costs during traffic spikes
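The semantic-caching idea above can be illustrated with an in-memory sketch: instead of an exact key match, a lookup succeeds when a cached question's embedding is similar enough to the new one. `SemanticCache` is a hypothetical name; a real deployment would shard entries by `course_id` and store them in Redis with a vector similarity search rather than a linear scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold  # minimum similarity to count as a hit
        self.entries = []           # list of (embedding, cached_response)

    def get(self, embedding):
        """Return the cached response for the most similar stored
        question, or None if nothing clears the threshold."""
        best_response, best_sim = None, 0.0
        for cached_emb, response in self.entries:
            sim = cosine(cached_emb, embedding)
            if sim > best_sim:
                best_response, best_sim = response, sim
        return best_response if best_sim >= self.threshold else None

    def put(self, embedding, response):
        self.entries.append((embedding, response))
```

A cache hit skips both retrieval and the LLM call, which is where most of the latency and cost savings come from during deadline spikes, when many students ask near-identical questions.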