Meta's "Design LeetCode Clone" is a system design interview question focusing on building a scalable online coding platform like LeetCode, emphasizing secure code execution, backend services, and infrastructure for competitions.[1][3]
Functional Requirements
Users can browse coding problems; write and submit code in languages such as Python or Java; have it run against hidden test cases; and view results with runtime/memory metrics, plus leaderboards for contests. The core workflow is code submission, isolated execution, automated judging, and feedback within seconds.[3][1]
Non-Functional Requirements
Support 100,000+ concurrent users during competitions, ensure 99.9% availability, return verdicts in under 5 seconds, and prioritize security via sandboxing so malicious code cannot harm the host system. Scalability through task queues and horizontal scaling is key.[1][3]
Key Components
- Frontend/API Layer: Handles problem listing (GET /problems), submission (POST /submit), and result polling (GET /submission/{id}); stores problem and submission data in DynamoDB or Cassandra with pagination.[3]
- Code Execution Engine: The core challenge; uses one Docker container or VM per submission for isolation, with resource limits (CPU/memory timeouts), network and filesystem restrictions, and language-specific runners (e.g., a Python interpreter with stdin/stdout piping).[1][3]
- Judging Pipeline: A task queue (e.g., SQS or Kafka) dispatches submissions to worker nodes; each worker spins up a sandbox, feeds in hidden test cases (input/output pairs stored encrypted), compares outputs, measures performance, and reports a verdict: AC (Accepted), WA (Wrong Answer), TLE (Time Limit Exceeded), or MLE (Memory Limit Exceeded).[1]
- Leaderboard: A Redis sorted set maintains real-time contest rankings by score and submission time, sharded for scale.[3]
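A minimal sketch of the API layer's submission lifecycle, using a hypothetical in-memory `SubmissionStore` in place of the DynamoDB/Cassandra table the design calls for. It models the same flow: POST /submit creates a PENDING row and returns an id, GET /submission/{id} polls it, and a judge worker later records the verdict.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class SubmissionStore:
    """Hypothetical in-memory stand-in for the submissions table."""
    _rows: dict = field(default_factory=dict)

    def submit(self, problem_id: int, code: str, language: str) -> str:
        """POST /submit: persist the submission and return an id to poll."""
        sub_id = uuid.uuid4().hex
        self._rows[sub_id] = {
            "problem_id": problem_id,
            "code": code,
            "language": language,
            "status": "PENDING",
        }
        return sub_id

    def get(self, sub_id: str) -> dict:
        """GET /submission/{id}: poll for the current status/verdict."""
        return self._rows[sub_id]

    def record_verdict(self, sub_id: str, verdict: str) -> None:
        """Called by a judge worker once execution finishes."""
        self._rows[sub_id]["status"] = verdict
```

Returning an id immediately and letting the client poll keeps the API layer decoupled from execution latency; the worker updates the row asynchronously via the task queue.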
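The execution engine's resource limits can be illustrated with a toy POSIX sandbox: a child process capped at 2s of CPU and 256MB of address space (matching the constraints below), with stdin piped in. This is a sketch only; real isolation comes from the containers or microVMs described above, and the function names here (`run_sandboxed`, `_apply_limits`) are assumptions, not part of the design.

```python
import resource
import signal
import subprocess
import sys

CPU_SECONDS = 2               # per-test CPU cap, per the stated constraints
MEMORY_BYTES = 256 * 2**20    # 256MB address-space cap

def _apply_limits() -> None:
    # Runs in the child just before exec: enforce CPU and memory limits.
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS))
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_BYTES, MEMORY_BYTES))

def run_sandboxed(code: str, stdin_data: str) -> tuple[str, str]:
    """Execute untrusted Python with stdin piped in; return (verdict, stdout)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin_data,
            capture_output=True,
            text=True,
            timeout=CPU_SECONDS + 1,   # wall-clock backstop for sleeping code
            preexec_fn=_apply_limits,  # POSIX-only; containers add real isolation
        )
    except subprocess.TimeoutExpired:
        return "TLE", ""
    if proc.returncode == -signal.SIGXCPU:
        return "TLE", ""       # killed by the kernel for exceeding CPU time
    if proc.returncode != 0:
        return "RE", proc.stderr
    return "OK", proc.stdout
```

In production each run would also drop network access, mount a read-only filesystem, and run as an unprivileged user inside the container or Firecracker microVM.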
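The leaderboard's ranking rule (higher score first, earlier submission time breaking ties) can be shown with a pure-Python stand-in for the Redis sorted set; in Redis itself you would encode score and time into one composite ZADD score and read rankings back with ZREVRANGE. The `rank` helper below is illustrative, not a real API.

```python
def rank(entries: dict[str, tuple[int, int]], top_n: int = 10) -> list[str]:
    """Order users by score descending, then finish time ascending.

    entries maps user -> (score, finish_time_seconds); mirrors the
    composite ordering a Redis sorted set would maintain.
    """
    ordered = sorted(entries.items(), key=lambda kv: (-kv[1][0], kv[1][1]))
    return [user for user, _ in ordered[:top_n]]
```

Sharding by contest id keeps each sorted set small enough for O(log N) updates even at competition scale.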
Examples and Constraints
Because this is a design prompt rather than an algorithmic one, there are no standardized input/output examples, but typical flows include:
- Submission: `{"problem_id": 1, "code": "def add(a,b): return a+b", "language": "python"}` → the judge runs it against tests like input "1 2" expecting "3".
- Constraints: 100K QPS peaks, 1-10 hidden tests/problem, code < 50KB, exec < 2s CPU/256MB RAM per test, support 10+ languages.[3][1]
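The submission example above can be walked through with a toy judge that checks the hidden (input, expected-output) pairs and returns AC or WA. For brevity it `exec`s the submission in-process, which a real system would never do; execution belongs in the sandbox. The assumption that the submission defines `add(a, b)` comes from the example payload.

```python
def judge(code: str, tests: list[tuple[str, str]]) -> str:
    """Run a Python submission against hidden (input, expected) pairs.

    Assumes the submission defines add(a, b), per the example payload.
    A real judge executes the code in a sandbox, never in-process.
    """
    namespace: dict = {}
    exec(code, namespace)
    for stdin_line, expected in tests:
        a, b = map(int, stdin_line.split())
        if str(namespace["add"](a, b)) != expected:
            return "WA"   # first mismatch ends the run
    return "AC"
```

A full pipeline would also time each test (TLE), track peak memory (MLE), and catch crashes (RE) before emitting the verdict.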
Deep Dives
Sandboxing dominates the discussion: containers (Docker) favor startup speed, while microVMs (Firecracker) favor stronger isolation; ML is not used directly but could flag anomalous submissions. Infrastructure scales via auto-scaling worker pools and task queues.[1][3]