Meta's "Design LeetCode Clone" is a system design interview question focusing on building a scalable online coding platform like LeetCode, emphasizing secure code execution, backend services, and infrastructure for competitions.[1][3]
Functional Requirements
Users can browse coding problems; write and submit code in languages such as Python or Java; have it run against hidden test cases; and view results with runtime/memory metrics, plus leaderboards for contests. The core workflow is code submission, isolated execution, automated judging, and feedback within seconds.[3][1]
Non-Functional Requirements
Support 100,000+ concurrent users during competitions, ensure 99.9% availability, return verdicts in under 5 seconds, and prioritize security via sandboxing so malicious code cannot harm the host system. Scalability through task queues and horizontal scaling is key.[1][3]
Key Components
- Frontend/API Layer: Handles problem listing (GET /problems), submission (POST /submit), and result polling (GET /submission/{id}); stores problem and submission data in DynamoDB or Cassandra with pagination.[3]
- Code Execution Engine: The core challenge; uses one Docker container or VM per submission for isolation, with resource limits (CPU/memory timeouts), network and filesystem restrictions, and language-specific runners (e.g., a Python interpreter with stdin/stdout piping).[1][3]
- Judging Pipeline: A task queue (e.g., SQS or Kafka) dispatches submissions to worker nodes; each worker spins up a sandbox, feeds in hidden test cases (input/output pairs stored encrypted), compares outputs, measures performance, and reports a verdict: AC (Accepted), WA (Wrong Answer), TLE (Time Limit Exceeded), or MLE (Memory Limit Exceeded).[1]
- Leaderboard: A Redis sorted set maintains real-time contest rankings by score and submission time, sharded for scale.[3]
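A minimal sketch of the API layer's submission lifecycle, using a hypothetical in-memory `SubmissionStore` in place of the DynamoDB/Cassandra table the design calls for. It models the same flow: POST /submit creates a PENDING row and returns an id, GET /submission/{id} polls it, and a judge worker later records the verdict.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class SubmissionStore:
    """Hypothetical in-memory stand-in for the submissions table."""
    _rows: dict = field(default_factory=dict)

    def submit(self, problem_id: int, code: str, language: str) -> str:
        """POST /submit: persist the submission and return an id to poll."""
        sub_id = uuid.uuid4().hex
        self._rows[sub_id] = {
            "problem_id": problem_id,
            "code": code,
            "language": language,
            "status": "PENDING",
        }
        return sub_id

    def get(self, sub_id: str) -> dict:
        """GET /submission/{id}: poll for the current status/verdict."""
        return self._rows[sub_id]

    def record_verdict(self, sub_id: str, verdict: str) -> None:
        """Called by a judge worker once execution finishes."""
        self._rows[sub_id]["status"] = verdict
```

Returning an id immediately and letting the client poll keeps the API layer decoupled from execution latency; the worker updates the row asynchronously via the task queue.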
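The execution engine's resource limits can be illustrated with a toy POSIX sandbox: a child process capped at 2s of CPU and 256MB of address space (matching the constraints below), with stdin piped in. This is a sketch only; real isolation comes from the containers or microVMs described above, and the function names here (`run_sandboxed`, `_apply_limits`) are assumptions, not part of the design.

```python
import resource
import signal
import subprocess
import sys

CPU_SECONDS = 2               # per-test CPU cap, per the stated constraints
MEMORY_BYTES = 256 * 2**20    # 256MB address-space cap

def _apply_limits() -> None:
    # Runs in the child just before exec: enforce CPU and memory limits.
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS))
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_BYTES, MEMORY_BYTES))

def run_sandboxed(code: str, stdin_data: str) -> tuple[str, str]:
    """Execute untrusted Python with stdin piped in; return (verdict, stdout)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin_data,
            capture_output=True,
            text=True,
            timeout=CPU_SECONDS + 1,   # wall-clock backstop for sleeping code
            preexec_fn=_apply_limits,  # POSIX-only; containers add real isolation
        )
    except subprocess.TimeoutExpired:
        return "TLE", ""
    if proc.returncode == -signal.SIGXCPU:
        return "TLE", ""       # killed by the kernel for exceeding CPU time
    if proc.returncode != 0:
        return "RE", proc.stderr
    return "OK", proc.stdout
```

In production each run would also drop network access, mount a read-only filesystem, and run as an unprivileged user inside the container or Firecracker microVM.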
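The leaderboard's ranking rule (higher score first, earlier submission time breaking ties) can be shown with a pure-Python stand-in for the Redis sorted set; in Redis itself you would encode score and time into one composite ZADD score and read rankings back with ZREVRANGE. The `rank` helper below is illustrative, not a real API.

```python
def rank(entries: dict[str, tuple[int, int]], top_n: int = 10) -> list[str]:
    """Order users by score descending, then finish time ascending.

    entries maps user -> (score, finish_time_seconds); mirrors the
    composite ordering a Redis sorted set would maintain.
    """
    ordered = sorted(entries.items(), key=lambda kv: (-kv[1][0], kv[1][1]))
    return [user for user, _ in ordered[:top_n]]
```

Sharding by contest id keeps each sorted set small enough for O(log N) updates even at competition scale.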
Examples and Constraints
Because this is a design prompt rather than an algorithmic one, there are no standardized input/output examples, but typical flows include:
- Submission: `{"problem_id": 1, "code": "def add(a,b): return a+b", "language": "python"}` → the judge runs it against tests like input "1 2" expecting "3".
- Constraints: 100K QPS peaks, 1-10 hidden tests/problem, code < 50KB, exec < 2s CPU/256MB RAM per test, support 10+ languages.[3][1]
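The submission example above can be walked through with a toy judge that checks the hidden (input, expected-output) pairs and returns AC or WA. For brevity it `exec`s the submission in-process, which a real system would never do; execution belongs in the sandbox. The assumption that the submission defines `add(a, b)` comes from the example payload.

```python
def judge(code: str, tests: list[tuple[str, str]]) -> str:
    """Run a Python submission against hidden (input, expected) pairs.

    Assumes the submission defines add(a, b), per the example payload.
    A real judge executes the code in a sandbox, never in-process.
    """
    namespace: dict = {}
    exec(code, namespace)
    for stdin_line, expected in tests:
        a, b = map(int, stdin_line.split())
        if str(namespace["add"](a, b)) != expected:
            return "WA"   # first mismatch ends the run
    return "AC"
```

A full pipeline would also time each test (TLE), track peak memory (MLE), and catch crashes (RE) before emitting the verdict.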
Deep Dives
Sandboxing dominates the discussion: containers (Docker) favor startup speed, while microVMs (Firecracker) favor stronger isolation; ML is not used directly but could flag anomalous submissions. Infrastructure scales via auto-scaling worker pools and task queues.[1][3]