System Design - Design Sora Video Generation Scheduling
Category: System Design — first seen: 2026-03-17
OpenAI interviews often focus on practical, high-scale system design problems directly related to their core products, such as managing limited GPU resources or high-concurrency model gateways. A "Sora Video Generation Scheduling" problem would likely emphasize resource constraints, multi-tenant fairness, and the unique computational demands of video diffusion models.
Problem Statement: Sora Video Generation Scheduler
Goal: Design a backend orchestration system to manage and schedule text-to-video generation requests for Sora. The system must efficiently allocate limited GPU compute clusters to thousands of concurrent users while ensuring low latency and fairness across different subscription tiers.
1. Functional Requirements
Request Ingestion: Accept user prompts with specific parameters (resolution, duration, aspect ratio).
Job Status Tracking: Provide real-time updates on a job's position in the queue and its generation progress.
Multi-Tier Support: Implement priority levels for different user plans (e.g., Plus vs. Pro), where Pro users get faster access or higher concurrency.
Resource Management: Track available GPU "credits" and ensure jobs only start if the user has sufficient balance.
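The functional requirements above suggest a job model that carries the prompt parameters, the user's tier, and a credit estimate the scheduler can check before admission. A minimal sketch (the field names, tier values, and cost formula are illustrative assumptions, not Sora's actual pricing):

```python
from dataclasses import dataclass, field
from enum import IntEnum
import time
import uuid

class Tier(IntEnum):
    FREE = 0
    PLUS = 1
    PRO = 2

@dataclass
class VideoJob:
    prompt: str
    resolution: str      # e.g. "1080p"
    duration_s: int      # requested clip length in seconds
    aspect_ratio: str    # e.g. "16:9"
    tier: Tier
    job_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    submitted_at: float = field(default_factory=time.time)

    def estimated_cost(self) -> int:
        """Rough GPU-credit estimate: scales with pixel count and duration.

        The weights below are placeholders; a real system would calibrate
        them against measured GPU-seconds per resolution/length bucket.
        """
        pixel_weight = {"480p": 1, "720p": 2, "1080p": 4}.get(self.resolution, 4)
        return pixel_weight * self.duration_s
```

The scheduler can call `estimated_cost()` at ingestion time to reject jobs whose estimate exceeds the user's credit balance, before any GPU is reserved.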
2. Non-Functional Requirements
High Availability: The scheduler must be resilient to worker node (GPU) crashes and handle job retries automatically.
Scalability: Support a sudden spike in requests (e.g., a viral trend) without cascading failures.
Fairness: Prevent "noisy neighbors" from monopolizing the GPU cluster, ensuring all users in a tier get a fair share of compute time.
Cost Efficiency: Optimize job packing and GPU sharing to reduce wasted idle time on expensive H100/A100 clusters.
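One common way to enforce the fairness requirement is a per-user token bucket at admission: a burst of submissions drains the bucket, so no single user can flood the queue, while steady refill preserves their fair share over time. A minimal sketch (rates and capacities are assumptions to be tuned per tier):

```python
import time

class TokenBucket:
    """Per-user admission limiter.

    Refills `rate` tokens per second up to `capacity`. A job is admitted
    only if its cost fits in the remaining tokens, which caps how much of
    the cluster any single user can claim in a burst.
    """
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_admit(self, cost: float) -> bool:
        now = time.monotonic()
        # Lazy refill: credit tokens for the time elapsed since last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Higher tiers would get a larger `rate` and `capacity`; rejected jobs can be parked in a waiting queue rather than dropped, so the limiter shapes traffic without losing work.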
3. Core Design Challenges
Long-Running Tasks: Unlike text tokens, video generation can take several minutes. How do you manage long-lived connections or notify users of completion?
Resource Heterogeneity: Some videos (1080p, 60s) require significantly more VRAM and compute than others (480p, 10s). How does your scheduler balance these diverse loads?
Preemption & Priority: If a high-priority job arrives, can you safely pause or "preempt" a low-priority job that has already consumed 30 seconds of GPU time?
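The preemption challenge above is usually answered with cooperative checkpointing: diffusion sampling is iterative, so between denoising steps the worker can check a preempt flag and serialize its progress instead of discarding it. A sketch of the worker loop (the state layout and `_denoise_step` placeholder are illustrative assumptions, not Sora's actual inference internals):

```python
import pickle

class CheckpointingWorker:
    """Cooperative preemption between denoising steps.

    If the scheduler sets `preempt_requested`, the worker checkpoints the
    current step and latents to a blob store so the job can resume later
    on any node, rather than losing the GPU time already consumed.
    """
    def __init__(self, store):
        self.store = store          # any dict-like blob store
        self.preempt_requested = False

    def run(self, job_id: str, total_steps: int, state=None):
        state = state or {"step": 0, "latents": None}
        for step in range(state["step"], total_steps):
            state["latents"] = self._denoise_step(state["latents"], step)
            state["step"] = step + 1
            if self.preempt_requested:
                self.store[job_id] = pickle.dumps(state)  # checkpoint
                return "preempted"
        return "done"

    def _denoise_step(self, latents, step):
        # Placeholder for the real model call.
        return (latents or 0) + 1
```

The same checkpoint path doubles as crash recovery: if the worker dies, the scheduler reschedules the job from the last saved step instead of from scratch, which also bounds how much the user can be overcharged.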
Sample Interview "Deep Dive" Questions
"How would you design a GPU Credit Calculator class that handles credits with expiration times while jobs are running?"
"What data structure would you use to maintain a priority queue that also ensures fairness across 10,000 active users?"
"If a GPU worker fails midway through a 60-second video generation, how do you ensure the user isn't overcharged while also minimizing wasted compute?"
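For the fairness-across-users question, one standard answer is a heap keyed by per-user virtual finish time (as in weighted fair queueing): each job a user enqueues advances their virtual clock by its cost, so a user with many queued jobs sinks behind light users instead of starving them. A minimal sketch (class name, weights, and the cost unit are illustrative assumptions):

```python
import heapq

class FairPriorityQueue:
    """Heap ordered by per-user virtual finish time.

    A user's virtual clock advances by cost/weight per enqueued job, so
    heavy users' later jobs sort behind light users' first jobs. A larger
    tier weight (e.g. Pro > Plus) slows a user's clock, granting them
    proportionally more throughput.
    """
    def __init__(self):
        self.heap = []
        self.vtime = {}   # user_id -> accumulated virtual time
        self.seq = 0      # monotonic tie-breaker for equal finish times

    def push(self, user_id: str, job, cost: float, weight: float = 1.0):
        start = self.vtime.get(user_id, 0.0)
        finish = start + cost / weight
        self.vtime[user_id] = finish
        heapq.heappush(self.heap, (finish, self.seq, user_id, job))
        self.seq += 1

    def pop(self):
        finish, _, user_id, job = heapq.heappop(self.heap)
        return user_id, job
```

With 10,000 active users this stays O(log n) per operation, and because `vtime` is per user rather than per job, a user submitting 100 jobs only delays their own later jobs, not everyone else's.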