System Design - Design Sora Video Generation Scheduling
Category: System Design — first seen: 2026-03-17
OpenAI interviews often focus on practical, high-scale system design problems directly related to their core products, such as managing limited GPU resources or high-concurrency model gateways. A "Sora Video Generation Scheduling" problem would likely emphasize resource constraints, multi-tenant fairness, and the unique computational demands of video diffusion models.
Problem Statement: Sora Video Generation Scheduler
Goal: Design a backend orchestration system to manage and schedule text-to-video generation requests for Sora. The system must efficiently allocate limited GPU compute clusters to thousands of concurrent users while ensuring low latency and fairness across different subscription tiers.
1. Functional Requirements
Request Ingestion: Accept user prompts with specific parameters (resolution, duration, aspect ratio).
Job Status Tracking: Provide real-time updates on a job's position in the queue and its generation progress.
Multi-Tier Support: Implement priority levels for different user plans (e.g., Plus vs. Pro), where Pro users get faster access or higher concurrency.
Resource Management: Track available GPU "credits" and ensure jobs only start if the user has sufficient balance.
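The functional requirements above suggest a job model that carries the prompt parameters, the user's tier, and a credit estimate the scheduler can check before admission. A minimal sketch (the field names, tier values, and cost formula are illustrative assumptions, not Sora's actual pricing):

```python
from dataclasses import dataclass, field
from enum import IntEnum
import time
import uuid

class Tier(IntEnum):
    FREE = 0
    PLUS = 1
    PRO = 2

@dataclass
class VideoJob:
    prompt: str
    resolution: str      # e.g. "1080p"
    duration_s: int      # requested clip length in seconds
    aspect_ratio: str    # e.g. "16:9"
    tier: Tier
    job_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    submitted_at: float = field(default_factory=time.time)

    def estimated_cost(self) -> int:
        """Rough GPU-credit estimate: scales with pixel count and duration.

        The weights below are placeholders; a real system would calibrate
        them against measured GPU-seconds per resolution/length bucket.
        """
        pixel_weight = {"480p": 1, "720p": 2, "1080p": 4}.get(self.resolution, 4)
        return pixel_weight * self.duration_s
```

The scheduler can call `estimated_cost()` at ingestion time to reject jobs whose estimate exceeds the user's credit balance, before any GPU is reserved.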
2. Non-Functional Requirements
High Availability: The scheduler must be resilient to worker node (GPU) crashes and handle job retries automatically.
Scalability: Support a sudden spike in requests (e.g., a viral trend) without cascading failures.
Fairness: Prevent "noisy neighbors" from monopolizing the GPU cluster, ensuring all users in a tier get a fair share of compute time.
Cost Efficiency: Optimize job packing and GPU sharing to reduce wasted idle time on expensive H100/A100 clusters.
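One common way to enforce the fairness requirement is a per-user token bucket at admission: a burst of submissions drains the bucket, so no single user can flood the queue, while steady refill preserves their fair share over time. A minimal sketch (rates and capacities are assumptions to be tuned per tier):

```python
import time

class TokenBucket:
    """Per-user admission limiter.

    Refills `rate` tokens per second up to `capacity`. A job is admitted
    only if its cost fits in the remaining tokens, which caps how much of
    the cluster any single user can claim in a burst.
    """
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_admit(self, cost: float) -> bool:
        now = time.monotonic()
        # Lazy refill: credit tokens for the time elapsed since last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Higher tiers would get a larger `rate` and `capacity`; rejected jobs can be parked in a waiting queue rather than dropped, so the limiter shapes traffic without losing work.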
3. Core Design Challenges
Long-Running Tasks: Unlike text tokens, video generation can take several minutes. How do you manage long-lived connections or notify users of completion?
Resource Heterogeneity: Some videos (1080p, 60s) require significantly more VRAM and compute than others (480p, 10s). How does your scheduler balance these diverse loads?
Preemption & Priority: If a high-priority job arrives, can you safely pause or "preempt" a low-priority job that has already consumed 30 seconds of GPU time?
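The preemption challenge above is usually answered with cooperative checkpointing: diffusion sampling is iterative, so between denoising steps the worker can check a preempt flag and serialize its progress instead of discarding it. A sketch of the worker loop (the state layout and `_denoise_step` placeholder are illustrative assumptions, not Sora's actual inference internals):

```python
import pickle

class CheckpointingWorker:
    """Cooperative preemption between denoising steps.

    If the scheduler sets `preempt_requested`, the worker checkpoints the
    current step and latents to a blob store so the job can resume later
    on any node, rather than losing the GPU time already consumed.
    """
    def __init__(self, store):
        self.store = store          # any dict-like blob store
        self.preempt_requested = False

    def run(self, job_id: str, total_steps: int, state=None):
        state = state or {"step": 0, "latents": None}
        for step in range(state["step"], total_steps):
            state["latents"] = self._denoise_step(state["latents"], step)
            state["step"] = step + 1
            if self.preempt_requested:
                self.store[job_id] = pickle.dumps(state)  # checkpoint
                return "preempted"
        return "done"

    def _denoise_step(self, latents, step):
        # Placeholder for the real model call.
        return (latents or 0) + 1
```

The same checkpoint path doubles as crash recovery: if the worker dies, the scheduler reschedules the job from the last saved step instead of from scratch, which also bounds how much the user can be overcharged.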
Sample Interview "Deep Dive" Questions
"How would you design a GPU Credit Calculator class that handles credits with expiration times while jobs are running?"
"What data structure would you use to maintain a priority queue that also ensures fairness across 10,000 active users?"
"If a GPU worker fails midway through a 60-second video generation, how do you ensure the user isn't overcharged while also minimizing wasted compute?"
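For the fairness-across-users question, one standard answer is a heap keyed by per-user virtual finish time (as in weighted fair queueing): each job a user enqueues advances their virtual clock by its cost, so a user with many queued jobs sinks behind light users instead of starving them. A minimal sketch (class name, weights, and the cost unit are illustrative assumptions):

```python
import heapq

class FairPriorityQueue:
    """Heap ordered by per-user virtual finish time.

    A user's virtual clock advances by cost/weight per enqueued job, so
    heavy users' later jobs sort behind light users' first jobs. A larger
    tier weight (e.g. Pro > Plus) slows a user's clock, granting them
    proportionally more throughput.
    """
    def __init__(self):
        self.heap = []
        self.vtime = {}   # user_id -> accumulated virtual time
        self.seq = 0      # monotonic tie-breaker for equal finish times

    def push(self, user_id: str, job, cost: float, weight: float = 1.0):
        start = self.vtime.get(user_id, 0.0)
        finish = start + cost / weight
        self.vtime[user_id] = finish
        heapq.heappush(self.heap, (finish, self.seq, user_id, job))
        self.seq += 1

    def pop(self):
        finish, _, user_id, job = heapq.heappop(self.heap)
        return user_id, job
```

With 10,000 active users this stays O(log n) per operation, and because `vtime` is per user rather than per job, a user submitting 100 jobs only delays their own later jobs, not everyone else's.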