Design a Video Moderation System
Problem Statement
Design a video moderation system for a global short-form video platform like TikTok that ingests millions of daily uploads, evaluates each video for policy violations using machine learning models, applies configurable business rules to determine enforcement actions, and maintains a complete audit trail for compliance and appeals. The system must deliver moderation decisions within seconds of upload to prevent harmful content from reaching viewers.
The moderation pipeline spans several stages: video ingestion and media storage, feature extraction (keyframes, audio transcripts, metadata), ML inference across multiple specialized models (nudity detection, violence classification, hate symbol recognition, copyright fingerprinting), rule evaluation against the latest policy configuration, automated enforcement (remove, restrict, age-gate, escalate to human review), and audit logging. Real TikTok interview reports describe a scenario where business rules combine model scores -- for example, if Model A score exceeds 0.9 and Model B score exceeds 0.95, the video is flagged as a violation. Operations teams need the ability to update these rules without redeploying services.
Interviewers use this question to assess your ability to decompose a multi-stage asynchronous pipeline, manage ML model versioning and inference at scale, design a flexible rule engine, ensure idempotent enforcement actions under at-least-once delivery, and maintain explainability and auditability across every decision.
Key Requirements
Functional
- Video ingestion and processing -- accept video uploads, store raw media in blob storage, extract keyframes and audio features, and route content through the moderation pipeline
- ML-based risk scoring -- run each video through multiple specialized ML models that produce confidence scores for different violation categories, tagged with model version identifiers
- Configurable rule evaluation -- evaluate ML scores and video metadata against versioned business rules maintained by operations teams, producing an enforcement decision without requiring code deployments
- Automated enforcement -- execute the determined action (remove, geo-block, age-restrict, add warning label, escalate to human queue) automatically and notify the content creator
- Audit trail -- record every moderation decision with the full context: model versions, confidence scores, rules fired, policy version, action taken, and timestamp, accessible for appeals and regulatory review
Non-Functional
- Scalability -- handle 10 million or more video uploads per day with 5x peak traffic bursts during viral events, scaling each pipeline stage independently
- Latency -- deliver moderation decisions for 95 percent of videos within 10 seconds of upload to minimize the window for harmful content exposure
- Reliability -- guarantee that no uploaded video bypasses moderation; tolerate component failures without losing videos or producing duplicate enforcement actions
- Consistency -- ensure idempotent enforcement so that at-least-once message delivery does not cause double-removal or conflicting actions on the same video
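The scale targets above translate into concrete throughput numbers worth stating up front. A quick back-of-envelope calculation (assuming, for illustration, four ML models per video):

```python
# Back-of-envelope throughput from the stated requirements.
UPLOADS_PER_DAY = 10_000_000
PEAK_MULTIPLIER = 5
MODELS_PER_VIDEO = 4          # assumption for illustration

avg_rps = UPLOADS_PER_DAY / 86_400        # ~116 videos/second sustained
peak_rps = avg_rps * PEAK_MULTIPLIER      # ~579 videos/second at peak
peak_inferences = peak_rps * MODELS_PER_VIDEO  # ~2,315 model calls/second

print(f"avg {avg_rps:.0f}/s, peak {peak_rps:.0f}/s, "
      f"peak inference {peak_inferences:.0f}/s")
```

Numbers like these justify the design choices that follow: at a few hundred videos per second with a 10-second p95 budget, synchronous request-response chains are fragile, and each stage needs to buffer and scale independently.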
What Interviewers Focus On
Based on real interview experiences at TikTok, these are the areas interviewers probe most deeply:
1. Multi-Stage Pipeline Decomposition
Interviewers expect you to break moderation into clearly defined stages with independent scaling, failure isolation, and observable contracts between them. They want to see event-driven decoupling rather than synchronous request-response chains.
Hints to consider:
- Use Kafka topics between each stage (ingest, extract, infer, evaluate, enforce) so each consumer group scales and retries independently
- Pass video references (S3 URIs and metadata) through the pipeline rather than copying large binary payloads between services
- Design for backpressure: if ML inference is slow, upstream queues buffer without dropping work or overwhelming the models
- Consider orchestration (a saga coordinator) versus choreography (event-driven) and discuss tradeoffs for retry logic and compensation (like reverting an action on appeal)
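The first two hints can be sketched concretely. The following is a minimal illustration of the inter-stage contract, using an in-memory queue as a stand-in for Kafka topics; the event fields and stage names are assumptions, not a fixed schema:

```python
import json
import queue
from dataclasses import dataclass, asdict

# Hypothetical inter-stage message: references only, never raw video bytes.
@dataclass
class ModerationEvent:
    request_id: str   # stable idempotency key assigned at ingestion
    video_uri: str    # S3 URI of the stored media
    stage: str        # which stage produced this event
    payload: dict     # stage output (e.g. keyframe URIs, model scores)

# Stand-ins for Kafka topics; in a real deployment each stage's consumer
# group scales, retries, and absorbs backpressure independently.
topics = {name: queue.Queue() for name in
          ("extract", "infer", "evaluate", "enforce")}

def publish(topic: str, event: ModerationEvent) -> None:
    topics[topic].put(json.dumps(asdict(event)))

def ingest(request_id: str, video_uri: str) -> None:
    # Ingestion persists the raw media, then emits only a reference downstream.
    publish("extract", ModerationEvent(request_id, video_uri, "ingest", {}))

ingest("req-123", "s3://videos/raw/abc.mp4")
event = json.loads(topics["extract"].get())
print(event["request_id"], event["video_uri"])
```

Because each event carries a URI rather than the media itself, a slow inference stage simply leaves references queued; nothing large accumulates in the broker, and no work is dropped.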
2. ML Model Integration and Versioning
The system invokes multiple models per video, and models are updated frequently as threats evolve. Interviewers probe how you manage model deployment, version tracking, and inference efficiency.
Hints to consider:
- Separate the model serving infrastructure (GPU clusters behind a gRPC inference API) from the moderation pipeline logic so models can be deployed and rolled back independently
- Tag every inference result with the model ID and version so decisions are reproducible months later during audits or appeals
- Batch inference requests (multiple keyframes from the same video or frames from multiple videos) to maximize GPU utilization
- Cache inference results keyed by content hash to skip redundant processing when the same video is re-uploaded or a near-duplicate is detected
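Version tagging and content-hash caching combine naturally in one wrapper around the model call. A sketch, where the model name, version string, and `run_model` placeholder are all hypothetical:

```python
import hashlib
from dataclasses import dataclass

# Every score carries the model identity so the decision can be
# reproduced months later during an audit or appeal.
@dataclass(frozen=True)
class InferenceResult:
    model_id: str
    model_version: str
    score: float

# Cache keyed by (content hash, model, version): re-uploads of identical
# media skip inference; a new model version naturally misses the cache.
_cache: dict[tuple[str, str, str], InferenceResult] = {}

def run_model(video_bytes: bytes) -> float:
    # Placeholder for the real gRPC call to the GPU serving cluster.
    return 0.97

def score_video(video_bytes: bytes, model_id: str,
                model_version: str) -> InferenceResult:
    key = (hashlib.sha256(video_bytes).hexdigest(), model_id, model_version)
    if key in _cache:
        return _cache[key]  # cache hit: no GPU call
    result = InferenceResult(model_id, model_version, run_model(video_bytes))
    _cache[key] = result
    return result

r1 = score_video(b"fake-video", "nudity_detector", "v12")
r2 = score_video(b"fake-video", "nudity_detector", "v12")  # served from cache
print(r1 is r2)  # True: the second call reused the cached result
```

Near-duplicate detection would need a perceptual hash rather than SHA-256, but the cache structure is the same.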
3. Rule Engine Flexibility and Governance
Business rules change frequently as regulations evolve and new threats emerge. Interviewers want to see a design where policy changes are fast, safe, and fully traceable.
Hints to consider:
- Store rules as versioned data objects (JSON or a lightweight DSL) in a configuration service like etcd or a database with version history
- Have the rule evaluation service load the active policy version into memory and subscribe to change events, reloading atomically on updates
- Support canary rollouts for new rules: evaluate new policies on a percentage of traffic and compare outcomes before full deployment
- Record the policy version ID with every evaluation result so appeals can replay the exact logic that was in effect when the decision was made
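To make rules-as-data concrete, here is a minimal sketch of a versioned policy and evaluator. The rule mirrors the combination from the problem statement (Model A score above 0.9 AND Model B score above 0.95); the field names, version string, and action labels are illustrative assumptions:

```python
# Hypothetical versioned policy, stored as data so operations teams can
# change thresholds without a code deployment.
POLICY = {
    "policy_version": "2024-06-01.3",
    "rules": [
        {
            "rule_id": "combined-score-violation",
            "conditions": [
                {"model": "model_a", "op": "gt", "threshold": 0.90},
                {"model": "model_b", "op": "gt", "threshold": 0.95},
            ],
            "action": "REMOVE",
        },
    ],
    "default_action": "ALLOW",
}

OPS = {"gt": lambda score, t: score > t, "gte": lambda score, t: score >= t}

def evaluate(scores: dict[str, float], policy: dict) -> dict:
    for rule in policy["rules"]:
        if all(OPS[c["op"]](scores[c["model"]], c["threshold"])
               for c in rule["conditions"]):
            # The fired rule and policy version go into the audit trail,
            # so an appeal can replay exactly this logic.
            return {"action": rule["action"], "rule_id": rule["rule_id"],
                    "policy_version": policy["policy_version"]}
    return {"action": policy["default_action"], "rule_id": None,
            "policy_version": policy["policy_version"]}

decision = evaluate({"model_a": 0.93, "model_b": 0.97}, POLICY)
print(decision["action"])  # REMOVE
```

Because the evaluator is a pure function of scores plus a policy snapshot, canary rollouts become trivial: evaluate the same scores against two policy versions and diff the decisions.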
4. Idempotent Enforcement and Failure Recovery
With at-least-once delivery semantics, the same video may be processed multiple times. Enforcement actions (blocking a video, notifying users) must not produce inconsistent or duplicated side effects.
Hints to consider:
- Assign a unique, stable moderation request ID at ingestion and use it as an idempotency key for all downstream operations
- Store enforcement state (action taken, timestamp, request ID) in a database and check this record before executing any action
- Design enforcement operations to be naturally idempotent: "set video status to BLOCKED" rather than "increment block count"
- Use a transactional outbox pattern to atomically update enforcement state and publish notification events, preventing scenarios where the action succeeds but the event is lost
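The idempotency-key and outbox hints fit together in a few lines. A sketch using SQLite as a stand-in for the enforcement database; the schema and event format are assumptions:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE enforcement (request_id TEXT PRIMARY KEY,
                              video_id TEXT, action TEXT, at TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         request_id TEXT, event TEXT);
""")

def enforce(request_id: str, video_id: str, action: str) -> bool:
    """Apply an enforcement action exactly once per request ID.

    Returns True if applied, False if this delivery was a duplicate.
    """
    try:
        with db:  # one transaction: state change + outbox event together
            db.execute(
                "INSERT INTO enforcement VALUES (?, ?, ?, datetime('now'))",
                (request_id, video_id, action))
            db.execute(
                "INSERT INTO outbox (request_id, event) VALUES (?, ?)",
                (request_id, f"video {video_id} set to {action}"))
        return True
    except sqlite3.IntegrityError:
        # PRIMARY KEY on request_id turns redelivery into a no-op.
        return False

print(enforce("req-123", "vid-9", "BLOCKED"))  # True: first delivery applied
print(enforce("req-123", "vid-9", "BLOCKED"))  # False: duplicate ignored
```

A separate relay process would drain the outbox table and publish creator notifications, guaranteeing that an action and its event either both happen or neither does.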
Suggested Approach
Step 1: Clarify Requirements
Begin by confirming scope and scale with the interviewer. Ask about daily upload volume, average video duration and size, and peak traffic patterns. Clarify how many ML models need to run per video and what their typical inference latency is. Confirm what enforcement actions are available and whether human review is in scope. Ask about rule complexity: are rules simple threshold comparisons, or do they involve combinations of scores, creator reputation, geographic context, and temporal patterns? Establish latency targets for end-to-end moderation and ask about regulatory requirements for audit retention and data residency.