Design Meta Privacy
Problem Statement
Design a distributed content moderation system that processes millions of user-generated posts, images, and videos daily. The platform receives content from various sources (posts, comments, profile updates, direct messages) that must be reviewed before becoming visible to other users. The system should route content to appropriate review queues based on automated signals (ML models, heuristics, user reports), assign work to human moderators efficiently, and handle appeals when users contest moderation decisions.
The system must handle high throughput (100K+ items per minute during peak hours), provide low latency for urgent content (terrorist threats, self-harm), ensure fair workload distribution across a global moderator workforce, and maintain audit trails for compliance. A key challenge is balancing automation with human judgment while avoiding moderator burnout from viewing disturbing content repeatedly.
Key Requirements
Functional
- Content ingestion -- Accept content submissions from multiple platform surfaces with metadata (author, type, context) and initial automated scores
- Queue routing -- Assign incoming content to specialized queues based on severity, content type, language, and automated risk scores
- Moderator assignment -- Distribute queued items to available moderators based on their expertise, language skills, shift schedules, and recent workload
- Decision recording -- Capture moderator actions (approve, reject, escalate) with timestamps, reasoning, and confidence levels
- Appeal handling -- Allow users to contest decisions and route appeals to senior reviewers with original context
Non-Functional
- Scalability -- Support 100K content submissions per minute with 10K concurrent moderators across regions
- Reliability -- Ensure no content is lost or double-reviewed; maintain exactly-once processing semantics
- Latency -- Critical content (violence, self-harm) must reach moderators within 30 seconds; routine content within 5 minutes
- Consistency -- Moderators must see the latest version of content and never review already-decided items
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Queue Management and Prioritization
How you design priority queues that balance urgency, fairness, and moderator capacity without starvation or queue buildup.
Hints to consider:
- Separate queues by priority tier (urgent, high, medium, low) with weighted polling to prevent low-priority starvation
- Use time-based SLAs (service level agreements) that escalate priority as items age in queues
- Implement circuit breakers that redirect overflow to generalist queues when specialist queues saturate
- Consider moderator fatigue by rotating disturbing content with benign reviews and enforcing break periods
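The first two hints above can be sketched as a tiered queue with weighted polling plus age-based escalation. This is a minimal in-process illustration, not a production design; the tier names, weights, and SLA thresholds are assumptions chosen for the example.

```python
import heapq
import time

# Assumed weights: "urgent" is polled 8x per cycle, "low" once, so low-priority
# items still drain even under sustained urgent load.
TIER_WEIGHTS = {"urgent": 8, "high": 4, "medium": 2, "low": 1}
ESCALATE_AFTER = {"high": 300, "medium": 900, "low": 3600}  # assumed per-tier SLAs (seconds)
NEXT_TIER = {"low": "medium", "medium": "high", "high": "urgent"}

class TieredQueue:
    def __init__(self):
        # One min-heap per tier, ordered by enqueue time (oldest first).
        self.queues = {tier: [] for tier in TIER_WEIGHTS}
        # Expand weights into a flat polling cycle of tier names.
        self.cycle = [t for t, w in TIER_WEIGHTS.items() for _ in range(w)]
        self.pos = 0

    def enqueue(self, tier, item_id, now=None):
        ts = now if now is not None else time.time()
        heapq.heappush(self.queues[tier], (ts, item_id))

    def escalate_aged(self, now=None):
        # Promote items that have waited past their tier's SLA one tier up,
        # so aging alone eventually makes any item urgent.
        now = now if now is not None else time.time()
        for tier in ("high", "medium", "low"):
            q = self.queues[tier]
            while q and now - q[0][0] > ESCALATE_AFTER[tier]:
                ts, item_id = heapq.heappop(q)
                heapq.heappush(self.queues[NEXT_TIER[tier]], (ts, item_id))

    def dequeue(self, now=None):
        self.escalate_aged(now)
        # Walk the weighted cycle; skip empty tiers so no polling slot is wasted.
        for _ in range(len(self.cycle)):
            tier = self.cycle[self.pos]
            self.pos = (self.pos + 1) % len(self.cycle)
            if self.queues[tier]:
                return tier, heapq.heappop(self.queues[tier])[1]
        return None
```

In a real deployment the heaps would be replaced by a distributed queue (e.g. partitioned Kafka topics or Redis sorted sets), but the starvation-avoidance logic is the same.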
2. Work Assignment and Load Balancing
How you distribute items to moderators fairly while respecting their skills, shift schedules, and psychological load limits.
Hints to consider:
- Use a reservation system where moderators claim items with TTL leases that auto-release on timeout
- Track moderator velocity (items per hour), accuracy (agreement with peer reviews), and specialization to optimize assignments
- Implement sticky sessions for related content (all comments on one post) to give moderators full context
- Build wellness algorithms that limit exposure to graphic violence or disturbing material per shift
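The reservation hint above can be illustrated with a small lease store: a moderator claims an item under a TTL lease, and only the current lease holder can record a decision. This is a hypothetical in-process sketch (names and the 300-second TTL are assumptions); in production the same logic would typically live in Redis or a database row updated with a compare-and-set.

```python
import time
import uuid

LEASE_TTL_SECONDS = 300  # assumed review window per claim

class LeaseStore:
    def __init__(self):
        # item_id -> (moderator_id, lease_token, expires_at)
        self._leases = {}

    def claim(self, item_id, moderator_id, now=None):
        """Return a lease token, or None if the item is held by someone else."""
        now = now if now is not None else time.time()
        lease = self._leases.get(item_id)
        if lease and lease[2] > now:
            return None  # unexpired lease held by another moderator
        token = uuid.uuid4().hex
        self._leases[item_id] = (moderator_id, token, now + LEASE_TTL_SECONDS)
        return token

    def complete(self, item_id, token, now=None):
        """Record completion; only the current, unexpired lease holder succeeds.

        Rejecting stale tokens is what prevents a timed-out reviewer from
        submitting a second, conflicting decision after the item was re-claimed.
        """
        now = now if now is not None else time.time()
        lease = self._leases.get(item_id)
        if lease and lease[1] == token and lease[2] > now:
            del self._leases[item_id]
            return True
        return False
```

The auto-release on timeout falls out of the expiry check in `claim`: an expired lease is simply overwritten by the next claimant.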
3. Handling Scale and Geographic Distribution
How you architect the system to serve a global workforce reviewing content in 100+ languages across all time zones.
Hints to consider:
- Deploy regional queue clusters to reduce latency and keep moderator data residency compliant
- Use event streaming to broadcast content to multiple regions while avoiding duplicate work through distributed locks
- Partition queues by language and content type to enable regional specialization while allowing overflow routing
- Cache frequently accessed resources (user history, policy documents) close to moderators to minimize lookup latency
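The partition-with-overflow hint can be reduced to a routing function: pick the specialist partition keyed by language and content type, and fall back to a generalist overflow partition when the specialist backlog is too deep. The key format and threshold below are illustrative assumptions.

```python
OVERFLOW_BACKLOG = 1_000  # assumed depth at which a specialist queue is "saturated"

def partition_key(item, backlogs):
    """Choose a queue partition for an item.

    item     -- dict with 'language' and 'content_type' fields
    backlogs -- mapping of partition key -> current queue depth
    """
    specialist = f"{item['language']}:{item['content_type']}"
    if backlogs.get(specialist, 0) < OVERFLOW_BACKLOG:
        return specialist
    # Overflow routing: keep the content-type specialization but drop the
    # language constraint so any qualified generalist region can absorb it.
    return f"overflow:{item['content_type']}"
```

The same function doubles as the circuit breaker mentioned earlier: saturation is detected per partition, and rerouting happens at enqueue time rather than after items have aged.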
4. Data Consistency and Audit Requirements
How you ensure no item is reviewed twice, all decisions are logged immutably, and appeals can reconstruct full historical context.
Hints to consider:
- Implement state machines for content lifecycle (pending, in-review, decided, appealed) with transactional updates
- Store append-only audit logs with content snapshots, moderator identity, decision rationale, and timestamps
- Use versioning to handle concurrent updates (user edits content while moderator reviews original version)
- Build query indices that support compliance requests (all decisions by moderator X, all appeals for policy Y)
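The lifecycle state machine from the first hint can be made explicit as a transition table plus an append-only log. This is a sketch under the state names listed above; in a real system the transition and the log append would share one database transaction.

```python
# Legal transitions for the content lifecycle. Anything not listed here is
# rejected, which is what guarantees an already-decided item is never
# re-entered for review.
VALID_TRANSITIONS = {
    "pending": {"in-review"},
    "in-review": {"decided", "pending"},  # back to pending on lease timeout
    "decided": {"appealed"},
    "appealed": {"decided"},              # an appeal resolves to a new decision
}

class ContentItem:
    def __init__(self, item_id):
        self.item_id = item_id
        self.state = "pending"
        # Append-only audit trail: (from_state, to_state, actor, reason, ts).
        self.audit_log = []

    def transition(self, new_state, actor, reason, ts):
        if new_state not in VALID_TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.audit_log.append((self.state, new_state, actor, reason, ts))
        self.state = new_state
```

Because the log records every transition with actor and rationale, an appeal reviewer can replay the item's full history without consulting any other store.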
Suggested Approach
Step 1: Clarify Requirements
Start by confirming scope and scale parameters. Ask about expected volume (submissions per second, backlog depth), moderator workforce size (shifts, languages, specializations), and SLA targets (how fast must violent content be reviewed versus spam). Clarify whether the system covers all content types (text, images, video) or starts with a subset. Confirm audit and compliance needs (data retention, regulatory requirements). Establish whether ML models exist or if you're designing a pure human-review system. Pin down consistency needs: can a user see their content immediately, or does it remain hidden until approved?
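It helps to have the back-of-envelope numbers ready as soon as scale is confirmed. The throughput and workforce figures below come from the stated requirements; the automation rate is an assumption for illustration only.

```python
# Figures from the problem statement.
peak_per_minute = 100_000          # peak content submissions per minute
moderators = 10_000                # concurrent moderators across regions

peak_per_second = peak_per_minute / 60   # ~1,667 items/s at ingest

# Assumption: most items are auto-resolved by ML/heuristics before reaching
# a human queue. 90% is a placeholder to size the human-review path.
auto_resolved = 0.90
human_items_per_minute = peak_per_minute * (1 - auto_resolved)   # ~10,000/min
per_moderator_per_minute = human_items_per_minute / moderators   # ~1 item/min
```

Roughly one item per moderator per minute at peak is comfortable, which suggests the hard problems are routing, fairness, and consistency rather than raw human throughput.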