Design a System to Identify and block Weapons on Marketplace
ML System Design | Must
Problem Statement
You're tasked with designing an automated content moderation system for a large-scale online marketplace that processes millions of product listings daily. The system must detect and prevent users from listing prohibited items, with a specific focus on weapons, firearms, explosives, and related accessories. The challenge is to build a system that can accurately identify violations across text descriptions, product titles, images, and user behavior patterns while minimizing false positives that could frustrate legitimate sellers. The system needs to handle real-time moderation for new listings, batch processing for existing inventory, and continuous learning to adapt to evolving evasion tactics.
Your solution should balance precision (avoiding false flags on legitimate items like toy replicas or collectibles) with recall (catching actual policy violations), while maintaining sub-second latency for the user experience and handling peak traffic of 100,000+ listings per minute during major sales events.
Key Requirements
Functional
- Automated Detection -- Identify prohibited weapons and related items across multiple modalities including text, images, and metadata
- Real-time Moderation -- Block policy-violating listings before they go live on the marketplace
- Multi-language Support -- Detect violations in product descriptions across different languages and regional marketplaces
- Appeals Workflow -- Allow legitimate sellers to contest false positives with human review
- Pattern Recognition -- Identify coordinated sellers attempting to circumvent filters through coded language or fragmented listings
Non-Functional
- Scalability -- Handle 100,000+ listing submissions per minute with ability to scale during peak periods
- Reliability -- Maintain 99.9% uptime with graceful degradation if ML models fail
- Latency -- Complete moderation checks within 500ms for real-time listings to avoid user friction
- Consistency -- Ensure uniform policy enforcement across all regions while respecting local regulations
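To make the scalability and latency targets concrete, a rough sizing calculation can be sketched as below. It applies Little's law (requests in flight = arrival rate x time in system) to the 100,000 listings per minute and 500ms figures above. The per-worker concurrency of 8 is purely an illustrative assumption, not a number from the problem statement.

```python
import math


def capacity_estimate(listings_per_minute: int,
                      latency_budget_s: float,
                      per_worker_concurrency: int):
    """Back-of-the-envelope sizing for the moderation tier."""
    qps = listings_per_minute / 60
    # Little's law: average requests in flight = arrival rate * time in system.
    in_flight = qps * latency_budget_s
    # per_worker_concurrency is an assumed value; measure it in practice.
    workers = math.ceil(in_flight / per_worker_concurrency)
    return qps, in_flight, workers


qps, in_flight, workers = capacity_estimate(100_000, 0.5, 8)
print(f"{qps:.0f} listings/s, {in_flight:.0f} in flight, {workers} workers")
```

Under these assumptions the peak is roughly 1,667 listings per second with about 833 checks in flight at any moment, before adding headroom for retries or traffic spikes.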
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Multi-Modal ML Pipeline Design
Interviewers want to see how you orchestrate multiple detection mechanisms (text classifiers, image recognition, behavioral signals) into a cohesive system. They'll probe your understanding of when to run models in parallel versus sequentially, and how to combine confidence scores.
Hints to consider:
- Consider using ensemble methods where different models vote on the classification outcome
- Think about early-exit strategies where high-confidence detections skip expensive secondary checks
- Discuss how to handle cases where text says "camping knife" but image shows an assault rifle
- Consider using embedding-based similarity to catch variations of known prohibited terms
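One way to combine the hints above is a staged pipeline that runs cheap scorers first, exits early on a high-confidence hit, and fuses whatever scores it collected. The sketch below is a minimal illustration: the `Listing` fields, the stub scorers, the 0.95 early-exit threshold, and the choice of max-fusion are all assumptions made for the example, and a real system might run stages in parallel or use a learned combiner instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Listing:
    title: str
    description: str
    image_id: str


Scorer = Callable[[Listing], float]  # returns a risk score in [0, 1]


def moderate(listing: Listing,
             stages: List[Tuple[str, Scorer]],
             early_exit: float = 0.95) -> Tuple[float, Dict[str, float]]:
    """Run scorers in order (cheapest first) and stop once one is confident."""
    scores: Dict[str, float] = {}
    for name, scorer in stages:
        score = scorer(listing)
        scores[name] = score
        if score >= early_exit:
            break  # skip the remaining, more expensive stages
    # Max-fusion: when modalities disagree, the riskiest signal wins.
    return max(scores.values()), scores


# Stub scorers standing in for real models (hypothetical values).
def text_score(listing: Listing) -> float:
    return 0.10  # the text "camping knife" looks benign


def image_score(listing: Listing) -> float:
    return 0.97  # the image model sees a rifle


def behavior_score(listing: Listing) -> float:
    return 0.50  # never reached in this example


risk, scores = moderate(
    Listing("camping knife", "great for hiking", "img-1"),
    [("text", text_score), ("image", image_score), ("behavior", behavior_score)],
)
print(risk, scores)
```

With max-fusion, the benign title does not dilute the image signal, which is the behavior you would want to defend for the "camping knife" versus rifle case.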
2. False Positive Management
Your design must minimize frustration for legitimate sellers while maintaining safety. Interviewers will challenge you on the precision-recall tradeoff and how you'd tune the system over time.
Hints to consider:
- Implement confidence thresholds where borderline cases get human review instead of automatic rejection
- Track seller reputation and listing history to adjust sensitivity per user
- Build an active learning loop where human reviewers' decisions retrain models
- Consider contextual signals like category (sporting goods vs electronics) to adjust thresholds
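The threshold hints above can be sketched as a three-way router: allow, send to human review, or block. All numbers here (base thresholds, the category offset, the trust offset) are illustrative assumptions, as is the direction of each adjustment; in practice they would be tuned against labeled data and reviewer capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    review: float
    block: float


# Assumed values for illustration only.
BASE = Thresholds(review=0.60, block=0.90)
# Categories with many legitimate lookalikes get a higher auto-block bar,
# so more borderline cases land in human review instead.
CATEGORY_BLOCK_OFFSET = {"sporting_goods": 0.05, "electronics": 0.0}
MAX_TRUST_OFFSET = 0.04  # applied in full to the most trusted sellers


def thresholds_for(category: str, seller_trust: float) -> Thresholds:
    trust = min(max(seller_trust, 0.0), 1.0)  # clamp to [0, 1]
    block = (BASE.block
             + CATEGORY_BLOCK_OFFSET.get(category, 0.0)
             + MAX_TRUST_OFFSET * trust)
    return Thresholds(review=BASE.review, block=min(block, 0.99))


def route(score: float, category: str, seller_trust: float) -> str:
    t = thresholds_for(category, seller_trust)
    if score >= t.block:
        return "block"
    if score >= t.review:
        return "review"
    return "allow"


print(route(0.92, "electronics", 0.0))     # block
print(route(0.92, "sporting_goods", 0.0))  # review
print(route(0.30, "electronics", 0.0))     # allow
```

Note that the review threshold stays fixed in this sketch, so trust and category only move listings between "block" and "review", never straight to "allow".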
3. Evasion Tactics and Adversarial Resilience
Bad actors constantly evolve their techniques to bypass filters. Interviewers want to see how you'd build an adaptive system that stays ahead of creative workarounds.
Hints to consider:
- Discuss detecting deliberate misspellings, leetspeak, and character substitution (e.g., "w3ap0n")
- Consider analyzing fragmented listings where a seller splits one weapon across multiple innocent-looking posts
- Track cross-listing patterns and user networks to identify coordinated circumvention attempts
- Implement image perturbation detection to catch cases where sellers add noise to fool vision models
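For the misspelling and character-substitution hint, a common first line of defense is to normalize text before matching or classification. The sketch below is a minimal example using only the standard library; the substitution table and separator set are assumptions and far from exhaustive. Because the mapping also rewrites legitimate digits (model numbers, calibers), the normalized string is best treated as an extra matching view alongside the raw text, not a replacement for it.

```python
import re
import unicodedata

# Assumed, non-exhaustive leetspeak table.
LEET = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def normalize(text: str) -> str:
    """Return a matching-only view of the text with common obfuscation undone."""
    # NFKD folds compatibility forms (e.g. fullwidth letters) and splits accents.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = text.lower().translate(LEET)
    # Drop separators wedged between word characters: "g.u.n" -> "gun".
    return re.sub(r"(?<=\w)[.\-_*](?=\w)", "", text)


print(normalize("w3ap0n"))  # weapon
print(normalize("G.U.N"))   # gun
```

Embedding-based similarity, mentioned in the first focus area, would complement this by catching coded language that no substitution table anticipates.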
4. Real-Time Processing Architecture
The system must make decisions quickly enough not to disrupt the user flow, but thoroughly enough to be effective. Interviewers will probe your understanding of streaming architectures and model serving.
Hints to consider:
- Use lightweight models for initial screening with heavier models triggered only for suspicious cases
- Consider caching model predictions for similar listings or images (perceptual hashing)
- Discuss async vs sync review flows -- what can be flagged post-publication versus what must be blocked immediately
- Think about circuit breakers and fallback strategies when ML services are overloaded
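The circuit-breaker hint can be illustrated with a small wrapper around a model call. This is a simplified sketch, not a production pattern: the class name, failure count, reset window, and the "queue_for_review" fallback are assumptions for the example, and it ignores concurrency. The idea is that after repeated failures the breaker stops calling the overloaded service and goes straight to a safe fallback until a cool-down has passed.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, primary: Callable, fallback: Callable, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback(*args)  # open: do not touch the primary
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = primary(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # (re)open the circuit
            return fallback(*args)
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


# Usage with a fake clock and a model call that always times out.
now = [0.0]
attempted = []


def overloaded_model(listing_id: str) -> str:
    attempted.append(listing_id)
    raise TimeoutError("model service overloaded")


def queue_for_review(listing_id: str) -> str:
    return "queue_for_review"


breaker = CircuitBreaker(max_failures=2, reset_after_s=10.0, clock=lambda: now[0])
results = [breaker.call(overloaded_model, queue_for_review, x) for x in "abc"]
print(results, attempted)
```

In the usage above, the third call never reaches the model because the breaker opened after two failures. Whether the fallback should hold the listing for review or publish it with a post-publication scan is exactly the sync versus async tradeoff worth discussing.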
5. Compliance and Regional Variations
Weapon policies vary dramatically by country and even by state. Your design must be flexible enough to enforce different rules in different jurisdictions while remaining maintainable.
Hints to consider:
- Build a policy engine separate from detection logic so rules can be updated without model retraining
- Consider that some items are legal in certain regions (airsoft guns in Japan, knives in hunting regions)
- Discuss how to handle cross-border listings and international shipping implications
- Think about audit trails and explainability for regulatory compliance
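A policy engine kept separate from detection can be as simple as a lookup from detected label and region to an action. In the sketch below the model is assumed to emit a label such as "airsoft", and the table decides what to do with it. The table contents are hypothetical placeholders loosely based on the airsoft example above, not a statement of any jurisdiction's law, and the "strictest region wins" rule for cross-border listings is one possible choice among several.

```python
from dataclasses import dataclass

SEVERITY = {"allow": 0, "review": 1, "block": 2}

# Hypothetical policy table; regional entries override the default.
POLICY = {
    "default": {"firearm": "block", "airsoft": "block", "knife": "review"},
    "JP": {"airsoft": "allow"},
}


@dataclass(frozen=True)
class Decision:
    action: str
    rule: str  # which table entry fired, kept for the audit trail


def lookup(label: str, region: str) -> Decision:
    regional = POLICY.get(region, {})
    if label in regional:
        return Decision(regional[label], f"{region}/{label}")
    # Labels with no rule at all default to "allow" in this sketch.
    return Decision(POLICY["default"].get(label, "allow"), f"default/{label}")


def decide(label: str, seller_region: str, ship_to_region: str) -> Decision:
    """Cross-border listings get the strictest action of the two regions."""
    candidates = [lookup(label, seller_region), lookup(label, ship_to_region)]
    return max(candidates, key=lambda d: SEVERITY[d.action])


print(decide("airsoft", "JP", "JP"))  # allowed under the regional override
print(decide("airsoft", "JP", "US"))  # blocked by the destination's default rule
```

Because the rules live in data, a policy change is a table update rather than a model retrain, and the `rule` field gives reviewers and auditors the reason behind each action.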