Design a Risk Assessment Service for Order Purchases
System Design · Optional
Problem Statement
Design a risk assessment service that evaluates every e-commerce order in real time to detect fraud, policy violations, and abuse before the order is processed. The service must return a risk decision (approve, reject, or send to manual review) within a tight latency budget, using signals like user history, device fingerprint, payment behavior, and order patterns.
Interviewers ask this to test your ability to design a low-latency decisioning pipeline with high availability, build feature computation from multiple data sources, handle the tradeoff between precision and recall in fraud detection, and integrate a manual review workflow for borderline cases. Expect to discuss feature stores, model serving, rule engines, and how to evolve detection strategies without downtime.
Key Requirements
Functional
- Real-time risk scoring -- evaluate each order at checkout and return a risk decision (approve, reject, review) within the latency budget
- Feature computation -- compute risk features from user profile, order history, device signals, payment method, and behavioral patterns
- Rule engine -- apply configurable business rules (velocity limits, blocklists, geographic restrictions) alongside ML model scores
- Manual review queue -- route borderline orders to a human review queue with tools for investigators to approve or reject with annotations
Non-Functional
- Scalability -- handle tens of thousands of order evaluations per second during peak shopping events (Prime Day, Black Friday)
- Reliability -- maintain 99.99% availability; default to a safe fallback decision (approve with flag) if the service is degraded
- Latency -- return risk decisions within 100ms at P99 to avoid blocking the checkout flow
- Consistency -- strong consistency for blocklist lookups and velocity counters; eventual consistency for model retraining and feature backfill
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Low-Latency Decisioning Pipeline
The risk service sits in the critical checkout path. Every millisecond of latency directly impacts conversion rates.
Hints to consider:
- Execute feature retrieval, rule evaluation, and model scoring in parallel where possible, merging results before the final decision
- Cache frequently accessed features (user risk profile, device trust score) in Redis with short TTLs to avoid database round-trips
- Use a pre-computed feature store that maintains up-to-date features asynchronously, serving them as key-value lookups at scoring time
- Implement circuit breakers on each data source so a slow dependency triggers a timeout and fallback rather than blocking the entire pipeline
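The parallel-fetch-with-fallback pattern above can be sketched with `asyncio`. This is a minimal illustration, not a production circuit breaker: the source names, timeouts, simulated latencies, and default values are all assumptions chosen to show how a slow dependency times out and falls back while fast sources return real data.

```python
import asyncio

# Assumed per-source timeouts and safe defaults (illustrative values only).
TIMEOUTS = {"user_profile": 0.05, "device_signals": 0.05, "velocity": 0.01}
DEFAULTS = {
    "user_profile": {"trust": 0.5},
    "device_signals": {"device_trusted": False},
    "velocity": {"orders_last_hour": 0},
}

async def fetch_user_profile(order):
    await asyncio.sleep(0.005)          # simulated fast lookup
    return {"trust": 0.9}

async def fetch_device_signals(order):
    await asyncio.sleep(0.005)          # simulated fast lookup
    return {"device_trusted": True}

async def fetch_velocity(order):
    await asyncio.sleep(0.1)            # simulated slow dependency
    return {"orders_last_hour": 7}

async def fetch_with_fallback(name, coro, timeout, default):
    """Fetch one feature source; on timeout, return its safe default."""
    try:
        return name, await asyncio.wait_for(coro, timeout)
    except asyncio.TimeoutError:
        return name, default            # degrade gracefully, don't block

async def gather_features(order):
    """Query all feature sources in parallel and merge the results."""
    sources = {
        "user_profile": fetch_user_profile(order),
        "device_signals": fetch_device_signals(order),
        "velocity": fetch_velocity(order),
    }
    results = await asyncio.gather(
        *(fetch_with_fallback(n, c, TIMEOUTS[n], DEFAULTS[n])
          for n, c in sources.items())
    )
    return dict(results)
```

Here the slow `velocity` source exceeds its 10ms budget and falls back to its default, so the overall latency is bounded by the largest per-source timeout rather than the slowest dependency.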
2. Feature Engineering and Feature Store
Risk models depend on dozens of features computed from different data sources with varying freshness requirements.
Hints to consider:
- Separate features into real-time (computed at request time from the current order), near-real-time (updated from event streams), and batch (recomputed daily)
- Use Kafka Streams or Flink to maintain streaming counters like "orders from this device in the last hour" or "total spend by this user today"
- Store pre-computed features in a low-latency key-value store (DynamoDB or Redis) keyed by user_id and device_id for fast retrieval
- Version features alongside model versions so model retraining uses the same feature definitions that were active during the training period
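To make the streaming-counter idea concrete, here is a toy in-memory sliding-window counter of the kind a Flink or Kafka Streams job would maintain at scale. The class name and window size are illustrative; a real deployment would keep this state in the stream processor and publish results to the key-value feature store.

```python
import time
from collections import deque

class SlidingWindowCounter:
    """Count events per key within a time window,
    e.g. 'orders from this device in the last hour'."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = {}  # key -> deque of event timestamps

    def record(self, key, ts=None):
        """Record one event for a key (e.g. a device_id)."""
        ts = time.time() if ts is None else ts
        self.events.setdefault(key, deque()).append(ts)

    def count(self, key, now=None):
        """Return the number of events for key inside the window,
        evicting expired timestamps as a side effect."""
        now = time.time() if now is None else now
        q = self.events.get(key)
        if not q:
            return 0
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)
```

At scoring time the risk service would read the pre-aggregated count by key rather than scanning raw events, keeping the lookup O(1) on the hot path.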
3. Rule Engine and Model Serving
The system must combine hard business rules with ML model predictions, and both must be updatable without deployments.
Hints to consider:
- Evaluate hard rules first (blocklist match, sanctions check) as short-circuit rejections before invoking the ML model
- Serve ML models using a dedicated model serving infrastructure (SageMaker, TensorFlow Serving) with A/B testing and shadow scoring
- Implement a configurable rule engine where fraud analysts can add or modify rules (velocity thresholds, geographic blocks) through an admin UI
- Combine rule outputs and model scores using a decision matrix: if the model score is high confidence, auto-decide; if borderline, apply rules; if ambiguous, route to review
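The decision-combination logic can be sketched as a small function. The thresholds and the rule-hit shape below are assumptions for illustration; in practice they would live in the configurable rule engine so analysts can tune them without a deployment.

```python
def decide(rule_hits, model_score, high=0.9, low=0.2):
    """Combine hard-rule results with an ML model score.

    rule_hits: list of dicts like {"rule": "blocklist", "action": "reject"}
    model_score: fraud probability in [0, 1]
    high/low: illustrative auto-decision thresholds
    """
    # Hard rules short-circuit before the model score is consulted.
    if any(hit["action"] == "reject" for hit in rule_hits):
        return "reject"
    # High-confidence model scores auto-decide.
    if model_score >= high:
        return "reject"
    if model_score <= low:
        return "approve"
    # Ambiguous middle band goes to human review.
    return "review"
```

Keeping the thresholds as parameters makes it easy to shadow-score a new model with different cutoffs and compare review-queue volume before switching traffic.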
4. Manual Review and Feedback Loop
Borderline cases need human investigation, and review outcomes feed back into model retraining and rule tuning.
Hints to consider:
- Route orders to review queues prioritized by risk score and order value, with SLA timers to ensure timely decisions
- Provide reviewers with a case view showing all features, signals, model explanations, and similar historical cases
- Record review decisions and annotations as labeled training data for model retraining, closing the feedback loop
- Implement escalation for high-value or complex cases, with audit trails for compliance and dispute resolution
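A review queue prioritized by risk score and order value can be sketched with a heap. The priority formula (risk × value) is an assumption chosen for illustration; a real system would also factor in SLA deadlines and case age.

```python
import heapq
import itertools

class ReviewQueue:
    """Manual-review queue: highest risk-times-value cases first,
    ties broken by arrival order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # arrival order for stable tie-breaks

    def enqueue(self, order_id, risk_score, order_value):
        # heapq is a min-heap, so negate the priority for highest-first.
        priority = -(risk_score * order_value)
        heapq.heappush(self._heap, (priority, next(self._seq), order_id))

    def next_case(self):
        """Pop the highest-priority case, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

A moderate-risk, high-value order can outrank a high-risk, low-value one under this scheme, which matches the intent of weighting both risk score and order value.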
Suggested Approach
Step 1: Clarify Requirements
Confirm scope with the interviewer. Ask about the volume of orders per second, the latency budget for risk decisions, and what happens when the service is unavailable (fail open or fail closed). Clarify whether ML model serving is in scope or if rule-based evaluation is sufficient. Establish which data sources are available (user profiles, payment history, device fingerprints, third-party signals). Determine whether chargeback and dispute handling are in scope.