Design a Risk Assessment Service for Order Purchases
System Design · Optional
Problem Statement
Design a risk assessment service that evaluates every e-commerce order in real time to detect fraud, policy violations, and abuse before the order is processed. The service must return a risk decision (approve, reject, or send to manual review) within a tight latency budget, using signals like user history, device fingerprint, payment behavior, and order patterns.
Interviewers ask this to test your ability to design a low-latency decisioning pipeline with high availability, build feature computation from multiple data sources, handle the tradeoff between precision and recall in fraud detection, and integrate a manual review workflow for borderline cases. Expect to discuss feature stores, model serving, rule engines, and how to evolve detection strategies without downtime.
Key Requirements
Functional
- Real-time risk scoring -- evaluate each order at checkout and return a risk decision (approve, reject, review) within the latency budget
- Feature computation -- compute risk features from user profile, order history, device signals, payment method, and behavioral patterns
- Rule engine -- apply configurable business rules (velocity limits, blocklists, geographic restrictions) alongside ML model scores
- Manual review queue -- route borderline orders to a human review queue with tools for investigators to approve or reject with annotations
Non-Functional
- Scalability -- handle tens of thousands of order evaluations per second during peak shopping events (Prime Day, Black Friday)
- Reliability -- maintain 99.99% availability; default to a safe fallback decision (approve with flag) if the service is degraded
- Latency -- return risk decisions within 100ms at P99 to avoid blocking the checkout flow
- Consistency -- strong consistency for blocklist lookups and velocity counters; eventual consistency for model retraining and feature backfill
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Low-Latency Decisioning Pipeline
The risk service sits in the critical checkout path. Every millisecond of latency directly impacts conversion rates.
Hints to consider:
- Execute feature retrieval, rule evaluation, and model scoring in parallel where possible, merging results before the final decision
- Cache frequently accessed features (user risk profile, device trust score) in Redis with short TTLs to avoid database round-trips
- Use a pre-computed feature store that maintains up-to-date features asynchronously, serving them as key-value lookups at scoring time
- Implement circuit breakers on each data source so a slow dependency triggers a timeout and fallback rather than blocking the entire pipeline
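The parallel-fetch-with-fallback pattern above can be sketched with `asyncio`. This is a minimal illustration, not a production circuit breaker: the source names, timeouts, simulated latencies, and default values are all assumptions chosen to show how a slow dependency times out and falls back while fast sources return real data.

```python
import asyncio

# Assumed per-source timeouts and safe defaults (illustrative values only).
TIMEOUTS = {"user_profile": 0.05, "device_signals": 0.05, "velocity": 0.01}
DEFAULTS = {
    "user_profile": {"trust": 0.5},
    "device_signals": {"device_trusted": False},
    "velocity": {"orders_last_hour": 0},
}

async def fetch_user_profile(order):
    await asyncio.sleep(0.005)          # simulated fast lookup
    return {"trust": 0.9}

async def fetch_device_signals(order):
    await asyncio.sleep(0.005)          # simulated fast lookup
    return {"device_trusted": True}

async def fetch_velocity(order):
    await asyncio.sleep(0.1)            # simulated slow dependency
    return {"orders_last_hour": 7}

async def fetch_with_fallback(name, coro, timeout, default):
    """Fetch one feature source; on timeout, return its safe default."""
    try:
        return name, await asyncio.wait_for(coro, timeout)
    except asyncio.TimeoutError:
        return name, default            # degrade gracefully, don't block

async def gather_features(order):
    """Query all feature sources in parallel and merge the results."""
    sources = {
        "user_profile": fetch_user_profile(order),
        "device_signals": fetch_device_signals(order),
        "velocity": fetch_velocity(order),
    }
    results = await asyncio.gather(
        *(fetch_with_fallback(n, c, TIMEOUTS[n], DEFAULTS[n])
          for n, c in sources.items())
    )
    return dict(results)
```

Here the slow `velocity` source exceeds its 10ms budget and falls back to its default, so the overall latency is bounded by the largest per-source timeout rather than the slowest dependency.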
2. Feature Engineering and Feature Store
Risk models depend on dozens of features computed from different data sources with varying freshness requirements.
Hints to consider:
- Separate features into real-time (computed at request time from the current order), near-real-time (updated from event streams), and batch (recomputed daily)
- Use Kafka Streams or Flink to maintain streaming counters like "orders from this device in the last hour" or "total spend by this user today"
- Store pre-computed features in a low-latency key-value store (DynamoDB or Redis) keyed by user_id and device_id for fast retrieval
- Version features alongside model versions so model retraining uses the same feature definitions that were active during the training period
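To make the streaming-counter idea concrete, here is a toy in-memory sliding-window counter of the kind a Flink or Kafka Streams job would maintain at scale. The class name and window size are illustrative; a real deployment would keep this state in the stream processor and publish results to the key-value feature store.

```python
import time
from collections import deque

class SlidingWindowCounter:
    """Count events per key within a time window,
    e.g. 'orders from this device in the last hour'."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = {}  # key -> deque of event timestamps

    def record(self, key, ts=None):
        """Record one event for a key (e.g. a device_id)."""
        ts = time.time() if ts is None else ts
        self.events.setdefault(key, deque()).append(ts)

    def count(self, key, now=None):
        """Return the number of events for key inside the window,
        evicting expired timestamps as a side effect."""
        now = time.time() if now is None else now
        q = self.events.get(key)
        if not q:
            return 0
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)
```

At scoring time the risk service would read the pre-aggregated count by key rather than scanning raw events, keeping the lookup O(1) on the hot path.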
3. Rule Engine and Model Serving
The system must combine hard business rules with ML model predictions, and both must be updatable without deployments.
Hints to consider:
- Evaluate hard rules first (blocklist match, sanctions check) as short-circuit rejections before invoking the ML model
- Serve ML models using a dedicated model serving infrastructure (SageMaker, TensorFlow Serving) with A/B testing and shadow scoring
- Implement a configurable rule engine where fraud analysts can add or modify rules (velocity thresholds, geographic blocks) through an admin UI
- Combine rule outputs and model scores using a decision matrix: if the model score is high confidence, auto-decide; if borderline, apply rules; if ambiguous, route to review
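The decision-combination logic can be sketched as a small function. The thresholds and the rule-hit shape below are assumptions for illustration; in practice they would live in the configurable rule engine so analysts can tune them without a deployment.

```python
def decide(rule_hits, model_score, high=0.9, low=0.2):
    """Combine hard-rule results with an ML model score.

    rule_hits: list of dicts like {"rule": "blocklist", "action": "reject"}
    model_score: fraud probability in [0, 1]
    high/low: illustrative auto-decision thresholds
    """
    # Hard rules short-circuit before the model score is consulted.
    if any(hit["action"] == "reject" for hit in rule_hits):
        return "reject"
    # High-confidence model scores auto-decide.
    if model_score >= high:
        return "reject"
    if model_score <= low:
        return "approve"
    # Ambiguous middle band goes to human review.
    return "review"
```

Keeping the thresholds as parameters makes it easy to shadow-score a new model with different cutoffs and compare review-queue volume before switching traffic.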
4. Manual Review and Feedback Loop
Borderline cases need human investigation, and review outcomes feed back into model retraining and rule tuning.
Hints to consider:
- Route orders to review queues prioritized by risk score and order value, with SLA timers to ensure timely decisions
- Provide reviewers with a case view showing all features, signals, model explanations, and similar historical cases
- Record review decisions and annotations as labeled training data for model retraining, closing the feedback loop
- Implement escalation for high-value or complex cases, with audit trails for compliance and dispute resolution
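A review queue prioritized by risk score and order value can be sketched with a heap. The priority formula (risk × value) is an assumption chosen for illustration; a real system would also factor in SLA deadlines and case age.

```python
import heapq
import itertools

class ReviewQueue:
    """Manual-review queue: highest risk-times-value cases first,
    ties broken by arrival order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # arrival order for stable tie-breaks

    def enqueue(self, order_id, risk_score, order_value):
        # heapq is a min-heap, so negate the priority for highest-first.
        priority = -(risk_score * order_value)
        heapq.heappush(self._heap, (priority, next(self._seq), order_id))

    def next_case(self):
        """Pop the highest-priority case, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

A moderate-risk, high-value order can outrank a high-risk, low-value one under this scheme, which matches the intent of weighting both risk score and order value.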
Suggested Approach
Step 1: Clarify Requirements
Confirm scope with the interviewer. Ask about the volume of orders per second, the latency budget for risk decisions, and what happens when the service is unavailable (fail open or fail closed). Clarify whether ML model serving is in scope or if rule-based evaluation is sufficient. Establish which data sources are available (user profiles, payment history, device fingerprints, third-party signals). Determine whether chargeback and dispute handling are in scope.