Account Takeover Prediction System
Problem Statement
Design a machine learning system to predict account takeover (ATO) risk for a payments API platform. Account takeover occurs when bad actors gain unauthorized access to legitimate user accounts through stolen credentials, session hijacking, or identity fraud.
The problem is intentionally open-ended about the specific use case. Before diving into the solution, you should clarify with the interviewer whether the focus is on login-time detection, session-level anomaly detection, or transaction-level risk scoring -- each leads to a different system design. The system must score risk in real time while balancing security (catching compromised accounts) against user experience (avoiding false lockouts of legitimate users).
Key Requirements
Functional
- Real-time risk scoring -- evaluate ATO risk at login, session activity, or transaction time and return a score within a strict latency budget
- Multi-signal feature engineering -- combine behavioral signals (login patterns, device fingerprints, IP reputation, geolocation) with historical account data
- Adaptive model retraining -- support periodic and triggered retraining as attacker tactics evolve over time
- Tiered response actions -- map risk scores to graduated responses such as allow, step-up authentication (MFA), temporary lock, or manual review
- Feedback loop integration -- incorporate user-confirmed ATO reports and false positive feedback to improve the model continuously
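The tiered-response requirement can be sketched as a simple score-to-action mapping. This is a minimal illustration; the threshold values here are assumptions, not recommendations, and in practice they would be tuned against business costs.

```python
def tiered_response(risk_score: float) -> str:
    """Map a model risk score in [0, 1] to a graduated action.

    Thresholds below are illustrative placeholders only.
    """
    if risk_score < 0.30:
        return "allow"
    if risk_score < 0.70:
        return "step_up_mfa"      # challenge the user with MFA
    if risk_score < 0.90:
        return "temporary_lock"   # lock pending user verification
    return "manual_review"        # escalate to fraud analysts
```

Graduated actions like these let borderline scores trigger friction (MFA) rather than hard blocks, which matters later when discussing false positive costs.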
Non-Functional
- Low latency -- scoring must complete in tens of milliseconds to avoid degrading the login or transaction experience
- High availability -- the risk scoring service must be always-on; downtime means either blocking all logins or letting attackers through
- Scalability -- handle spikes in login volume (e.g., credential stuffing attacks generating millions of attempts)
- Privacy compliance -- handle sensitive user data (IP addresses, device info, location) in compliance with data retention and privacy regulations
- Monitoring and drift detection -- track model accuracy, feature distributions, and attacker pattern shifts in production
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Model Design and Feature Engineering (Most Emphasized)
Interviewers spend roughly 20 minutes on this area. They want to see a rich, well-structured feature set and a thoughtful model choice.
Hints to consider:
- Behavioral features: login time of day, login frequency, time since last login, typical session duration
- Device and network features: device fingerprint, IP address reputation, geolocation, IP-to-account velocity
- Historical features: rolling count of failed logins, number of distinct devices in the past 30 days, average transaction amount
- Consider both numerical and categorical features, and discuss preprocessing (missing value handling, standardization, categorical encoding)
- Gradient-boosted trees are a strong baseline for tabular data; discuss when you might add a sequence model for session-level patterns
- Address risks during development: data imbalance, feature redundancy, overfitting on historical attack patterns
2. System Architecture and Serving Infrastructure
Interviewers spend roughly 15 minutes probing the end-to-end system, with particular attention to the real-time serving path.
Hints to consider:
- Sketch a streaming pipeline for ingesting login and session events
- Include a feature store with online (low-latency) and offline (batch training) layers
- Show where the model serving layer sits in the authentication flow (before the auth decision is returned)
- Discuss data latency: how quickly do new signals (e.g., a just-reported stolen credential) propagate to the scoring service?
- Cover online model update mechanisms -- can you do warm model swaps without downtime?
- Include monitoring dashboards for prediction accuracy, latency percentiles, and system throughput
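The real-time serving path above can be outlined in a few lines: look up precomputed features from the online store, score, and enforce the latency budget. The in-memory dict, the linear stand-in model, and the fail-open policy are all assumptions for illustration; a real system would use a low-latency store (e.g., Redis) and a served model, and fail-open vs. fail-closed is itself a design decision worth discussing.

```python
import time

# Stand-in for the online layer of the feature store
ONLINE_STORE = {"acct_42": {"failed_logins_7d": 3, "distinct_devices_30d": 4}}

def score_model(features: dict) -> float:
    # Stand-in for the served model; weights are illustrative only
    return min(1.0, 0.1 * features.get("failed_logins_7d", 0)
                    + 0.05 * features.get("distinct_devices_30d", 0))

def score_login(account_id: str, budget_ms: float = 50.0) -> dict:
    start = time.monotonic()
    features = ONLINE_STORE.get(account_id, {})   # online feature lookup
    risk = score_model(features)
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > budget_ms:
        # Fail-open here (assumed policy): never block logins on scorer latency
        return {"risk": None, "action": "allow", "reason": "timeout_fail_open"}
    return {"risk": risk, "action": "step_up_mfa" if risk >= 0.3 else "allow"}
```

Sitting before the auth decision is returned, this call is on the critical path, which is why the budget check and the availability requirements above are non-negotiable.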
3. Business Impact and False Positive Trade-offs
Interviewers spend roughly 15 minutes connecting the ML system to business outcomes.
Hints to consider:
- Reducing the probability of successful account takeovers directly impacts user trust and platform revenue
- False positives (legitimate users locked out or forced through extra MFA) degrade user experience and increase support costs
- Discuss how you set the risk threshold: too aggressive locks out good users, too lenient lets attackers in
- Propose metrics for business impact: ATO rate reduction, false lockout rate, user friction index
- Consider tiered responses (step-up auth vs. hard block) to reduce friction for borderline cases
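The threshold trade-off above can be made concrete as a cost sweep: assign a (business-derived) cost to each missed ATO and each false lockout, then pick the threshold minimizing expected cost. The synthetic scores and the cost figures below are made-up assumptions; in practice the costs come from fraud-loss and support-cost analysis.

```python
import numpy as np

# Synthetic validation set: ~1% true ATOs, with ATOs scoring higher on average
rng = np.random.default_rng(1)
labels = rng.random(10_000) < 0.01
scores = np.where(labels, rng.beta(5, 2, 10_000), rng.beta(2, 5, 10_000))

COST_MISSED_ATO = 500.0   # fraud loss + trust damage (assumed figure)
COST_FALSE_LOCK = 5.0     # support ticket + user friction (assumed figure)

def expected_cost(threshold: float) -> float:
    missed = np.sum(labels & (scores < threshold))        # false negatives
    locked = np.sum(~labels & (scores >= threshold))      # false positives
    return COST_MISSED_ATO * missed + COST_FALSE_LOCK * locked

best = min(np.linspace(0.05, 0.95, 19), key=expected_cost)
```

Because a missed ATO costs orders of magnitude more than a lockout here, the optimal threshold lands aggressively low, which is exactly why tiered responses (MFA instead of a hard block) are attractive for the borderline band.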
4. Handling Class Imbalance (Common Follow-up)
Interviewers frequently ask follow-up questions about train/test split strategies for imbalanced data.
Hints to consider:
- Oversampling minority class (SMOTE) or undersampling majority class
- Stratified sampling to preserve class distribution in train/test splits
- Cost-sensitive learning where misclassifying an ATO is penalized more heavily
- Evaluate with precision-recall curves and PR-AUC rather than accuracy
- Discuss how label quality affects imbalance -- many ATOs go unreported, leading to noisy negative labels
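Two of the hints above, stratified splitting and cost-sensitive learning evaluated with PR-AUC, fit in a short sketch. The synthetic data and the class-weight value are assumptions for illustration; a logistic regression stands in for whatever classifier is actually used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: positives (ATOs) are a few percent of rows
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4))
y = ((X[:, 0] + rng.normal(scale=2.0, size=5000)) > 4.0).astype(int)

# Stratified split preserves the rare-class ratio in both train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Cost-sensitive learning: misclassifying an ATO (class 1) costs 20x more
# (the 20x weight is an illustrative assumption)
clf = LogisticRegression(class_weight={0: 1, 1: 20}).fit(X_tr, y_tr)

# PR-AUC (average precision) instead of accuracy: a model predicting
# "never ATO" gets ~96% accuracy here but near-zero average precision
pr_auc = average_precision_score(y_te, clf.predict_proba(X_te)[:, 1])
```

The label-quality caveat in the last bullet still applies: if many ATOs are unlabeled negatives, both the class weights and the PR-AUC estimate are computed against noisy ground truth.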
5. Clarifying Ambiguity and Scoping
The open-ended nature of this problem is itself a test. Interviewers watch whether you ask clarifying questions before jumping to a solution.
Hints to consider:
- Ask: Is the focus on login-time risk, session anomaly detection, or post-login transaction risk?
- Ask: What response actions are available (block, MFA challenge, flag for review)?
- Ask: What labeled data is available and how is ATO ground truth established?
- Scoping the problem well signals senior-level thinking