Design a personal finance tracker
Problem Statement
Design a cloud-based personal finance tracker that allows users to connect their bank accounts and credit cards, automatically import and categorize transactions into expense types (food, rent, transport, entertainment), and view analytics dashboards showing spending trends and budget progress. The system should support millions of users, process millions of transactions per day, and provide near-real-time visibility into spending patterns while maintaining strict security and privacy controls around sensitive financial data.
Personal finance trackers like Mint or YNAB pull transaction data from external financial institutions, enrich and classify each transaction, and surface budgets, trends, alerts, and insights. Interviewers ask this to evaluate your ability to design secure and reliable data ingestion from flaky third-party APIs, build enrichment pipelines with idempotency and correctness guarantees, and serve read-heavy analytics with low latency. They also test your reasoning about privacy, encryption, eventual consistency, and how you evolve a categorization system using both rules and machine learning with user feedback.
Key Requirements
Functional
- Account connection and sync -- users securely connect bank and credit card accounts through aggregation providers and keep transactions synchronized automatically
- Transaction management -- import, display, search, and manually edit transactions with automatic categorization into predefined and custom spending types
- Budget tracking and alerts -- users create monthly or annual budgets per category and receive notifications when spending approaches or exceeds configured limits
- Analytics and insights -- dashboards showing spending by category, trends over time, month-over-month comparisons, and personalized recommendations
Non-Functional
- Scalability -- support 10 million users with 100 million transactions per month, handling bursts during peak synchronization windows
- Reliability -- 99.9% uptime for core features with graceful degradation when third-party financial providers are temporarily unavailable
- Latency -- dashboard loads under 500ms, transaction list pagination under 200ms, budget updates reflected within 2 seconds of new transaction arrival
- Security -- encrypt financial data at rest and in transit, never store raw bank credentials, maintain audit logs, and support regulatory compliance for data deletion
What Interviewers Focus On
Based on real interview experiences at Datadog, these are the areas interviewers probe most deeply:
1. Data Ingestion and Deduplication Pipeline
Financial aggregators like Plaid or Yodlee deliver data that may arrive out of order, include duplicates, or change status (pending to posted). Getting this wrong means double-counting expenses and eroding user trust.
Hints to consider:
- Use composite dedupe keys combining institution transaction ID, amount, date, and account to detect duplicates reliably
- Design idempotent processing so replaying events from the aggregator does not corrupt spending totals or create phantom transactions
- Handle pending transactions by storing them with a provisional flag, then updating to posted when the final record arrives using the same dedupe key
- Consider edge cases like amount adjustments on pending transactions, voided charges, and refunds that reference original transactions
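The dedupe and idempotency hints above can be sketched as a small upsert routine. This is a minimal in-memory illustration, not a production schema: the `Txn` fields, the choice of SHA-256 over the composite key, and the `TxnStore` class are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Txn:
    provider_txn_id: str   # ID assigned by the aggregator/institution
    account_id: str
    amount_cents: int
    date: str              # ISO date, e.g. "2024-05-01"
    status: str            # "pending" or "posted"

def dedupe_key(t: Txn) -> str:
    # Composite key: institution txn ID + account + amount + date.
    raw = f"{t.provider_txn_id}|{t.account_id}|{t.amount_cents}|{t.date}"
    return hashlib.sha256(raw.encode()).hexdigest()

class TxnStore:
    """In-memory stand-in for the transaction table, keyed by dedupe key."""
    def __init__(self):
        self._by_key = {}

    def upsert(self, t: Txn) -> str:
        key = dedupe_key(t)
        existing = self._by_key.get(key)
        if existing is None:
            self._by_key[key] = t
            return "inserted"
        # Replaying the same event is a no-op, so aggregator replays
        # cannot double-count; a posted record supersedes its pending
        # counterpart under the same dedupe key.
        if existing.status == "pending" and t.status == "posted":
            self._by_key[key] = t
            return "promoted"
        return "duplicate"
```

Note that when a provider adjusts the amount on a pending transaction, the composite key changes, so a real implementation also needs a secondary match (e.g. on `provider_txn_id` alone) to reconcile that edge case rather than inserting a phantom row.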
2. Categorization Engine and Feedback Loop
Automatically classifying transactions into spending categories is a core value proposition but difficult to get right consistently across merchants and user contexts.
Hints to consider:
- Start with a rules-based engine matching merchant names and MCC codes to categories, covering the majority of common transactions
- Store user overrides when someone reclassifies a transaction, and use those corrections to improve future categorization
- Assign a confidence score to each categorization so the UI can surface low-confidence items for manual review
- Design the feedback loop so a user correction for "Starbucks" as "Food" propagates to similar future transactions for that user
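A rules-plus-feedback categorizer along these lines might look like the sketch below. The rule tables, confidence values, and class shape are illustrative assumptions; a real system would load rules from config or an ML model and persist overrides per user.

```python
# Hypothetical rule tables; production systems would load these
# from configuration or a trained model, not hard-code them.
MCC_CATEGORIES = {"5814": "Food", "4111": "Transport"}
MERCHANT_RULES = [("starbucks", "Food"), ("uber", "Transport"),
                  ("netflix", "Entertainment")]

class Categorizer:
    def __init__(self):
        # user_id -> {normalized merchant -> category}, learned from corrections
        self.user_overrides = {}

    def categorize(self, user_id, merchant, mcc=None):
        """Return (category, confidence)."""
        m = merchant.lower().strip()
        # 1. The user's own corrections win outright (the feedback loop).
        override = self.user_overrides.get(user_id, {}).get(m)
        if override:
            return override, 1.0
        # 2. MCC codes are standardized and fairly reliable.
        if mcc in MCC_CATEGORIES:
            return MCC_CATEGORIES[mcc], 0.9
        # 3. Merchant-name substring rules are fuzzier: lower confidence.
        for needle, category in MERCHANT_RULES:
            if needle in m:
                return category, 0.7
        # 4. Unknown: zero confidence, surfaced for manual review in the UI.
        return "Uncategorized", 0.0

    def record_correction(self, user_id, merchant, category):
        # Propagate the correction to future transactions for this user.
        self.user_overrides.setdefault(user_id, {})[merchant.lower().strip()] = category
```

The confidence score is what lets the UI draw the line between silently accepting a categorization and flagging it for review.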
3. Real-Time Analytics and Precomputed Aggregates
Dashboards must load instantly even for users with thousands of transactions. Scanning raw transaction tables on every dashboard request is not viable at scale.
Hints to consider:
- Maintain rollup tables aggregated by user, category, and time window (daily, monthly) that update incrementally as new transactions arrive
- Use event-driven updates: when a new transaction is categorized (or recategorized), publish an event that triggers aggregate recalculation
- Cache hot aggregates like current-month spending and budget utilization in Redis with short TTLs for sub-100ms dashboard loads
- Balance freshness requirements against computational cost -- historical month aggregates are immutable and can be cached aggressively
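The incremental-rollup idea can be sketched as an event consumer that adjusts (user, category, month) buckets in place instead of rescanning raw transactions. A plain dict stands in for the rollup table and cache here; the class and method names are assumptions for the example.

```python
from collections import defaultdict

class SpendingRollups:
    """Incrementally maintained (user, category, month) aggregates.

    In production these rows would live in a rollup table, with hot
    entries cached (e.g. in Redis); a dict stands in for both here.
    """
    def __init__(self):
        self.monthly = defaultdict(int)  # (user, category, "YYYY-MM") -> cents

    def apply_categorized(self, user, category, month, amount_cents):
        # Called by the event consumer when a transaction is first categorized.
        self.monthly[(user, category, month)] += amount_cents

    def apply_recategorized(self, user, old_cat, new_cat, month, amount_cents):
        # A recategorization moves the amount between buckets --
        # no rescan of the raw transaction table is needed.
        self.monthly[(user, old_cat, month)] -= amount_cents
        self.monthly[(user, new_cat, month)] += amount_cents

    def month_by_category(self, user, month):
        # What a dashboard request reads: already-aggregated rows.
        return {cat: v for (u, cat, m), v in self.monthly.items()
                if u == user and m == month}
```

Because past months never receive new transactions once statements close, those rows become immutable and can sit behind long-lived caches, while only the current month needs event-driven updates.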
4. Security, Encryption, and Compliance
Financial data is among the most sensitive categories of personal information. Interviewers expect a thoughtful approach to protecting it across the entire system.
Hints to consider:
- Never store raw bank credentials; use OAuth tokens from aggregation providers and store them in a secrets vault with encryption at rest
- Implement field-level encryption for PII (account numbers, names) and transaction details, with key rotation policies
- Maintain immutable audit logs recording who accessed which accounts and when, supporting both internal review and regulatory requests
- Design for GDPR-style data deletion where users can request complete purging of all their financial history and connected accounts
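One way to make the audit log tamper-evident is to hash-chain entries, so any after-the-fact edit breaks the chain. This is a minimal sketch of that idea using only the standard library; the entry fields and class shape are assumptions, and a real deployment would also need durable, access-controlled storage.

```python
import hashlib
import json
import time

class AuditLog:
    """Append-only audit log where each entry includes a hash of its
    predecessor, making any retroactive modification detectable."""
    def __init__(self):
        self.entries = []

    def record(self, actor, action, account_id, ts=None):
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"actor": actor, "action": action, "account_id": account_id,
                "ts": ts if ts is not None else time.time(), "prev": prev_hash}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append({**body, "hash": digest})

    def verify(self):
        # Recompute every hash and check each entry points at its predecessor.
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or digest != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

A log like this supports both internal review and regulatory requests: auditors can verify integrity independently, and "who accessed which account, when" is answerable from the entries themselves.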
5. Third-Party API Reliability and Graceful Degradation
Financial aggregator APIs have variable reliability, rate limits, and data freshness guarantees. Your system must handle outages without breaking the user experience.
Hints to consider:
- Decouple sync operations from user-facing requests: background workers poll aggregators on scheduled intervals or respond to webhooks
- Implement circuit breakers per provider so a single provider outage does not cascade to other parts of the system
- Show users a clear "last synced" timestamp and status indicator so they understand data freshness during provider outages
- Use exponential backoff with jitter when retrying failed sync operations to avoid amplifying load on recovering providers
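The breaker and retry hints above can be sketched as two small primitives: full-jitter exponential backoff and a per-provider circuit breaker. Thresholds, cooldowns, and the half-open behavior shown are illustrative defaults, not prescribed values.

```python
import random
import time

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2^attempt)],
    so retrying clients spread out instead of stampeding a recovering provider."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

class CircuitBreaker:
    """Per-provider breaker: trips open after `threshold` consecutive
    failures, then permits a trial call once `cooldown` seconds elapse."""
    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock        # injectable for testing
        self.failures = 0
        self.opened_at = None     # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: allow a probe call once the cooldown has passed.
        return self.clock() - self.opened_at >= self.cooldown

    def on_success(self):
        self.failures = 0
        self.opened_at = None

    def on_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()
```

Sync workers would hold one breaker per aggregation provider, checking `allow()` before each call; while a breaker is open, the account simply keeps its last-synced data and the UI shows the stale-data indicator.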