Design a visa payment network system for transaction processing
System Design | Must
Problem Statement
Design a high-frequency stock trading platform that enables users to buy and sell equities in real time while ensuring order integrity, fair market execution, and regulatory compliance. The system must handle millions of orders per second during peak trading hours, match buyers with sellers efficiently, maintain an accurate order book, and provide instant confirmation to traders while guaranteeing that no order is lost or executed twice.
This is a latency-critical, write-heavy system where correctness is paramount -- incorrect order matching or duplicate executions can result in massive financial losses and regulatory penalties. The platform must handle market volatility, flash crashes, and coordinated surges in trading activity while maintaining sub-millisecond matching latency for active securities. Your design should account for multiple trading venues, circuit breakers, audit trails, and settlement processes that occur after market close.
Key Requirements
Functional
- Order placement and cancellation -- traders submit market orders, limit orders, stop orders, and can cancel pending orders before execution
- Order matching -- the system matches buy and sell orders based on price-time priority, executing trades when conditions are met
- Real-time order book -- maintain a live view of pending buy/sell orders at each price level for all actively traded securities
- Position and balance tracking -- track each user's holdings, available cash, and buying power in real time, preventing overselling
- Trade history and confirmations -- provide immediate execution confirmations and maintain a complete audit trail of all orders and trades
- Settlement and clearing -- coordinate with clearinghouses to settle trades (T+1 for US equities since May 2024; some other markets still settle T+2) and transfer ownership of securities
Non-Functional
- Scalability -- handle 10 million orders per second across 5,000 actively traded symbols during peak hours
- Reliability -- achieve 99.99% uptime during trading hours with zero data loss; support failover within 100ms
- Latency -- match and acknowledge orders in under 1ms for liquid securities; provide sub-10ms end-to-end response to traders
- Consistency -- guarantee exactly-once order execution; prevent overselling, race conditions, and stale order book data
- Fairness -- ensure price-time priority; all traders see the same market data with minimal skew (within 10ms)
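The targets above can be turned into rough per-symbol and downtime budgets. The sketch below is back-of-envelope arithmetic only; the 6.5-hour session and 252 trading days per year are assumptions (typical for US equities) and are not stated in the requirements.

```python
# Figures taken from the non-functional requirements above.
ORDERS_PER_SEC = 10_000_000
ACTIVE_SYMBOLS = 5_000
AVAILABILITY = 0.9999

# Assumptions (not from the requirements): a US-style trading calendar.
TRADING_HOURS_PER_DAY = 6.5
TRADING_DAYS_PER_YEAR = 252

# Average load per symbol if volume were spread evenly (it will not be;
# a few "hot" symbols usually take a disproportionate share).
avg_orders_per_symbol = ORDERS_PER_SEC // ACTIVE_SYMBOLS

# Average time budget per order for a single-threaded matcher per symbol.
avg_budget_us = 1_000_000 / avg_orders_per_symbol

# Allowed downtime per year, counted only during trading hours.
trading_minutes_per_year = TRADING_HOURS_PER_DAY * TRADING_DAYS_PER_YEAR * 60
downtime_minutes_per_year = trading_minutes_per_year * (1 - AVAILABILITY)

print(avg_orders_per_symbol)                 # 2000 orders/s per symbol on average
print(avg_budget_us)                         # 500.0 microseconds per order on average
print(round(downtime_minutes_per_year, 2))   # 9.83 minutes per year
```

The even-spread average is the optimistic case; sizing should be driven by the hottest symbols, which is worth confirming with the interviewer.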
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Order Matching Engine Architecture
The matching engine is the heart of the system and its most complex component. Interviewers want to see whether you understand the tradeoffs between latency, throughput, and correctness when designing a lock-free, deterministic matcher.
Hints to consider:
- Partition order books by symbol to eliminate cross-symbol contention and enable parallel matching
- Use in-memory data structures (priority queues, red-black trees) with memory-mapped files or persistent logs for durability
- Discuss single-threaded event loops per symbol versus multi-threaded designs with fine-grained locking
- Consider how to handle order amendments (cancel-replace) atomically without race conditions
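The per-symbol, single-threaded approach from the hints above can be sketched as follows. This is a minimal illustration of price-time priority for limit orders only, not a production matcher: the names Order and OrderBook are hypothetical, prices are assumed to be integer ticks, and cancellation, market orders, and durability are left out.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class Order:
    order_id: str
    side: str   # "buy" or "sell"
    price: int  # limit price in integer ticks (assumption: avoids float rounding)
    qty: int


class OrderBook:
    """Book for one symbol, meant to be driven by a single thread."""

    def __init__(self):
        self._seq = count()  # arrival sequence number gives time priority
        self._bids = []      # heap of (-price, seq, order): highest bid first
        self._asks = []      # heap of (price, seq, order): lowest ask first

    def submit(self, order):
        """Match a limit order, rest any remainder, and return the trades
        as (buy_id, sell_id, price, qty) tuples."""
        is_buy = order.side == "buy"
        own, opposite = (self._bids, self._asks) if is_buy else (self._asks, self._bids)
        trades = []
        while order.qty > 0 and opposite:
            best = opposite[0][2]
            crosses = best.price <= order.price if is_buy else best.price >= order.price
            if not crosses:
                break
            fill = min(order.qty, best.qty)
            order.qty -= fill
            best.qty -= fill
            buy_id, sell_id = (
                (order.order_id, best.order_id) if is_buy
                else (best.order_id, order.order_id)
            )
            # Trades execute at the resting order's price.
            trades.append((buy_id, sell_id, best.price, fill))
            if best.qty == 0:
                heapq.heappop(opposite)
        if order.qty > 0:
            key = -order.price if is_buy else order.price
            heapq.heappush(own, (key, next(self._seq), order))
        return trades
```

Because one thread owns each book, no locks are needed, and replaying the same input sequence yields the same trades. A heap makes cancel and cancel-replace awkward; a sorted map of price levels with a FIFO queue per level is a common alternative worth discussing.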
2. Exactly-Once Execution and Idempotency
Financial systems cannot tolerate duplicate executions or lost orders. Your design must prevent retry storms, network glitches, and client bugs from causing duplicate fills or missed trades.
Hints to consider:
- Assign globally unique, monotonic order IDs at ingestion with deduplication windows in a fast cache layer
- Use deterministic request identifiers (client-generated idempotency keys) to detect retries before they reach the matcher
- Maintain a write-ahead log (WAL) of all state transitions so the matcher can recover to the exact last committed state
- Discuss how to handle late-arriving duplicates after the original order has been filled or canceled
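One way to combine client idempotency keys with a write-ahead log is sketched below. It is a simplified, single-process illustration under stated assumptions: IdempotentGateway and the JSON-lines log format are hypothetical, and the key table is unbounded, whereas a real system would expire entries after a deduplication window.

```python
import json
import os


class IdempotentGateway:
    """Assigns monotonic order IDs and drops retries before they reach the matcher."""

    def __init__(self, wal_path):
        self._wal_path = wal_path
        self._seen = {}      # idempotency key -> assigned order ID
        self._next_id = 1
        self._replay()       # rebuild state from the log after a restart
        self._wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self._wal_path):
            return
        with open(self._wal_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self._seen[record["key"]] = record["order_id"]
                self._next_id = max(self._next_id, record["order_id"] + 1)

    def submit(self, key, payload):
        """Return (order_id, is_duplicate) for a client request."""
        if key in self._seen:
            return self._seen[key], True
        order_id = self._next_id
        record = {"key": key, "order_id": order_id, "payload": payload}
        # Persist before acknowledging so a crash cannot lose an accepted order.
        self._wal.write(json.dumps(record) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())
        self._seen[key] = order_id
        self._next_id += 1
        return order_id, False

    def close(self):
        self._wal.close()
```

A retry with the same key returns the original order ID even after a restart, because the key table is rebuilt from the log. An fsync per order is far too slow for the stated latency targets; batching writes (group commit) or replicating to a standby before acknowledging are the usual alternatives.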
3. Handling Market Volatility and Overload
During flash crashes or major news events, order volume can spike 100x within seconds. Interviewers look for backpressure, circuit breakers, and graceful degradation strategies.
Hints to consider:
- Implement per-user and per-symbol rate limiting to prevent abuse and ensure fair access during spikes
- Use circuit breakers (halt trading) when price moves exceed thresholds or order imbalance is extreme
- Design an admission control layer that can shed load or reject non-critical request types (e.g., informational queries) during overload while still accepting orders and cancels
- Discuss queueing strategies: should you buffer orders in Kafka or reject immediately with "system busy" errors?
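Per-user rate limiting from the hints above is often done with a token bucket. The sketch below is one possible shape, not a prescribed design: TokenBucket and AdmissionController are hypothetical names, and the clock is injectable so the behavior can be tested deterministically.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate` of requests per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class AdmissionController:
    """Keeps one bucket per user; a rejected request maps to a 'system busy' reply."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._buckets = {}

    def admit(self, user_id):
        bucket = self._buckets.get(user_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._clock)
            self._buckets[user_id] = bucket
        return bucket.allow()
```

Rejecting immediately keeps latency bounded and tells the client the truth; buffering in a queue hides overload but can execute orders against a market that has already moved. The same structure works per symbol by keying buckets on the symbol instead of the user.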
4. Real-Time Order Book Distribution
Thousands of traders need up-to-the-millisecond order book snapshots to make informed decisions. Broadcasting every change to every client is cost-prohibitive and creates thundering herd problems.
Hints to consider:
- Aggregate order book updates into time-sliced snapshots (e.g., every 10ms) to reduce message volume
- Use multicast or pub/sub (Kafka, NATS, WebSockets) to push updates to subscribers rather than polling
- Differentiate between professional traders (full depth) and retail users (top-of-book only) to reduce bandwidth
- Discuss eventual consistency: how stale can the client view be before it's "unfair" or violates regulations?
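Time-sliced aggregation (often called conflation) can be sketched as below. This is an illustrative outline only: Conflator is a hypothetical name, each update is assumed to carry the new aggregate quantity at a price level, and a quantity of 0 is assumed to mean the level was removed. A timer (for example every 10ms) would call flush and publish the result.

```python
class Conflator:
    """Collapses many updates to the same price level into one per flush interval."""

    def __init__(self):
        # (symbol, side, price) -> latest aggregate quantity at that level
        self._pending = {}

    def on_update(self, symbol, side, price, qty):
        # Later updates overwrite earlier ones for the same level, so
        # subscribers receive only the most recent state per interval.
        self._pending[(symbol, side, price)] = qty

    def flush(self):
        """Return pending changes as (symbol, side, price, qty) tuples and reset."""
        batch = [key + (qty,) for key, qty in self._pending.items()]
        self._pending = {}
        return batch
```

With this scheme, message volume per interval is bounded by the number of distinct price levels touched rather than by the order rate. Clients that join late still need a full snapshot plus sequence numbers to detect gaps, which this sketch does not cover.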
5. Multi-Region Availability and Disaster Recovery
Trading platforms must survive datacenter failures, network partitions, and hardware faults without losing orders or corrupting state. Geographic distribution adds complexity due to speed-of-light latency.
Hints to consider:
- Use a primary-secondary model per symbol, with leader election via a Raft-based consensus group or a coordination service such as ZooKeeper; tune failure detection so failover fits the 100ms target
- Replicate the write-ahead log synchronously to a hot standby in the same region, asynchronously to remote regions
- Discuss tradeoffs: synchronous cross-region writes add 20-50ms latency; async risks data loss on regional failure
- Plan for split-brain scenarios: choosing consistency over availability during a partition may mean halting trading rather than allowing inconsistent matches
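A common guard against split-brain is a fencing epoch: every elected primary gets a higher epoch number, and replicas refuse log entries stamped with an older one. The sketch below shows only that check; ReplicatedLog is a hypothetical name, and the election that hands out epochs (Raft terms play a similar role) is assumed to happen elsewhere.

```python
class ReplicatedLog:
    """Standby-side log that rejects writes from a deposed primary."""

    def __init__(self):
        self.highest_epoch = 0
        self.entries = []

    def append(self, epoch, entry):
        """Accept the entry only if the sender's epoch is current."""
        if epoch < self.highest_epoch:
            # The sender lost leadership but has not noticed yet.
            return False
        self.highest_epoch = epoch
        self.entries.append((epoch, entry))
        return True
```

An old primary that cannot get its writes accepted by the standby cannot acknowledge orders, so it effectively halts instead of matching against a stale book.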
Suggested Approach
Step 1: Clarify Requirements
Confirm the scope and constraints with your interviewer:
- What types of orders must be supported (market, limit, stop-loss, iceberg)?
- How many symbols and what order volume per symbol (some are "hot," others barely trade)?
- Are we building a single exchange or aggregating liquidity from multiple venues?
- What are the SLAs for order acknowledgment, matching latency, and data freshness?
- Do we need pre-trade risk checks (margin, position limits) or is that handled upstream?
- How do we handle partial fills, and are we responsible for settlement or just matching?
Clarify whether this is a retail brokerage (user-facing) or an institutional dark pool. Establish read-to-write ratios: matching is write-heavy, but market data queries dominate traffic.