Design a Real-Time Auction Platform
System Design (Optional)
Problem Statement
Design a real-time auction platform that supports thousands of concurrent auctions with millions of active bidders. The system must process bid submissions with sub-second latency, enforce auction rules (minimum bid increments, reserve prices, time extensions), prevent double-bidding and race conditions, and notify participants of bid updates in real time. After each auction closes, the system should handle payment processing for winners, settlement with sellers, and maintain a complete audit trail of all bidding activity.
The core challenge lies in achieving strong consistency for bid validation while delivering real-time updates to all watchers, handling traffic spikes during popular auction endings, and ensuring financial correctness when multiple bids arrive simultaneously. Your design must scale to support 50,000 active auctions at peak times with an average of 200 watchers per auction and bid submission rates reaching 5,000 bids per second during high-activity periods.
Key Requirements
Functional
- Bid submission and validation -- accept bids, enforce minimum increments, verify bidder eligibility, and reject invalid attempts with immediate feedback
- Real-time auction state broadcasting -- push current high bid, bidder count, and time remaining to all watchers with sub-second latency
- Auction lifecycle management -- support scheduled starts, automatic time extensions when last-second bids arrive, and definitive close events
- Post-auction settlement -- charge winning bidders, transfer funds to sellers minus platform fees, and handle payment failures with retry logic
- Bidding history and audit logs -- maintain immutable records of every bid attempt, outcome, timestamp, and state transition for compliance and dispute resolution
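To make the first requirement concrete, here is a minimal sketch of per-bid validation with immediate feedback. All names (`Auction`, `validate_bid`) and field choices are illustrative assumptions, not part of the problem statement; amounts are in cents to avoid floating-point rounding, and reserve-price enforcement is deferred to auction close.

```python
from dataclasses import dataclass

@dataclass
class Auction:
    current_high_bid: int  # cents, not floats, for financial correctness
    min_increment: int     # cents
    reserve_price: int     # checked at close in this sketch, not per bid
    is_open: bool

def validate_bid(auction: Auction, amount: int) -> tuple[bool, str]:
    """Return (accepted, reason) so the bidder gets immediate feedback."""
    if not auction.is_open:
        return False, "auction closed"
    if amount < auction.current_high_bid + auction.min_increment:
        return False, "bid below minimum increment"
    return True, "accepted"
```

In a real system this check runs on the node that owns the auction's state, so the "current high bid" it reads is authoritative rather than a stale replica value.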
Non-Functional
- Scalability -- handle 50,000 concurrent auctions, 10 million connected viewers, and 5,000 bids per second at peak
- Reliability -- ensure no lost bids, guarantee exactly-once charge semantics, and provide 99.95% uptime for bidding services
- Latency -- process bid validation in under 200ms, deliver real-time updates to viewers within 500ms, and complete auction close within 2 seconds
- Consistency -- enforce strict serializability for bid ordering within each auction while allowing eventual consistency for view counts and non-critical metrics
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Handling Concurrent Bids and Race Conditions
Multiple bidders often submit bids simultaneously in the final seconds of an auction. Interviewers want to see how you serialize conflicting bids, determine the winning bid atomically, and prevent double-wins or lost updates. This tests your understanding of distributed consensus, optimistic vs. pessimistic locking, and transaction isolation levels.
Hints to consider:
- Consider partitioning auctions by ID and routing all bids for an auction to a single authoritative node that sequences them in memory
- Use versioned writes with compare-and-swap semantics in your database to detect concurrent modifications
- Discuss the tradeoff between single-leader (strong consistency, potential bottleneck) and multi-leader (higher throughput, complex conflict resolution) approaches
- Explore how optimistic concurrency control with retry logic can reduce lock contention while maintaining correctness
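The compare-and-swap hint above can be sketched as an optimistic-concurrency loop. This is a toy in-memory stand-in for a database row with a version column (`VersionedBidStore` and `place_bid` are assumed names); in production the CAS would be a conditional `UPDATE ... WHERE version = ?` or an equivalent conditional write.

```python
import threading

class VersionedBidStore:
    """In-memory stand-in for a DB row with a version column."""
    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0
        self.high_bid = 0

    def read(self):
        with self._lock:
            return self.version, self.high_bid

    def compare_and_swap(self, expected_version, new_bid):
        with self._lock:
            if self.version != expected_version:
                return False          # a concurrent writer got there first
            self.version += 1
            self.high_bid = new_bid
            return True

def place_bid(store, amount, max_retries=5):
    """Optimistic concurrency: read, validate, CAS, retry on conflict."""
    for _ in range(max_retries):
        version, high = store.read()
        if amount <= high:
            return False              # already outbid; reject with feedback
        if store.compare_and_swap(version, amount):
            return True               # won the race for this version
    return False                      # contention too high; surface to caller
```

Note how a losing CAS re-reads the new high bid before retrying, so a bid that has become too low is rejected rather than blindly rewritten; that is what prevents lost updates without holding locks across the validation step.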
2. Real-Time Fan-Out to Millions of Watchers
Each auction may have thousands of viewers who need instant updates when bids arrive. Broadcasting every bid to millions of WebSocket connections creates massive fan-out. Interviewers look for hierarchical distribution strategies, smart filtering, and infrastructure choices that prevent message storms from overwhelming your system.
Hints to consider:
- Use a tiered pub-sub architecture where auction services publish to topics and regional WebSocket clusters subscribe and fan out locally
- Implement throttling and batching for high-frequency updates so viewers receive aggregated state every 300-500ms rather than per-bid messages
- Consider CRDT-based counters for non-critical metrics like view counts that can tolerate eventual consistency
- Discuss how to handle reconnection storms when thousands of clients reconnect after a network partition
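The throttling-and-batching hint can be sketched as a latest-wins coalescer: per-bid updates overwrite each other in a pending map, and only the newest snapshot per auction is published on each flush interval. The class and method names are assumptions for illustration; the 300 ms default mirrors the interval suggested above.

```python
import time

class StateBatcher:
    """Coalesces per-bid updates into periodic snapshots (latest wins)."""
    def __init__(self, interval_s=0.3):
        self.interval_s = interval_s
        self.pending = {}                 # auction_id -> latest state
        self.last_flush = time.monotonic()

    def record(self, auction_id, state):
        self.pending[auction_id] = state  # newer state overwrites older

    def maybe_flush(self, publish):
        """Call from a periodic loop; publishes at most once per interval."""
        now = time.monotonic()
        if now - self.last_flush >= self.interval_s and self.pending:
            for auction_id, state in self.pending.items():
                publish(auction_id, state)
            self.pending.clear()
            self.last_flush = now
```

If 50 bids land on one auction inside a window, watchers receive one aggregated message instead of 50, which is what keeps fan-out cost proportional to auctions-times-intervals rather than to raw bid volume.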
3. Time Extension Logic and Auction Close Finality
Many auctions extend their end time if bids arrive in the final minutes to prevent sniping. This dynamic deadline creates complexity: how do you communicate extensions, prevent infinite loops, and definitively close an auction when hundreds of bids queue up? Interviewers probe your state machine design and how you handle the transition from accepting bids to finalizing winners.
Hints to consider:
- Model auction state as a finite state machine with explicit transitions (ACTIVE → EXTENDED → CLOSING → CLOSED) to avoid ambiguous states
- Use a sliding window approach where each late bid adds fixed time (e.g., 2 minutes) up to a maximum extension limit
- Implement a two-phase close where you stop accepting new bids, drain in-flight requests, then atomically finalize the winner
- Discuss idempotency for close operations so retries or duplicate messages don't trigger multiple settlement attempts
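The hints above can be combined into one small sketch: an explicit state machine, a sliding extension window with a hard cap, and an idempotent close. All names and the specific constants (2-minute extension, 10-minute cap) are assumed values for illustration.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "ACTIVE"
    EXTENDED = "EXTENDED"
    CLOSING = "CLOSING"
    CLOSED = "CLOSED"

class AuctionClock:
    EXTENSION_S = 120       # each late bid adds 2 minutes...
    MAX_EXTENSION_S = 600   # ...up to a 10-minute cap past the original end

    def __init__(self, end_time):
        self.state = State.ACTIVE
        self.end_time = end_time
        self.original_end = end_time

    def on_bid(self, now):
        """Returns False once closing has begun; may extend the deadline."""
        if self.state in (State.CLOSING, State.CLOSED):
            return False
        if self.end_time - now < self.EXTENSION_S:
            capped = self.original_end + self.MAX_EXTENSION_S
            self.end_time = min(now + self.EXTENSION_S, capped)
            self.state = State.EXTENDED   # cap prevents infinite extension loops
        return True

    def close(self):
        """Idempotent: a retried or duplicate close message is a no-op."""
        if self.state == State.CLOSED:
            return
        self.state = State.CLOSING
        # ...drain in-flight bids here before finalizing the winner...
        self.state = State.CLOSED
```

The one-way transition into CLOSING is the two-phase close: new bids are rejected at the door while in-flight ones drain, and only then is the winner finalized, so a settlement trigger can never race a late bid.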
4. Payment Processing and Settlement Workflow
After an auction closes, you must charge the winner, handle payment failures, retry with backoff, and transfer funds to the seller. This multi-step process crosses service boundaries and external payment gateways. Interviewers want to see workflow orchestration, idempotency handling, and failure recovery strategies that prevent double charges or lost funds.
Hints to consider:
- Use a saga pattern or workflow engine to coordinate the charge-winner, deduct-fees, pay-seller sequence with compensating transactions for rollback
- Store idempotency keys for each payment operation to safely retry API calls to external gateways without duplicate charges
- Design separate ledger entries for every fund movement (charge, fee, payout) to maintain an audit trail and enable reconciliation
- Discuss how to handle scenarios where the winner's payment fails (re-auction, offer to second bidder, or cancel) and communicate outcomes to all parties
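The idempotency-key and ledger hints can be sketched together. The key is deterministic (derived from the auction and bidder, not random) so a crashed-and-retried settlement step reuses the same key, and every fund movement appends a separate ledger entry. `SettlementLedger` and the injected `gateway_charge` callable are assumptions standing in for a real payment gateway client.

```python
class SettlementLedger:
    """Append-only ledger plus an idempotency-key store for gateway calls."""
    def __init__(self, gateway_charge):
        self.gateway_charge = gateway_charge  # external call, injected
        self.completed = {}                   # idempotency_key -> gateway txn id
        self.entries = []                     # immutable fund-movement records

    def charge_winner(self, auction_id, bidder_id, amount, fee):
        # Deterministic key: retries of the same logical charge dedupe here,
        # and the gateway is expected to dedupe on the same key as well.
        key = f"charge:{auction_id}:{bidder_id}"
        if key in self.completed:
            return self.completed[key]        # safe retry, no duplicate charge
        txn_id = self.gateway_charge(key, amount)
        self.completed[key] = txn_id
        # One entry per fund movement enables audit and reconciliation.
        self.entries.append(("charge", auction_id, bidder_id, amount))
        self.entries.append(("fee", auction_id, "platform", fee))
        self.entries.append(("payout", auction_id, "seller", amount - fee))
        return txn_id
```

In a full saga, recording the idempotency result and the ledger entries would happen transactionally, and a failed gateway call would trigger the compensating path (re-auction or second-bidder offer) rather than a blind retry.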
5. Data Partitioning and Hot Auction Handling
Popular auctions (celebrity items, rare collectibles) attract extreme traffic concentration. A single auction might receive 50% of system-wide bids for several minutes. Interviewers assess whether your partitioning strategy handles skew, how you prevent a hot partition from degrading the entire system, and whether you can dynamically scale resources for viral auctions.
Hints to consider:
- Partition auction data by auction ID for write isolation, but consider read replicas or caching layers for hot auction reads
- Use consistent hashing with virtual nodes so you can split or migrate individual hot auctions to dedicated hardware without full rebalancing
- Implement rate limiting and bidder queues for extremely hot auctions to smooth traffic spikes and prevent cascading failures
- Discuss circuit breakers and bulkheads that isolate hot auction failures from affecting the rest of the platform
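The consistent-hashing hint can be sketched as a ring with virtual nodes plus an explicit pin table for hot auctions. The class name, the MD5 hash choice, and the pinning mechanism are illustrative assumptions; real systems often propagate pins through a coordination service rather than a local dict.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes; hot auctions can be pinned explicitly."""
    def __init__(self, nodes, vnodes=64):
        self.ring = []     # sorted list of (hash, node)
        self.pinned = {}   # auction_id -> dedicated node override
        for node in nodes:
            for i in range(vnodes):
                h = self._hash(f"{node}#{i}")
                bisect.insort(self.ring, (h, node))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def pin(self, auction_id, node):
        # Migrate one viral auction to dedicated hardware without
        # rebalancing any other key on the ring.
        self.pinned[auction_id] = node

    def node_for(self, auction_id):
        if auction_id in self.pinned:
            return self.pinned[auction_id]
        h = self._hash(auction_id)
        idx = bisect.bisect(self.ring, (h, ""))
        if idx == len(self.ring):
            idx = 0                   # wrap around the ring
        return self.ring[idx][1]
```

Virtual nodes keep load roughly even when a node joins or leaves, while the pin table gives the escape hatch the hint describes: a single celebrity auction moves off the shared ring the moment it turns hot, and moves back with a one-line unpin.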
Suggested Approach
Step 1: Clarify Requirements
Confirm the expected scale, auction types, and business rules with the interviewer before designing:
- What is the typical auction duration (hours, days, weeks) and how many extend vs. fixed-end auctions?
- Should the system support reserve prices, buy-it-now options, or only standard ascending bids?
- What is the expected bid-to-watcher ratio and peak concurrency during high-profile auctions?
- How strict are financial consistency requirements -- can you tolerate brief display lag for non-winners?
- What payment methods must be supported and do you handle currency conversion or international transfers?
- Are there regulatory requirements for audit logs, data retention, or fraud detection?