Design a Donations Website
Problem Statement
Design a time-limited online fundraising platform that operates for 24-48 hours during a special charity event. During this window, thousands of users visit the site simultaneously to donate money to one of several pre-approved charitable organizations. The platform must display real-time fundraising progress for each charity and aggregate totals, process secure payment transactions, and ensure that no donation is lost and no donor is charged twice, even when payment providers experience outages or network failures.
This problem is particularly relevant at Lyft because it combines payment processing (a core concern for any ride-sharing platform) with real-time data aggregation and time-bounded operational constraints. The interviewer wants to see how you handle the unique challenge of a system that must work perfectly during a short, high-stakes window with no opportunity for gradual rollout or extended debugging.
The system must manage three distinct lifecycle phases: a pre-event countdown page, an active donation flow during the event window, and a post-event summary showing final totals and thank-you messaging. Each phase has different traffic patterns and consistency requirements, and the transition between phases must be seamless.
Key Requirements
Functional
- Charity browsing -- Users view a curated list of participating charities with descriptions, images, and live donation totals
- Secure donation flow -- Users complete a multi-step checkout to contribute a specified amount to their chosen charity, receiving confirmation and a digital receipt
- Live progress tracking -- Real-time counters show total funds raised per charity and overall, updating within seconds as donations are processed
- Event time-gating -- The site displays different content based on event state: pre-event countdown, active donation window, and post-event final results
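Time-gating stays consistent if a single function decides the phase from the server's clock and both the page renderer and the donation API call it. That way a donation submitted after the window closes is rejected even if a stale page still shows the form. The following is a minimal sketch that assumes timezone-aware UTC timestamps for the configured start and end. `EventPhase`, `event_phase`, `starts_at`, and `ends_at` are hypothetical names, and the dates are illustrative only.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class EventPhase(Enum):
    PRE_EVENT = "pre_event"    # countdown page
    ACTIVE = "active"          # donation flow open
    POST_EVENT = "post_event"  # final totals and thank-you messaging


def event_phase(now: datetime, starts_at: datetime, ends_at: datetime) -> EventPhase:
    """Return the phase to render and enforce at server time `now`.

    The active window is half-open, [starts_at, ends_at): a request arriving
    exactly at ends_at is already post-event.
    """
    if now.tzinfo is None:
        raise ValueError("pass timezone-aware timestamps (e.g. UTC)")
    if now < starts_at:
        return EventPhase.PRE_EVENT
    if now < ends_at:
        return EventPhase.ACTIVE
    return EventPhase.POST_EVENT


# Illustrative 48-hour window.
starts_at = datetime(2024, 6, 1, 16, 0, tzinfo=timezone.utc)
ends_at = starts_at + timedelta(hours=48)
print(event_phase(starts_at + timedelta(hours=1), starts_at, ends_at).value)  # active
```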
Non-Functional
- Scalability -- Support 10,000 concurrent users with 500 donations per minute during peak periods
- Reliability -- Zero tolerance for lost or duplicate donations; the system must handle payment provider outages and webhook delivery failures
- Latency -- Donation submission completes within 3 seconds; live counter updates reflect within 5 seconds of payment confirmation
- Consistency -- Strong consistency for payment records and donation amounts; eventual consistency acceptable for public-facing counters and leaderboards
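A quick back-of-envelope check using only the targets above shows where the load sits. The average payment write rate is small. The larger number is fan-out to browsers if every donation triggers a push to every connected client. The sketch assumes each concurrent user holds one live connection and receives every update. The 5-second freshness budget is what permits coalescing updates into periodic pushes.

```python
from typing import Optional

# Targets taken from the non-functional requirements above.
CONCURRENT_USERS = 10_000
PEAK_DONATIONS_PER_MIN = 500

donations_per_sec = PEAK_DONATIONS_PER_MIN / 60  # payment writes per second


def pushes_per_sec(batch_interval_s: Optional[float]) -> float:
    """Messages per second sent to browsers, assuming every connected user
    receives every update.

    None means one push per donation; a number means totals are coalesced
    and pushed once per interval, regardless of the donation rate.
    """
    if batch_interval_s is None:
        return donations_per_sec * CONCURRENT_USERS
    return CONCURRENT_USERS / batch_interval_s


print(f"{donations_per_sec:.1f} donations/s")
print(f"per-donation push: {pushes_per_sec(None):,.0f} msgs/s")
print(f"push every 1 s: {pushes_per_sec(1.0):,.0f} msgs/s")
```

With these inputs, the average is about 8.3 donation writes per second. Per-donation fan-out would be about 83,333 messages per second, versus 10,000 with one coalesced push per second. Batching therefore ties push volume to connection count rather than donation rate.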
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Payment Idempotency and State Management
Payment flows involve multiple asynchronous steps across your service and external payment processors. Interviewers want to see how you prevent duplicate charges when users refresh, requests time out, or webhooks arrive multiple times.
Hints to consider:
- Generate server-side idempotency keys before calling the payment provider so retries are safe by construction
- Model the donation lifecycle as a finite state machine with monotonic transitions: pending → authorized → captured → completed
- Store webhook event IDs with unique database constraints to reject duplicate payment confirmations
- Consider how client retries, server retries, and webhook retries all interact with your state machine without causing double-charges
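The hints above can be combined into a small sketch under several assumptions:

- It uses a relational store. SQLite is used here so the example runs standalone; the same unique constraints and `ON CONFLICT` clause exist in PostgreSQL.
- The server generates one idempotency key per checkout session and would also forward it to the payment provider.
- It adds a `failed` terminal state that the base lifecycle does not list.

Table, column, and function names are hypothetical, and the provider call itself is omitted.

```python
import sqlite3
import uuid
from typing import Optional

# Monotonic lifecycle: states only move forward. "failed" is an added
# terminal state (an assumption; the base lifecycle lists only the happy path).
TRANSITIONS = {
    "pending": {"authorized", "failed"},
    "authorized": {"captured", "failed"},
    "captured": {"completed"},
    "completed": set(),
    "failed": set(),
}

SCHEMA = """
CREATE TABLE donations (
    id TEXT PRIMARY KEY,
    idempotency_key TEXT NOT NULL UNIQUE,  -- one row per checkout session
    charity_id TEXT NOT NULL,
    amount_cents INTEGER NOT NULL,
    state TEXT NOT NULL
);
CREATE TABLE webhook_events (
    event_id TEXT PRIMARY KEY              -- provider event ID; repeats collide
);
"""


def new_checkout_key() -> str:
    """Server-generated key, issued once per checkout session and reused on retries."""
    return str(uuid.uuid4())


def create_donation(db: sqlite3.Connection, key: str, charity_id: str,
                    amount_cents: int) -> str:
    """Insert a pending donation, or return the existing row for a retried key."""
    with db:
        db.execute(
            "INSERT INTO donations VALUES (?, ?, ?, ?, 'pending') "
            "ON CONFLICT(idempotency_key) DO NOTHING",
            (str(uuid.uuid4()), key, charity_id, amount_cents),
        )
    row = db.execute("SELECT id FROM donations WHERE idempotency_key = ?",
                     (key,)).fetchone()
    return row[0]


def _advance(db: sqlite3.Connection, donation_id: str, new_state: str) -> bool:
    """Apply one forward transition inside the caller's transaction."""
    row: Optional[tuple] = db.execute(
        "SELECT state FROM donations WHERE id = ?", (donation_id,)).fetchone()
    if row is None or new_state not in TRANSITIONS[row[0]]:
        return False  # unknown donation, stale, or out-of-order event
    # Compare-and-set on the old state so concurrent writers cannot both win.
    cur = db.execute("UPDATE donations SET state = ? WHERE id = ? AND state = ?",
                     (new_state, donation_id, row[0]))
    return cur.rowcount == 1


def handle_webhook(db: sqlite3.Connection, event_id: str, donation_id: str,
                   new_state: str) -> str:
    """Process a provider webhook at most once per event ID."""
    try:
        with db:  # event ID and state change commit (or roll back) together
            db.execute("INSERT INTO webhook_events VALUES (?)", (event_id,))
            return "applied" if _advance(db, donation_id, new_state) else "ignored"
    except sqlite3.IntegrityError:
        return "duplicate"  # already processed; acknowledge with no side effects
```

The webhook event ID and the state transition must commit in one transaction. If they committed separately, a crash between the two writes would mark the event as processed while the donation never advanced. The provider's retry would then be discarded as a duplicate.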
2. Payment Provider Resilience
External payment services experience downtime, slow responses, and partial outages. Interviewers expect fallback strategies that maintain availability without compromising payment integrity.
Hints to consider:
- Implement circuit breakers with tuned thresholds to fail fast when the provider is degraded rather than queuing thousands of hanging requests
- Design a graceful degradation path: queue donations for later processing or offer a backup payment provider
- Distinguish carefully between "payment unknown" (timeout) and "payment failed" (explicit rejection) scenarios
- Plan for manual reconciliation when automated recovery is not possible during the short event window
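A minimal circuit breaker that also keeps "unknown" and "declined" apart might look like the following. It assumes the callable passed to `call()` is a hypothetical wrapper around the provider SDK that maps timeouts and connection errors to `Outcome.UNKNOWN`. The thresholds are placeholders to tune. The sketch is single-threaded; a production breaker needs locking or per-process state.

```python
import time
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    SUCCEEDED = "succeeded"
    DECLINED = "declined"  # explicit rejection: safe to report as failed
    UNKNOWN = "unknown"    # timeout/connection error: a charge may exist


class CircuitOpenError(Exception):
    """Raised when a call is short-circuited; nothing was sent to the provider."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half_open"  # allow a trial request through
        return "open"

    def call(self, fn: Callable[[], Outcome]) -> Outcome:
        if self.state() == "open":
            raise CircuitOpenError("provider marked unhealthy; failing fast")
        outcome = fn()
        if outcome is Outcome.UNKNOWN:
            # Timeouts count against provider health. Declines do not: an
            # explicit rejection means the provider is responding normally.
            self.failures += 1
            if self.failures >= self.failure_threshold or self.opened_at is not None:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
        else:
            self.failures = 0
            self.opened_at = None
        return outcome
```

Only explicit declines should be shown to the donor as failures. An `UNKNOWN` result should leave the donation pending so it can be reconciled against the provider's records. A `CircuitOpenError` is the safe point to queue the donation or route it to a backup provider, because no request reached the primary provider.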
3. Hot Counter Contention
When thousands of users donate simultaneously, naively incrementing a single database row for totals creates lock contention and query timeouts. Interviewers want to see counter strategies built for write-heavy, bursty workloads.
Hints to consider:
- Use Redis INCRBY for atomic, high-throughput counter updates that absorb burst traffic
- Shard counters by charity ID or time bucket to distribute write load across multiple keys
- Periodically reconcile fast Redis counters with authoritative PostgreSQL records to detect drift
- Consider write-behind patterns where counter updates are batched and flushed asynchronously to the database
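The sketch below shards each charity's counter across several keys and reconciles the sum against totals computed from the payment database. It uses an in-memory stand-in so it runs without a Redis server. The stand-in mirrors the call shape of redis-py's `incrby(name, amount)` and `mget(keys)`, so a single-node redis-py client could be passed instead. Its `mget` returns strings or bytes, hence the `int()` conversion. The key names, shard count, and source of `authoritative_totals` are assumptions; the totals could come, for example, from a `SUM(amount_cents)` over completed donations.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional

NUM_SHARDS = 8  # assumption: tune to the observed write contention


class InMemoryCounterStore:
    """Stand-in for a single-node Redis client so the sketch runs offline.

    Mirrors the call shape of redis-py's incrby(name, amount) and mget(keys).
    """

    def __init__(self) -> None:
        self.data: Dict[str, int] = defaultdict(int)

    def incrby(self, name: str, amount: int = 1) -> int:
        self.data[name] += amount
        return self.data[name]

    def mget(self, keys: List[str]) -> List[Optional[int]]:
        return [self.data.get(k) for k in keys]  # None for missing keys


def shard_key(charity_id: str, shard: int) -> str:
    return f"donations:{charity_id}:shard:{shard}"


def record_donation(store, charity_id: str, amount_cents: int) -> None:
    """Add to a randomly chosen shard so one hot charity spreads across keys."""
    store.incrby(shard_key(charity_id, random.randrange(NUM_SHARDS)), amount_cents)


def read_total(store, charity_id: str) -> int:
    """Sum all shards; missing keys (no donations yet) count as zero."""
    keys = [shard_key(charity_id, s) for s in range(NUM_SHARDS)]
    return sum(int(v) for v in store.mget(keys) if v is not None)


def reconcile(store, authoritative_totals: Dict[str, int]) -> Dict[str, int]:
    """Return charity_id -> drift in cents (database total minus counter total)."""
    drift = {}
    for charity_id, expected in authoritative_totals.items():
        diff = expected - read_total(store, charity_id)
        if diff:
            drift[charity_id] = diff
    return drift
```

On a single Redis node, one key already absorbs far more than 500 increments per minute. Sharding therefore pays off mainly when counters are spread across several nodes, or when the same pattern is applied to database rows. If counters are updated after the database commit, a reconciliation run can observe the gap between the two. A reasonable policy is to act only on drift that persists across consecutive runs.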
4. Real-Time Update Distribution
The platform must push live donation totals to thousands of connected browsers without overwhelming the backend or serving stale data.
Hints to consider:
- Use Server-Sent Events (SSE) or WebSockets to push counter updates to connected clients
- Fan out updates through a Redis pub/sub channel so application servers subscribe and forward to their connected clients
- Implement client-side exponential backoff with jitter to smooth reconnection storms after brief outages
- Cache current totals at the edge or in a CDN for users who just loaded the page and do not need sub-second freshness
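Two small pieces of the push path are sketched below:

- The Server-Sent Events wire format an application server would write for each coalesced totals update.
- A client reconnect delay using "full jitter" exponential backoff: a random delay between zero and a capped exponential ceiling, which spreads reconnects out instead of letting every client retry in lockstep.

Python is used to match the other examples; a browser client would apply the same arithmetic in JavaScript. The event name `totals`, the payload shape, and the base and cap values are assumptions.

```python
import json
import random
from typing import Optional


def reconnect_delay(attempt: int, base_s: float = 0.5, cap_s: float = 30.0,
                    rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: a uniform delay in [0, min(cap, base * 2**attempt)].

    The exponent is clamped so very large attempt counts cannot overflow.
    """
    source = rng if rng is not None else random
    ceiling = min(cap_s, base_s * 2 ** min(attempt, 16))
    return source.uniform(0, ceiling)


def sse_frame(totals: dict, event_id: int) -> str:
    """Encode one Server-Sent Events message carrying the latest totals.

    A blank line terminates each event; the id lets a reconnecting
    EventSource report the last event it saw via Last-Event-ID.
    """
    payload = json.dumps(totals, separators=(",", ":"), sort_keys=True)
    return f"id: {event_id}\nevent: totals\ndata: {payload}\n\n"


print(sse_frame({"charity-a": 50000, "overall": 50000}, 7), end="")
```

Because totals are cumulative, a reconnecting client needs only the latest snapshot rather than a replay of missed events. The server can therefore answer any reconnect with current totals, whatever `Last-Event-ID` it receives.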