Design Netflix Subscription Billing
Problem Statement
You are tasked with designing a subscription billing engine for a global streaming platform serving 250 million paying subscribers across dozens of countries. The system must process recurring charges monthly or annually, handle plan changes (upgrades, downgrades, cancellations) with fair proration, support multiple payment methods and currencies, and gracefully manage payment failures through retry logic and customer notifications.
The billing system needs to coordinate with payment service providers, maintain accurate financial records for compliance and auditing, send receipts and invoices, and control subscriber access based on payment status. Your design must ensure that no customer is ever double-charged, that failed payments are retried intelligently, and that the system can scale to process tens of millions of billing events during peak renewal windows without overwhelming downstream services or creating inconsistent financial state.
Key Requirements
Functional
- Subscription lifecycle management -- users can create, upgrade, downgrade, pause, or cancel subscriptions with appropriate proration calculations
- Multi-payment method support -- users can store multiple payment methods (credit cards, PayPal, bank transfers) and set a default for recurring billing
- Automated billing cycle execution -- the system charges subscriptions on their renewal date and updates access entitlements accordingly
- Payment failure handling -- failed charges trigger configurable retry schedules with customer notifications and eventual account suspension
- Invoice generation -- detailed invoices with line items, taxes, discounts, and credits are produced and stored for every billing event
- Billing history access -- users can view past invoices, payment attempts, and subscription changes through a portal
Non-Functional
- Scalability -- process 10+ million billing transactions per day with peak loads of 500K+ transactions per hour during monthly renewal cycles
- Reliability -- ensure exactly-once financial semantics; no duplicate charges or missed billing under network failures or service restarts
- Latency -- interactive operations (plan changes, payment method updates) complete within 2 seconds; background billing jobs process accounts within their renewal window
- Consistency -- maintain strong consistency for financial records (invoices, ledger, payment attempts) while allowing eventual consistency for non-critical data
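A quick back-of-envelope check helps sanity-test the scalability targets above. The sketch below assumes, purely for illustration, that every subscriber is on a monthly plan and that renewals are spread evenly over a 30-day month; the function names are hypothetical.

```python
SUBSCRIBERS = 250_000_000   # from the problem statement
DAYS_PER_MONTH = 30         # assumption: monthly plans, renewals spread evenly
PEAK_PER_HOUR = 500_000     # peak target from the non-functional requirements


def average_daily_renewals(subscribers: int = SUBSCRIBERS,
                           days: int = DAYS_PER_MONTH) -> float:
    """Average renewals per day if renewal dates are evenly distributed."""
    return subscribers / days


def per_second(per_hour: float) -> float:
    """Convert an hourly rate into a per-second rate."""
    return per_hour / 3600


daily = average_daily_renewals()       # roughly 8.3 million per day
peak_tps = per_second(PEAK_PER_HOUR)   # roughly 139 charges per second
```

Under these assumptions the average is about 8.3 million renewals per day, in line with the 10+ million per day target once retries and plan changes are included. Note that 10 million per day already averages about 417K per hour, so the 500K per hour peak figure implies load that is spread fairly evenly rather than spiking.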
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Idempotency and Exactly-Once Financial Semantics
Your design must prevent double-charging customers even when network failures cause retries or when multiple workers attempt to process the same subscription renewal. Interviewers want to see how you achieve correctness in a distributed system handling real money.
Hints to consider:
- Generate and persist unique idempotency keys (invoice ID + payment attempt number) before calling payment processors
- Use database transactions to atomically record payment attempts and prevent concurrent processing of the same subscription
- Implement webhook deduplication using event IDs and database constraints to handle at-least-once delivery from payment providers
- Design reconciliation jobs that compare internal ledger state with payment provider records to catch and fix discrepancies
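The first three hints can be sketched with a unique constraint doing the deduplication work. This is a minimal illustration using an in-memory SQLite database; the table names, column names, and helper functions are assumptions made for the example, not a prescribed schema.

```python
import sqlite3


def make_idempotency_key(invoice_id: str, attempt_number: int) -> str:
    """Key is stable across network retries of the same attempt."""
    return f"{invoice_id}:{attempt_number}"


def open_db() -> sqlite3.Connection:
    """Create the (hypothetical) tables used for deduplication."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE payment_attempts (
            idempotency_key TEXT PRIMARY KEY,
            invoice_id      TEXT NOT NULL,
            attempt_number  INTEGER NOT NULL,
            status          TEXT NOT NULL
        );
        CREATE TABLE webhook_events (
            event_id TEXT PRIMARY KEY
        );
    """)
    return conn


def claim_attempt(conn: sqlite3.Connection, invoice_id: str,
                  attempt_number: int) -> bool:
    """Persist the attempt BEFORE calling the processor.

    Returns True if this caller won the claim, False if another worker
    (or an earlier retry) already recorded the same attempt.
    """
    key = make_idempotency_key(invoice_id, attempt_number)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO payment_attempts VALUES (?, ?, ?, 'pending')",
                (key, invoice_id, attempt_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def record_webhook(conn: sqlite3.Connection, event_id: str) -> bool:
    """Return True only the first time a provider event ID is seen."""
    try:
        with conn:
            conn.execute("INSERT INTO webhook_events VALUES (?)", (event_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

Only the worker for which `claim_attempt` returns True should call the payment processor, passing the same key so the provider can also deduplicate on its side. A redelivered webhook makes `record_webhook` return False and can be acknowledged without reprocessing.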
2. Distributing Load and Avoiding Thundering Herds
With millions of subscriptions renewing, you cannot process all charges at midnight on the first of the month. Interviewers expect you to discuss strategies for spreading load over time and protecting rate-limited downstream systems.
Hints to consider:
- Hash subscription IDs to distribute billing times uniformly across a 24- to 48-hour renewal window
- Implement per-payment-provider rate limiting and backpressure using token bucket algorithms in a shared cache
- Use message queues with configurable consumer parallelism to control throughput and allow graceful degradation
- Consider time zone awareness and user preferences to balance load while maintaining predictable billing times
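Two of the hints above, hashing subscriptions into a renewal window and token-bucket rate limiting, can be sketched as follows. The 48-hour window, the class name, and the parameters are illustrative assumptions. The bucket here keeps its state in process memory, whereas the hint suggests holding it in a shared cache so that all workers see the same budget.

```python
import hashlib

WINDOW_SECONDS = 48 * 3600  # assumption: a 48-hour renewal window


def billing_offset_seconds(subscription_id: str,
                           window_seconds: int = WINDOW_SECONDS) -> int:
    """Map a subscription ID to a stable offset inside the renewal window.

    A cryptographic hash is used only for its uniform spread; the same ID
    always lands on the same offset, so billing times stay predictable.
    """
    digest = hashlib.sha256(subscription_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


class TokenBucket:
    """Single-process token bucket; time is passed in to keep it testable."""

    def __init__(self, rate_per_sec: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend tokens if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A worker would keep one bucket per payment provider and call `try_acquire` before each charge. When it returns False, the message can be left on the queue or requeued with a delay, which gives the backpressure described above.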
3. Multi-Step Workflow Orchestration Across Services
Billing involves coordinating multiple services: invoice creation, payment authorization/capture, entitlement updates, receipt delivery, and ledger posting. Each step can fail independently, and the entire workflow must be resilient and auditable.
Hints to consider:
- Use a durable workflow engine or saga pattern to model billing as a series of compensatable steps with retry logic
- Store workflow state in a database so it can resume after crashes without losing progress or duplicating side effects
- Emit domain events (invoice created, payment captured, access granted) to Kafka for async consumers rather than synchronous coupling
- Design each step to be idempotent by checking existing state before applying changes
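The resumable-workflow idea can be illustrated with a small step runner that records each completed step and skips it on re-execution. This is a sketch only: the step names follow the list in this section, `WorkflowStore` is an in-memory stand-in for a durable table, and a production system would more likely use an existing workflow engine.

```python
from typing import Callable, Dict, List, Set

# Step order assumed for illustration, following the services named above.
STEPS = ["create_invoice", "capture_payment", "grant_access",
         "send_receipt", "post_ledger"]


class WorkflowStore:
    """In-memory stand-in for a durable table of completed steps."""

    def __init__(self) -> None:
        self._done: Dict[str, Set[str]] = {}

    def is_done(self, workflow_id: str, step: str) -> bool:
        return step in self._done.get(workflow_id, set())

    def mark_done(self, workflow_id: str, step: str) -> None:
        self._done.setdefault(workflow_id, set()).add(step)


def run_billing_workflow(workflow_id: str,
                         handlers: Dict[str, Callable[[str], None]],
                         store: WorkflowStore) -> List[str]:
    """Run the steps in order, skipping any that already completed.

    If a handler raises, earlier steps stay recorded, so calling this
    function again resumes from the failed step. Returns the steps
    executed during this call.
    """
    executed: List[str] = []
    for step in STEPS:
        if store.is_done(workflow_id, step):
            continue
        handlers[step](workflow_id)
        store.mark_done(workflow_id, step)
        executed.append(step)
    return executed
```

A crash between a handler finishing and `mark_done` being written would cause that handler to run again on resume, which is why each step also needs to be idempotent on its own, as the last hint says.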
4. Proration Logic and Plan Change Handling
When users upgrade or downgrade mid-cycle, you must calculate fair charges or credits. Interviewers probe whether you understand the business complexity and can maintain correctness under concurrent changes.
Hints to consider:
- Lock the subscription record when processing a plan change to prevent race conditions with background billing jobs
- Calculate unused time on the old plan and prorate charges or issue credits to the customer's account balance
- Generate adjustment line items on invoices to provide transparency into proration calculations
- Handle edge cases like multiple plan changes within a billing period or changes immediately before renewal
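As a worked example of the proration hint, the sketch below assumes simple day-based proration: the customer is credited for the unused fraction of the old plan and charged the same fraction of the new plan. The function name and the prices in the usage note are hypothetical, and real policies (for example, deferring downgrades to the next renewal) may differ.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP


def prorate_plan_change(old_price: Decimal, new_price: Decimal,
                        period_start: date, period_end: date,
                        change_date: date) -> Decimal:
    """Net amount due for a mid-cycle plan change.

    Positive means charge the customer; negative means credit their
    account balance. Decimal is used to avoid float rounding on money.
    """
    total_days = (period_end - period_start).days
    remaining_days = (period_end - change_date).days
    if total_days <= 0 or not 0 <= remaining_days <= total_days:
        raise ValueError("change_date must fall within the billing period")

    fraction = Decimal(remaining_days) / Decimal(total_days)
    credit_for_old_plan = old_price * fraction   # unused time on old plan
    charge_for_new_plan = new_price * fraction   # remaining time on new plan
    net = charge_for_new_plan - credit_for_old_plan
    return net.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For a 30-day period from 2024-04-01 to 2024-05-01 with a change on 2024-04-16, half the period remains. Upgrading from a 10.00 plan to a 20.00 plan yields a net charge of 5.00, and the reverse downgrade yields a credit of 5.00. The two intermediate values map naturally onto the adjustment line items mentioned above.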
5. Payment Failure Recovery and Dunning
Not all charges succeed on the first try. Your system must retry intelligently, notify users, and eventually suspend access without creating a poor user experience or losing recoverable revenue.
Hints to consider:
- Implement exponential backoff retry schedules (e.g., retries 1, 3, and 7 days after the initial failure, so the gaps between attempts double) with configurable limits per subscription tier
- Send progressive notifications (reminder, warning, final notice) via email and in-app messages as retries occur
- Provide a grace period where access continues despite failed payment to reduce involuntary churn
- Allow users to update payment methods even after failure and automatically retry the charge upon update
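The dunning hints can be sketched as a small schedule generator plus a grace-period check. The retry offsets follow the 1, 3, 7 day example above, interpreted here as days after the initial failure. The notice names and the 10-day grace period are assumptions chosen for illustration.

```python
from datetime import date, timedelta
from typing import List, Tuple

RETRY_OFFSETS_DAYS = (1, 3, 7)                       # days after first failure
NOTICES = ("reminder", "warning", "final_notice")    # hypothetical notice names
GRACE_PERIOD_DAYS = 10                               # assumption, not from the source


def dunning_plan(first_failure: date) -> List[Tuple[date, str]]:
    """Pair each scheduled retry date with the notice sent alongside it."""
    return [
        (first_failure + timedelta(days=offset), notice)
        for offset, notice in zip(RETRY_OFFSETS_DAYS, NOTICES)
    ]


def has_access(first_failure: date, today: date, paid: bool) -> bool:
    """Keep access during the grace period; suspend once it lapses unpaid."""
    if paid:
        return True
    return today <= first_failure + timedelta(days=GRACE_PERIOD_DAYS)
```

When a user updates their payment method, the system can trigger an immediate charge attempt outside this schedule and, on success, mark the invoice as paid so that `has_access` returns True regardless of the date.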
Suggested Approach
Step 1: Clarify Requirements
Confirm the following with your interviewer before sketching architecture:
- What subscription tiers and billing frequencies are supported (monthly, annual, quarterly)?
- How should proration work for upgrades, downgrades, and mid-cycle cancellations?
- What payment methods and regions must be supported, and are there specific payment processors to integrate?
- What are the retry policies for failed payments, and when should access be suspended?
- Are there requirements for tax calculation, discounts, promotional codes, or partner billing?
- What compliance and auditing requirements exist (PCI-DSS, SOX, GDPR)?
- Should the system support multi-currency billing and currency conversion?