Design a webhook delivery service for a payments platform that notifies merchants when payment events occur. When a customer completes a payment, receives a refund, or disputes a charge, the platform needs to reliably deliver these events to the merchant's server in real time.
Consider the workflow: an online store integrates with a payment platform to process transactions. The store needs to know immediately when payments succeed so it can fulfill orders, when refunds are processed so it can update inventory, and when chargebacks occur so it can respond appropriately. A webhook system enables this real-time notification by sending HTTP POST requests to merchant-configured endpoints. Merchants register their callback URLs and subscribe to specific event types, and your system guarantees delivery even when merchant servers experience downtime.
Merchant registration -- allow merchants to register webhook endpoints and specify which event types they want to receive (e.g., payment.success, refund.created)
Event delivery -- when a payment event occurs, send an HTTP POST request with the event payload to all registered webhooks
Reliable delivery -- ensure events are delivered even when merchant servers are temporarily unavailable, using retry with exponential backoff
Signature verification -- sign every webhook request so merchants can authenticate it came from the platform
Endpoint validation -- verify that merchants own the URL they register before sending payment data to it
At-least-once delivery -- every event must be delivered at least once; duplicates are acceptable if merchants can deduplicate
Low latency -- minimize time between a payment event occurring and the webhook being delivered
Horizontal scalability -- handle millions of events daily across hundreds of thousands of merchants
Tenant isolation -- one merchant's broken endpoint must not affect delivery to other merchants
Based on real interview experiences, these are the areas interviewers probe most deeply:
Given Stripe's focus on reliability, interviewers spend the most time on how you ensure every event reaches the merchant. They walk through failure scenarios in detail.
Exponential backoff with jitter (e.g., 1s, 2s, 4s, 8s...), continuing until a total retry window of 24-72 hours has elapsed
Store delivery attempts in a database with status tracking per event per endpoint
Use a persistent message queue (e.g., SQS, Kafka) so events survive service restarts
After exhausting retries, mark the endpoint as unhealthy and notify the merchant via email or dashboard alert
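The retry schedule above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the constants, the "full jitter" variant (a uniform delay between zero and the exponential ceiling), and the function names `backoff_ceiling`, `backoff_delay`, and `should_retry` are all assumptions chosen for the example.

```python
import random
from typing import Optional

BASE_DELAY_S = 1.0             # first retry ceiling, matching the 1s, 2s, 4s, 8s example
MAX_DELAY_S = 6 * 3600.0       # assumed cap on any single delay
RETRY_WINDOW_S = 72 * 3600.0   # assumed total window before giving up


def backoff_ceiling(attempt: int) -> float:
    """Upper bound on the delay before retry `attempt` (0-based)."""
    exponent = min(attempt, 32)  # keep 2 ** exponent small for huge attempt counts
    return min(MAX_DELAY_S, BASE_DELAY_S * (2 ** exponent))


def backoff_delay(attempt: int, rng: Optional[random.Random] = None) -> float:
    """Full jitter: pick a uniform delay between 0 and the exponential ceiling."""
    rng = rng or random.Random()
    return rng.uniform(0.0, backoff_ceiling(attempt))


def should_retry(first_attempt_at: float, now: float) -> bool:
    """Keep retrying only while the event is inside the retry window."""
    return (now - first_attempt_at) < RETRY_WINDOW_S
```

Jitter spreads retries out so that many deliveries that failed together during a merchant outage do not all hit the endpoint again at the same instant when it recovers.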
Interviewers want to see that you understand at-least-once delivery means merchants will receive duplicates, and you must give them tools to handle it.
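One common tool is a unique event ID in every payload that merchants can use as a deduplication key. The sketch below shows what a merchant-side handler might look like under that assumption. The `id` field, the `DedupingHandler` class, and the in-memory set are hypothetical; a real integration would likely use a durable store with a unique constraint.

```python
from typing import Callable, Dict, Set


class DedupingHandler:
    """Merchant-side wrapper that processes each event ID at most once."""

    def __init__(self, process: Callable[[Dict], None]) -> None:
        self._process = process
        self._seen: Set[str] = set()  # assumption: in-memory, for illustration only

    def handle(self, event: Dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        event_id = event["id"]  # assumed unique ID assigned by the platform
        if event_id in self._seen:
            return False
        self._process(event)
        # Record the ID only after processing succeeds, so that a failure
        # here lets the platform's retry deliver the event again.
        self._seen.add(event_id)
        return True
```

In either case the merchant would acknowledge the request with a 2xx response, since a duplicate has already been handled.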
How do merchants verify that requests actually came from your platform, and how do you prevent your webhook system from being weaponized?
HMAC-SHA256 signature using a per-merchant secret key, included in a header like X-Signature
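Include a timestamp in the signed payload so merchants can reject requests older than a tolerance window, which limits replay attacks
SSRF prevention: block private, loopback, and link-local IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, 169.254.0.0/16), and resolve the hostname and validate the resulting IP before each send, not only at registration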
Endpoint verification: send a challenge request during registration that the merchant must echo back
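The signing and timestamp points above can be illustrated with Python's standard `hmac` module. This is a sketch under stated assumptions: the `timestamp.body` signed-payload layout, the 300-second tolerance, and the function names `sign` and `verify` are choices made for the example, not a description of any particular platform's scheme.

```python
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_S = 300  # assumed replay window: reject requests older than 5 minutes


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Platform side: HMAC-SHA256 over '<timestamp>.<body>' with the merchant's secret."""
    signed_payload = str(timestamp).encode() + b"." + body
    return hmac.new(secret, signed_payload, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: int, body: bytes, signature: str,
           now: Optional[float] = None) -> bool:
    """Merchant side: check freshness first, then compare signatures in constant time."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > TOLERANCE_S:
        return False  # too old or too far in the future: possible replay
    expected = sign(secret, timestamp, body)
    return hmac.compare_digest(expected, signature)
```

Because the timestamp is part of the signed payload, an attacker cannot refresh it on a captured request without invalidating the signature.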
How do you prevent a single merchant with millions of pending events or a slow endpoint from starving other merchants?
Per-merchant delivery queues so one merchant's backlog does not block others
Rate and concurrency limits per merchant endpoint (e.g., max 100 concurrent deliveries per merchant)
Circuit breaker pattern: if a merchant endpoint fails N times consecutively, stop sending and mark it disabled
Separate fast and slow lanes -- route merchants with healthy endpoints through a fast path
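The circuit breaker item above might be sketched as one small state object per merchant endpoint. The class name, the default threshold, and the cooldown-then-probe behavior are assumptions for illustration; a design could equally keep the endpoint disabled until the merchant re-enables it manually.

```python
from typing import Optional


class CircuitBreaker:
    """Tracks consecutive failures for one merchant endpoint."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 60.0) -> None:
        self.failure_threshold = failure_threshold  # the "N" consecutive failures
        self.cooldown_s = cooldown_s                # assumed wait before a probe
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None      # None means the circuit is closed

    def allow(self, now: float) -> bool:
        """True if a delivery attempt may be made at time `now`."""
        if self.opened_at is None:
            return True
        # After the cooldown, let an attempt through as a probe.
        return (now - self.opened_at) >= self.cooldown_s

    def record_success(self) -> None:
        """A successful delivery closes the circuit and resets the count."""
        self.consecutive_failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        """Count a failure and open (or re-open) the circuit at the threshold."""
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = now
```

Keeping one breaker per endpoint, for example in a dictionary keyed by endpoint ID, means a failing merchant only pauses its own deliveries.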
Interviewers may ask whether events can arrive out of order and how merchants should handle this.
Webhooks are inherently unordered due to retries and concurrent delivery
Include a monotonically increasing sequence number or event timestamp so merchants can detect out-of-order delivery
Merchants should fetch the latest state from the API rather than relying solely on webhook event ordering
For critical ordering needs, consider a single-threaded-per-merchant delivery model at the cost of throughput
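On the merchant side, the sequence number idea above could be used roughly as follows. This is a hypothetical sketch: it assumes each event carries an object ID and a per-object sequence number, and the `SequenceTracker` name and in-memory dictionary are illustrative only.

```python
from typing import Dict


class SequenceTracker:
    """Remembers the highest sequence number applied for each object."""

    def __init__(self) -> None:
        self._latest: Dict[str, int] = {}

    def should_apply(self, object_id: str, sequence: int) -> bool:
        """Return True for a newer event; False for a stale or repeated one."""
        latest = self._latest.get(object_id)
        if latest is not None and sequence <= latest:
            return False  # arrived out of order (or is a duplicate): skip it
        self._latest[object_id] = sequence
        return True
```

When `should_apply` returns False, the merchant can skip the event or, as suggested above, fetch the current state of the object from the API instead of trusting the event's contents.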