[ INFO ]category: System Design difficulty: unknown freq: first seen: 2026-03-13
$ cat problem.md
In an OpenAI system design interview, the Webhook Delivery System problem statement typically asks you to design a reliable, scalable, and secure platform that allows source applications to automatically notify third-party subscribers when specific events occur.
Core Problem Statement
"Design a webhook delivery system that can handle high-volume event fan-out (e.g., 1 billion events per day) with reliable delivery guarantees and multi-tenant isolation."
Functional Requirements
Registration: Users must be able to register, update, and delete callback URLs associated with specific event types.
Event Ingestion: The system must listen for internal events (e.g., a payment success or a completed AI batch job).
Reliable Delivery: The system must provide at-least-once delivery guarantees.
Filtering: Support per-event-type subscriptions so clients receive only the events they care about.
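The registration and filtering requirements above can be sketched as a small in-memory registry. This is only an illustration under stated assumptions: the names `Subscription`, `SubscriptionRegistry`, and `targets_for` are hypothetical, and a real system would back this with a durable store rather than a dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """Hypothetical record: one callback URL and the event types it wants."""
    subscriber_id: str
    url: str
    event_types: frozenset


class SubscriptionRegistry:
    """In-memory stand-in for the subscription store (an assumption)."""

    def __init__(self):
        self._subs = {}  # subscriber_id -> Subscription

    def register(self, subscriber_id, url, event_types):
        # Registering an existing subscriber_id acts as an update.
        self._subs[subscriber_id] = Subscription(
            subscriber_id, url, frozenset(event_types)
        )

    def delete(self, subscriber_id):
        # Deleting an unknown subscriber is a no-op.
        self._subs.pop(subscriber_id, None)

    def targets_for(self, event_type):
        # Filtering: only URLs subscribed to this event type are returned.
        return sorted(
            s.url for s in self._subs.values() if event_type in s.event_types
        )


registry = SubscriptionRegistry()
registry.register("acme", "https://acme.example/hook", ["payment.succeeded"])
registry.register("globex", "https://globex.example/hook",
                  ["payment.succeeded", "batch.completed"])
print(registry.targets_for("batch.completed"))
```

Keying the registry by subscriber keeps update and delete trivial; a production design would more likely index by event type so that fan-out lookups do not scan every subscription.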
Non-Functional Requirements
High Scalability: Ability to handle massive spikes in event volume (e.g., 100k events/sec at peak).
Fault Tolerance & Retries: Automatic retries with exponential backoff for failed delivery attempts.
Security: Implement payload signing (HMAC) so recipients can verify the request came from your platform.
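Idempotency: Because at-least-once delivery can produce duplicates, each event should carry a stable unique ID so that receivers can deduplicate and repeated deliveries of the same event do not cause duplicate actions.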
Multi-tenant Isolation: Prevent a single "noisy neighbor" (one client's failing or slow server) from clogging the entire delivery pipeline.
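Three of the requirements above (retries with exponential backoff, HMAC payload signing, and receiver-side idempotency) can be sketched together. Everything here is an assumption made for illustration: the function names, the `timestamp.body` signing format, the 300-second tolerance, and the jitter strategy are not prescribed by the problem statement.

```python
import hashlib
import hmac
import random


def backoff_delay(attempt, base=1.0, cap=3600.0, jitter=True, rng=random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Clamp the exponent so very large attempt counts cannot overflow.
    delay = min(cap, base * (2 ** min(attempt, 32)))
    if jitter:
        # "Full jitter": spread retries out to avoid synchronized bursts.
        return rng.uniform(0, delay)
    return delay


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """HMAC-SHA256 over an assumed "<timestamp>.<body>" message format."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret, timestamp, body, signature, now, tolerance=300):
    """Reject stale timestamps, then compare signatures in constant time."""
    if abs(now - timestamp) > tolerance:
        return False
    return hmac.compare_digest(sign(secret, timestamp, body), signature)


class Receiver:
    """Hypothetical subscriber endpoint: verify first, then deduplicate."""

    def __init__(self, secret):
        self._secret = secret
        self._seen = set()   # event IDs already processed
        self.processed = []

    def handle(self, event_id, timestamp, body, signature, now):
        # Simplification: event_id is passed separately; in practice it
        # would live inside the signed body or a signed header.
        if not verify(self._secret, timestamp, body, signature, now):
            return "rejected"
        if event_id in self._seen:
            return "duplicate"
        self._seen.add(event_id)
        self.processed.append(event_id)
        return "processed"


secret = b"shared-secret"
body = b'{"type": "payment.succeeded"}'
sig = sign(secret, 1_700_000_000, body)
receiver = Receiver(secret)
print(receiver.handle("evt_1", 1_700_000_000, body, sig, now=1_700_000_010))
print(receiver.handle("evt_1", 1_700_000_000, body, sig, now=1_700_000_020))
print([backoff_delay(n, jitter=False) for n in range(5)])
```

Including the timestamp in the signed message is one common way to limit replay of captured requests; the dedupe set is what turns at-least-once delivery into effectively-once processing on the receiver.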
Key Components to Discuss
Ingestion Service: Validates incoming internal events and pushes them to a message queue.
Message Queue/Bus: Used for decoupling and buffering high-volume traffic (e.g., Kafka or SQS).
Delivery Workers: Consume events from the queue and perform the actual HTTP POST requests to third-party endpoints.
Dead Letter Queue (DLQ): Stores events that have failed all retry attempts for later manual inspection.
State Store/Database: Tracks subscription metadata and delivery history for observability.
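How the components above fit together can be sketched with an in-memory simulation. This is a hedged illustration, not a reference design: `DeliveryPipeline` is a hypothetical name, the per-tenant deques stand in for a real queue such as Kafka or SQS, `send` is an injected callable in place of an HTTP POST, and retry delays are omitted (a real worker would wait with exponential backoff between attempts).

```python
from collections import defaultdict, deque


class DeliveryPipeline:
    """Per-tenant queues, round-robin workers, retries, and a DLQ."""

    def __init__(self, send, max_attempts=3):
        self._send = send                   # callable(url, event) -> bool
        self._max_attempts = max_attempts
        self._queues = defaultdict(deque)   # tenant -> (url, event, attempt)
        self.dead_letters = []              # events that exhausted retries
        self.history = []                   # (tenant, event_id, attempt, ok)

    def enqueue(self, tenant, url, event):
        self._queues[tenant].append((url, event, 1))

    def run_once(self):
        """One round-robin pass: at most one delivery attempt per tenant,
        so a failing tenant cannot starve the others."""
        for tenant in list(self._queues):
            queue = self._queues[tenant]
            if not queue:
                continue
            url, event, attempt = queue.popleft()
            ok = self._send(url, event)
            self.history.append((tenant, event["id"], attempt, ok))
            if ok:
                continue
            if attempt < self._max_attempts:
                queue.append((url, event, attempt + 1))  # schedule a retry
            else:
                self.dead_letters.append((tenant, event))

    def drain(self):
        while any(self._queues.values()):
            self.run_once()


def fake_send(url, event):
    # Stand-in for an HTTP POST: the "down" endpoint always fails.
    return "down" not in url


pipeline = DeliveryPipeline(fake_send, max_attempts=3)
pipeline.enqueue("tenant_a", "https://down.example/hook", {"id": "e1"})
pipeline.enqueue("tenant_b", "https://ok.example/hook", {"id": "e2"})
pipeline.drain()
print(pipeline.dead_letters)
print(pipeline.history)
```

Giving each tenant its own queue is one simple way to get noisy-neighbor isolation; alternatives include per-tenant rate limits or circuit breakers in front of a shared queue.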