Design a centralized notification system that can deliver messages to users across multiple channels (email, SMS, push, in-app) with the following requirements:
Functional Requirements:
Support both individual (1:1) and bulk (1:M) notifications
Handle different notification types: critical/time-sensitive (e.g., OTP, fraud alerts) and promotional (e.g., marketing campaigns)
Provide user-level preferences to opt in or out of each channel and each notification type
Guarantee at-least-once delivery with deduplication (same event_id must not produce duplicate messages to the same user on any channel)
Offer real-time status tracking through the lifecycle ACCEPTED → SENT → DELIVERED → READ
Support scheduled and batch campaigns (e.g., send a promo to 10 M users at 09:00 UTC)
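The deduplication and status-tracking requirements above can be sketched together as a small ledger. This is a minimal in-memory illustration, not a definitive implementation: the class name DeliveryLedger, its methods, and the choice of (event_id, user_id, channel) as the dedup key are assumptions made for the example, and a real system would back the ledger with a durable store. It also assumes that status only ever moves forward, so a late or repeated provider callback cannot move a notification from DELIVERED back to SENT.

```python
from enum import IntEnum


class Status(IntEnum):
    """Lifecycle states from the requirements, ordered so they can be compared."""
    ACCEPTED = 1
    SENT = 2
    DELIVERED = 3
    READ = 4


class DeliveryLedger:
    """Hypothetical in-memory stand-in for a durable dedup and status store."""

    def __init__(self):
        # (event_id, user_id, channel) -> Status
        self._status = {}

    def try_accept(self, event_id, user_id, channel):
        """Return True the first time a key is seen, False for a redelivery."""
        key = (event_id, user_id, channel)
        if key in self._status:
            return False
        self._status[key] = Status.ACCEPTED
        return True

    def advance(self, event_id, user_id, channel, new_status):
        """Move the status forward only; stale callbacks leave it unchanged."""
        key = (event_id, user_id, channel)
        if new_status > self._status[key]:
            self._status[key] = new_status
        return self._status[key]


ledger = DeliveryLedger()
first = ledger.try_accept("evt-1", "user-42", "email")
retry = ledger.try_accept("evt-1", "user-42", "email")     # at-least-once redelivery
other_channel = ledger.try_accept("evt-1", "user-42", "sms")

ledger.advance("evt-1", "user-42", "email", Status.DELIVERED)
final = ledger.advance("evt-1", "user-42", "email", Status.SENT)  # stale callback

print(first, retry, other_channel, final.name)
```

Because the upstream queue delivers at least once, the worker calls try_accept before sending and simply drops the message when it returns False.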
Scale & Performance:
1 M DAU, average 10 notifications/user/day → 10 M notifications/day sustained (≈ 116 notifications/second on average)
Peak burst: 1 M notifications/second during flash-sale or global push campaign
P99 end-to-end latency < 1 s for critical alerts; promotional traffic can tolerate seconds
Store 5 years of notification logs for audit & analytics (≈ 9 PB raw storage)
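The scale figures above can be checked with a quick back-of-envelope calculation. The sketch below uses only the numbers stated in the requirements plus one labeled assumption: ASSUMED_BYTES_PER_RECORD (500 bytes per log record) is hypothetical and does not come from the requirements. Under that assumption the stated sustained load comes to roughly 9 TB over five years, so the 9 PB figure would correspond either to records of several hundred kilobytes or to a much higher daily volume; the requirements do not say which, and the design should state its own assumption explicitly.

```python
# Figures taken from the requirements.
DAU = 1_000_000
NOTIFICATIONS_PER_USER_PER_DAY = 10
PEAK_PER_SECOND = 1_000_000
RETENTION_YEARS = 5
STATED_STORAGE_BYTES = 9 * 10**15  # "≈ 9 PB"

# Assumption for illustration only: average size of one stored log record.
ASSUMED_BYTES_PER_RECORD = 500

SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365

per_day = DAU * NOTIFICATIONS_PER_USER_PER_DAY
avg_per_second = per_day / SECONDS_PER_DAY
peak_to_avg = PEAK_PER_SECOND / avg_per_second

records_retained = per_day * DAYS_PER_YEAR * RETENTION_YEARS
storage_bytes = records_retained * ASSUMED_BYTES_PER_RECORD

# Record size that would be needed for the stated load to fill 9 PB.
implied_bytes_per_record = STATED_STORAGE_BYTES / records_retained

print(f"notifications/day:       {per_day:,}")
print(f"average rate:            {avg_per_second:,.1f}/s")
print(f"peak-to-average ratio:   {peak_to_avg:,.0f}x")
print(f"records kept (5 years):  {records_retained:,}")
print(f"storage at 500 B/record: {storage_bytes / 10**12:,.3f} TB")
print(f"bytes/record for 9 PB:   {implied_bytes_per_record:,.0f}")
```

The peak-to-average ratio is the more important output for the design: a burst thousands of times larger than the average rate argues for queue-based buffering and elastic workers over capacity provisioned for the peak.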
Non-functional Requirements:
No prioritization across channels: treat email, SMS, and push equally inside the system (priority follows notification type, not channel)
Preserve per-user, per-channel ordering (e.g., Order-Placed always arrives before Order-Shipped)
Graceful degradation under partial outages; failover between multiple providers per channel
Idempotent client API: each request carries an idempotency_key so that client retries do not create duplicate notifications
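Two of the requirements above, the idempotent API and per-user per-channel ordering, lend themselves to a short sketch. Kafka preserves order only within a partition, so one common approach is to derive the partition from a stable hash of (user_id, channel). Everything named here is an assumption for illustration: NotificationApi, submit, partition_for, and NUM_PARTITIONS are hypothetical, and the dictionaries stand in for a durable key store and for Kafka partitions.

```python
import hashlib
import uuid
from collections import defaultdict

NUM_PARTITIONS = 64  # hypothetical partition count


def partition_for(user_id, channel, num_partitions=NUM_PARTITIONS):
    """Map (user_id, channel) to a partition with a hash that is stable across processes."""
    digest = hashlib.sha256(f"{user_id}:{channel}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class NotificationApi:
    """Hypothetical front door: idempotent submit plus ordered enqueue."""

    def __init__(self):
        self._by_key = {}                # idempotency_key -> notification_id
        self.queues = defaultdict(list)  # partition -> messages in arrival order

    def submit(self, idempotency_key, user_id, channel, body):
        # A retried request returns the original id and enqueues nothing.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        notification_id = str(uuid.uuid4())
        self._by_key[idempotency_key] = notification_id
        partition = partition_for(user_id, channel)
        self.queues[partition].append((notification_id, body))
        return notification_id


api = NotificationApi()
placed = api.submit("key-1", "user-42", "email", "Order-Placed")
placed_retry = api.submit("key-1", "user-42", "email", "Order-Placed")  # client retry
shipped = api.submit("key-2", "user-42", "email", "Order-Shipped")

queue = api.queues[partition_for("user-42", "email")]
print(placed == placed_retry, [body for _, body in queue])
```

Python's built-in hash() is randomized per process for strings, which is why the sketch uses hashlib: every producer instance must agree on the partition for a given user and channel.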
Start your design by sketching the event pipeline: producers → Kafka → fan-out → per-channel workers. Be ready to dive into how you handle mega fan-out (a celebrity with 5 M followers posts) without doing 5 M writes on the hot path: discuss hybrid push/pull delivery, a separate store for high-indegree events, and stitching the two sources together on read. Thread a unique event_id through every layer so that the dispatcher can deduplicate, and a user who opens the app, then receives a push, then an email about the same underlying event never feels spammed.
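The hybrid push/pull idea can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not a definitive design: HybridFeed, FANOUT_THRESHOLD, and the tuple layout (timestamp, event_id, text) are hypothetical. Events from authors with few followers are pushed into each follower's inbox at write time; events from high-indegree authors are written once to a separate store and stitched in when a user reads, with event_id used to drop duplicates.

```python
from collections import defaultdict

FANOUT_THRESHOLD = 10_000  # hypothetical cutoff between push and pull


class HybridFeed:
    def __init__(self, followers, threshold=FANOUT_THRESHOLD):
        self.followers = followers         # author -> set of follower ids
        self.following = defaultdict(set)  # user -> set of authors (reverse index)
        for author, users in followers.items():
            for user in users:
                self.following[user].add(author)
        self.threshold = threshold
        self.inbox = defaultdict(list)       # user -> pushed events
        self.hot_events = defaultdict(list)  # author -> events stored once

    def publish(self, author, event_id, ts, text):
        """Store an event and return the number of writes it cost."""
        event = (ts, event_id, text)
        audience = self.followers.get(author, set())
        if len(audience) >= self.threshold:
            self.hot_events[author].append(event)  # pull path: a single write
            return 1
        for user in audience:                      # push path: one write per follower
            self.inbox[user].append(event)
        return len(audience)

    def read(self, user):
        """Stitch pushed and pulled events, deduplicated by event_id, oldest first."""
        merged = list(self.inbox[user])
        for author in self.following[user]:
            merged.extend(self.hot_events.get(author, []))
        seen, timeline = set(), []
        for event in sorted(merged):
            if event[1] not in seen:
                seen.add(event[1])
                timeline.append(event)
        return timeline


followers = {"celebrity": {"u1", "u2", "u3", "u4"}, "friend": {"u1"}}
feed = HybridFeed(followers, threshold=3)  # tiny threshold for the demo

writes_friend = feed.publish("friend", "evt-1", 1, "friend posted")
writes_celebrity = feed.publish("celebrity", "evt-2", 2, "celebrity posted")
timeline = feed.read("u1")
print(writes_friend, writes_celebrity, [event[1] for event in timeline])
```

The trade-off is that reads become more expensive for users who follow many high-indegree authors, so the threshold and any caching of hot events are tuning decisions the design discussion should cover.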