For a full example answer with detailed architecture diagrams and deep dives, see our Design a Job Scheduler guide. While the job scheduler guide focuses on task execution, many of the same patterns around priority queuing, durable workflows, and rate limiting apply directly to notification delivery pipelines.
Also review the Message Queues and Rate Limiters building blocks for background on asynchronous processing and throughput control.
Design a notification system that delivers both critical time-sensitive alerts (such as direct messages, OTPs, and security warnings) and promotional campaign messages (such as content recommendations and marketing announcements) across push, SMS, and email channels. The platform must process up to one million notifications per second at peak load, where roughly 80 percent of traffic is critical and 20 percent is promotional. Promotional notifications can target thousands or millions of users simultaneously but must never arrive after their expiration window closes.
Think of platforms like OneSignal, Firebase Cloud Messaging, or the notification infrastructure behind large consumer apps. Product teams submit notification requests through an API, specifying priority, audience, expiration, and channel preferences. The backend classifies traffic, expands audience segments, enforces user preferences and quiet hours, rate-limits against provider quotas, and dispatches messages with at-least-once reliability. Your design must guarantee that bulk promotional campaigns never degrade the delivery latency of critical alerts.
Based on real interview experiences, these are the areas interviewers probe most deeply:
A single shared queue will cause head-of-line blocking when a million-user campaign floods the system, delaying critical alerts and violating latency SLOs. Interviewers want to see explicit isolation between priority classes.
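One way to make the isolation concrete is a dispatcher that can never serve promotional traffic while critical traffic is waiting. The sketch below uses in-memory queues purely for illustration; in the design discussed here, these would be separate Kafka topics with independently scaled consumer groups.

```python
from collections import deque

# Illustrative in-memory stand-in for two isolated streams; in production
# these would be separate Kafka topics with independent consumer groups.
critical_queue = deque()
promotional_queue = deque()

def enqueue(message, priority):
    """Classify at ingestion so bulk traffic never shares a queue with alerts."""
    (critical_queue if priority == "critical" else promotional_queue).append(message)

def next_message():
    """Always drain critical traffic first. A million-message campaign sitting
    in the promotional queue can never delay an alert, because the two classes
    never share a line (no head-of-line blocking)."""
    if critical_queue:
        return critical_queue.popleft()
    if promotional_queue:
        return promotional_queue.popleft()
    return None
```

With real Kafka topics, the same property falls out of giving each class its own topic and consumer group, so promotional consumers can be throttled or paused without touching critical throughput.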
Delivering an expired promotion erodes user trust and may violate compliance rules. A single check at ingestion is insufficient because messages can sit in queues for unpredictable durations.
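The fix is to re-check expiry at every hop, most importantly immediately before dispatch. A minimal sketch (the message shape and `send_fn` callback are illustrative assumptions):

```python
import time

def should_send(message: dict, now=None) -> bool:
    """Re-check expiry immediately before dispatch, not just at ingestion,
    because a message can sit in a queue for an unpredictable time."""
    now = time.time() if now is None else now
    return now < message["expires_at"]

def dispatch(message: dict, send_fn, now=None) -> str:
    """A worker drops expired messages instead of dispatching them, and would
    also increment an expiration-drop metric for observability."""
    if not should_send(message, now):
        return "dropped_expired"
    send_fn(message)
    return "sent"
```

The retry path needs the same guard: every retry attempt should call the expiry check first, so backoff delays cannot push a message past its window.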
A single campaign API call targeting millions of users must not block the ingestion pipeline or overload the database. Interviewers expect an asynchronous, chunked expansion strategy.
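The core of the chunked strategy can be shown with a generator: the audience is streamed in fixed-size pages rather than materialized at once. In the real system each chunk would come from a cursor-based segment query and be emitted as messages to a downstream dispatch topic; the small `chunk_size` here is just for illustration.

```python
from typing import Iterable, Iterator, List

def expand_audience(user_ids: Iterable[str], chunk_size: int = 1000) -> Iterator[List[str]]:
    """Stream the audience in fixed-size chunks so a million-user campaign
    never sits fully in memory and never blocks the ingestion pipeline.
    Each yielded chunk becomes one batch of user-message pairs."""
    chunk: List[str] = []
    for uid in user_ids:
        chunk.append(uid)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # flush the final partial chunk
        yield chunk
```

Because expansion is a separate worker stage consuming a single campaign-job record, the API call returns immediately and the database sees a steady stream of bounded queries instead of one giant scan.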
At-least-once retry semantics combined with fan-out can produce duplicates. Users receiving the same OTP or marketing message twice degrades the experience.
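The standard answer is a deterministic idempotency key checked before each send. A plain Python set stands in below for what would be a Redis `SET key NX` with a TTL in production, so all dispatch workers share one view of what has already been sent; the key format is an illustrative assumption.

```python
# In production the seen-set would be Redis SET NX with a TTL; a plain set
# stands in here to show the idempotency-key pattern.
_seen = set()

def idempotency_key(campaign_id: str, user_id: str, channel: str) -> str:
    """Deterministic key: retries of the same logical send collapse to one key."""
    return f"{campaign_id}:{user_id}:{channel}"

def send_once(campaign_id: str, user_id: str, channel: str, send_fn) -> bool:
    """Returns False (and skips the send) for duplicates produced by
    at-least-once retries or fan-out replays."""
    key = idempotency_key(campaign_id, user_id, channel)
    if key in _seen:
        return False
    _seen.add(key)
    send_fn(key)
    return True
```

The TTL matters in the real version: the dedup record only needs to outlive the retry window, so the set stays bounded even at a million messages per second.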
Questions worth asking before committing to a design:

- Confirm the definition of critical versus promotional traffic and the acceptable latency targets for each.
- Ask whether the system is multi-tenant and if different customers need isolated quotas.
- Verify whether multi-region deployment is required or if a single datacenter suffices.
- Clarify retry policies: how many attempts, over what time window, and whether retries should fall back to alternative channels.
- Confirm the size of the largest expected campaign and whether audience expansion happens before or after the campaign is submitted.
- Establish whether you need delivery receipts and engagement tracking or if fire-and-forget is acceptable.
Sketch an ingestion API layer that accepts notification requests and immediately writes them to a partitioned message log. Use separate Kafka topics for critical and promotional streams to enforce traffic isolation at the infrastructure level. Deploy consumer groups that pull from these topics and perform audience expansion, preference lookups, and provider-specific formatting. Place a rate-limiting layer in front of provider gateways that tracks quota per provider and per tenant using Redis token buckets. Include a retry service that requeues failed messages with exponential backoff and checks expiration timestamps before each attempt. Store notification metadata and delivery status in a horizontally scalable database like Cassandra for auditing and analytics.
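The per-provider, per-tenant quota tracking mentioned above is a token bucket. In the actual design this state would live in Redis (updated atomically, e.g. via a Lua script) so every gateway instance shares one view; the in-process class below is a sketch of the refill math only.

```python
class TokenBucket:
    """Quota tracker for one provider (or one tenant). Tokens refill
    continuously at `rate` per second up to a burst ceiling of `capacity`;
    each send costs one token and is rejected when the bucket is empty."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Messages that fail the `allow` check are not dropped; per the design above they are parked for retry once the window refills, with promotional traffic parked first.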
Walk through how a promotional campaign flows end-to-end. The API accepts the campaign, assigns an expiration timestamp, and writes a single campaign job record to the promotional Kafka topic. A fleet of expansion workers picks up the job, queries user segments in batches, and emits individual user-message pairs to a downstream dispatch topic. A second tier of formatting workers reads from this topic, fetches user preferences from a cache-backed store, enriches the message payload per channel, and forwards to the provider gateway. Critical alerts bypass the expansion step entirely and go directly from the ingestion API to the formatting workers via the high-priority topic. Partition both topics by user ID to maintain per-user ordering and enable parallel processing. Emphasize that critical and promotional consumer groups scale independently and that promotional workers can be throttled or paused during overload without affecting critical delivery.
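The partition-by-user-ID point can be shown in a few lines: a stable hash of the user ID sends every message for a given user to the same partition, which preserves per-user ordering while partitions are processed in parallel. Kafka's default partitioner does the equivalent with murmur2 on the message key; a stable MD5-based hash stands in here, and the partition count is illustrative.

```python
import hashlib

NUM_PARTITIONS = 12  # illustrative; configured per topic in Kafka

def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash of the user ID, so the same user always maps to the
    same partition. (Python's built-in hash() is randomized per process,
    so a deterministic digest is used instead.)"""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

The same keying is applied to both the critical and promotional topics, so ordering holds within each priority class even though the two classes are consumed independently.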
Discuss reliability: use Kafka replication and consumer offset commits to prevent data loss, and configure dead-letter queues for messages that fail after maximum retries. Cover rate limiting: implement token bucket counters per provider in Redis with atomic decrement operations, and queue overflow messages in a sorted set for retry after the rate window resets. Address observability: emit metrics for queue depth per priority tier, per-priority latency percentiles, provider error rates, and expiration-drop counts to detect issues before they impact SLOs. Mention multi-region considerations if the user base is global, with regional message logs and provider gateways to reduce cross-region latency.
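The retry policy above combines three ideas: exponential backoff, jitter, and a hard stop at either max attempts or expiry. A minimal sketch of the scheduling math (the parameter defaults are assumptions, not prescriptions):

```python
import random

def next_retry_delay(attempt: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Exponential backoff with full jitter: roughly 1s, 2s, 4s, ... capped
    at 5 minutes. Jitter spreads retries out so a provider outage does not
    produce a synchronized thundering herd when the provider recovers."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def should_retry(attempt: int, max_attempts: int, now: float, expires_at: float) -> bool:
    """Give up once attempts are exhausted or the message has expired.
    Exhausted messages go to the dead-letter queue rather than being lost."""
    return attempt < max_attempts and now < expires_at
```

Checking `expires_at` here, and not only at ingestion, is what enforces the requirement that promotions never arrive after their window closes, even when backoff delays stack up.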
"Design Notification system for Simple and Bulk notification."
"Required to share the API endpoints, Database design and Contracts for the components."
"Design Notification System for sending otps or emails."