For a walkthrough of scheduling and orchestrating background work at scale, see our Design Job Scheduler guide. Many of the same patterns around durable task queues, retries, and priority isolation apply directly to notification pipelines.
Also review the Message Queues, Databases, and Rate Limiters building blocks for background on asynchronous processing, storage choices, and throttle control.
Design a notification system that delivers messages to users across multiple channels: push notifications, SMS, and email. The system must handle two distinct traffic patterns: high-priority critical alerts (authentication codes, security warnings, direct messages) that demand sub-second delivery, and bulk promotional campaigns (marketing offers, feature announcements) that can target thousands or millions of recipients but must never arrive after their expiration window closes.
The platform needs to sustain one million notifications per second at peak load, with roughly 80 percent being time-critical alerts and 20 percent being campaigns. Promotional fan-outs must never degrade the delivery speed of critical traffic. The architecture must respect third-party provider rate limits, enforce exactly-once delivery per user, maintain per-user message ordering within each priority class, and gracefully handle provider outages. You are designing the server-side pipeline only, not the client SDK or device-level delivery confirmation.
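The throughput targets above lend themselves to a quick back-of-envelope sizing exercise. A sketch, assuming a sustained per-partition consume rate of 10,000 messages/second (an assumption for illustration, not part of the problem statement):

```python
# Back-of-envelope partition sizing for the stated 1M/s peak with an 80/20 split.
PEAK_RPS = 1_000_000
CRITICAL_RPS = int(PEAK_RPS * 0.8)   # 800,000/s time-critical alerts
PROMO_RPS = PEAK_RPS - CRITICAL_RPS  # 200,000/s campaign traffic

# Assumed sustained consume rate per partition; in practice this depends on
# payload size, per-message work, and hardware, and is found via load tests.
PER_PARTITION_RPS = 10_000

# Ceiling division: partitions needed per priority class.
critical_partitions = -(-CRITICAL_RPS // PER_PARTITION_RPS)
promo_partitions = -(-PROMO_RPS // PER_PARTITION_RPS)
```

With these assumptions the critical topic needs on the order of 80 partitions and the promotional topic 20, which also bounds the maximum useful consumer-group size per class.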
Based on real interview experiences, these are the areas interviewers probe most deeply:
Interviewers want to see how you prevent promotional blasts from starving critical alerts. A naive single-queue design will fail under load when a million-user campaign floods the system.
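One way to make the isolation concrete is to give each traffic class its own queue and its own worker budget, so a promotional backlog can never consume critical capacity. A minimal in-memory sketch (the two deques and the budgets are illustrative stand-ins for separate Kafka topics and independently scaled consumer groups):

```python
from collections import deque

critical_q, promo_q = deque(), deque()

def enqueue(msg, priority):
    (critical_q if priority == "critical" else promo_q).append(msg)

def drain_round(critical_budget=8, promo_budget=2):
    """One scheduling round. Each class spends only its own budget, so a
    million-message promotional flood never delays a critical alert."""
    sent = []
    for q, budget in ((critical_q, critical_budget), (promo_q, promo_budget)):
        for _ in range(budget):
            if not q:
                break
            sent.append(q.popleft())
    return sent
```

Even with the promotional queue holding an entire campaign, a newly enqueued authentication code is dispatched in the very next round; the single-queue design fails exactly this test.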
Delivering an expired promotion damages user trust and may violate business rules. Interviewers look for TTL checks at every processing stage, not just at ingestion.
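A cheap way to enforce the window at every stage is a guard that expansion, formatting, retry, and gateway workers all call before doing work. A sketch, assuming each message carries an `expires_at` epoch timestamp (an illustrative field name):

```python
import time

def still_deliverable(msg, now=None):
    """True while the message's expiration window is still open."""
    now = time.time() if now is None else now
    return now < msg["expires_at"]

def process_stage(msg, work, on_expired):
    """Wrapper applied at every pipeline stage, not just ingestion, so a
    message that sat in a backlog is dropped instead of delivered late."""
    if not still_deliverable(msg):
        on_expired(msg)  # count the drop; never lose it silently
        return None
    return work(msg)
```

Counting expiration drops per stage also tells you *where* messages are aging out, which is the metric interviewers expect alongside the check itself.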
A single campaign API call targeting a million users must not block ingestion or overload the database with synchronous lookups. Interviewers expect an asynchronous expansion strategy.
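The asynchronous expansion can be sketched as a cursor-driven loop: the API stores a single campaign job and returns immediately, and a worker pages through the audience, emitting per-user messages downstream. `fetch_segment_page` and `emit` are hypothetical stand-ins for the segment-store query and the producer write:

```python
def expand_campaign(job, fetch_segment_page, emit, batch_size=1000):
    """Expand one campaign job into per-user messages, one page at a time,
    so ingestion never blocks and the segment store sees bounded queries.

    fetch_segment_page(segment, cursor, limit) -> (user_ids, next_cursor);
    a next_cursor of None means the segment is exhausted."""
    cursor = None
    while True:
        user_ids, cursor = fetch_segment_page(job["segment"], cursor, batch_size)
        for uid in user_ids:
            emit({"campaign_id": job["id"],
                  "user_id": uid,
                  "expires_at": job["expires_at"]})
        if cursor is None:
            return
```

Checkpointing the cursor after each emitted batch lets a crashed expansion worker resume where it left off instead of restarting the fan-out.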
Third-party notification providers impose per-second and per-minute limits. Interviewers want to see adaptive rate limiting that respects quotas without dropping messages.
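Per-provider quotas map naturally onto a token bucket: tokens refill at the provider's permitted rate, and a message that finds the bucket empty is requeued rather than dropped. A single-process sketch (a production version would keep this state in a shared store such as Redis so all gateway instances see one quota):

```python
import time

class TokenBucket:
    """Refills at `rate` tokens/sec up to `burst`; one token per send."""

    def __init__(self, rate, burst, now=None):
        self.rate, self.capacity = rate, burst
        self.tokens = float(burst)
        self.last = time.monotonic() if now is None else now

    def try_acquire(self, now=None):
        now = time.monotonic() if now is None else now
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller requeues with backoff instead of dropping
```

Keeping one bucket per provider (and per tenant, if quotas are tenant-scoped) lets the gateway absorb a provider's per-second and per-minute limits without losing messages.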
At-least-once retry semantics can cause duplicate sends. Interviewers expect deduplication logic to prevent users from receiving the same alert twice.
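At-least-once retries plus a unique message ID give an exactly-once *effect*: the sender records the ID before delivering and skips any ID it has already seen. A sketch using an in-memory set as a stand-in for a shared TTL'd store (with Redis, `SET key NX EX ttl` performs the membership test and the insert in one atomic step):

```python
def send_once(msg, seen, deliver):
    """Deliver msg at most once per message_id.

    `seen` stands in for a shared store with expiring keys; the TTL only
    needs to cover the retry window, not live forever."""
    key = msg["message_id"]
    if key in seen:      # duplicate from an at-least-once retry -- skip
        return False
    seen.add(key)
    deliver(msg)
    return True
```

Note the ordering trade-off: marking before delivery (as here) prefers a rare lost send over a duplicate if the process crashes in between; marking after delivery prefers the opposite. The right choice can differ per channel.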
Confirm the definition of "critical" versus "promotional" traffic and acceptable latency targets for each. Ask whether the system is multi-tenant and if different customers should have isolated quotas. Verify whether multi-region deployments are needed. Clarify retry policies: how many attempts, over what window, and whether retries fall back to alternative channels. Confirm the size of the largest expected campaign and whether audience expansion happens before or after submission. Establish whether delivery receipts and engagement tracking are required.
Sketch an ingestion API layer that accepts notification requests and writes them to a partitioned message log. Use separate Kafka topics for critical and promotional streams to enforce traffic isolation. Deploy consumer groups that pull from these topics and perform audience expansion, preference lookups, and provider-specific formatting. Place a rate-limiting layer in front of provider gateways that tracks quotas per provider and per tenant using a shared cache. Include a retry service that requeues failed messages with exponential backoff and respects expiration timestamps. Store notification metadata and delivery status in a horizontally scalable database for auditing and analytics.
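The retry service described above can combine its exponential backoff with the expiration timestamp, so a failed message is never requeued past its window. A sketch (the base delay and cap are illustrative defaults, not requirements):

```python
def next_retry_at(now, attempt, expires_at, base=2.0, cap=300.0):
    """Time of the next delivery attempt, or None if the message's
    expiration window closes before that attempt could run.

    attempt 0 -> `base` seconds, then doubling, capped at `cap` seconds."""
    delay = min(base * (2 ** attempt), cap)
    t = now + delay
    return t if t < expires_at else None  # None -> dead-letter or drop
```

Returning `None` gives the retry service a single decision point for routing a message to the dead-letter queue (or counting an expiration drop) instead of scattering that logic across workers.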
Walk through how a promotional campaign flows through the system. The API accepts the campaign, assigns an expiration timestamp, and writes a single "campaign job" record to the promotional topic. Expansion workers pick up the job, query user segments in batches, and emit individual user-message pairs to a downstream topic. Formatting workers read from this topic, fetch user preferences from a cache-backed store, enrich the payload, and forward to the provider gateway. Critical alerts bypass the expansion step and go directly to formatting workers via the high-priority topic. Partition by user ID or device ID to maintain per-user ordering. Emphasize that critical and promotional workers scale independently and that promotional workers can be throttled during overload.
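The per-user ordering guarantee falls out of a stable partition key: every message for a given user hashes to the same partition, and each partition is consumed in order. A sketch of the key function (md5 here is an arbitrary stable-hash choice, not a requirement; Python's built-in `hash()` is randomized per process and unsuitable for this):

```python
import hashlib

def partition_for(user_id, num_partitions):
    """Stable user_id -> partition mapping. All of a user's messages within
    one topic land on the same partition, preserving their relative order."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Because critical and promotional traffic live on separate topics, this preserves ordering *within each priority class* only, which matches the stated requirement.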
Discuss reliability through Kafka replication and consumer offset commits to prevent data loss, with dead-letter queues for messages that fail after maximum retries. Cover idempotency by generating a unique message ID and checking a Redis cache before sending. Explain rate limiting with a token bucket per provider in Redis using atomic decrement operations. Address expiration by checking timestamps at every stage and dropping messages whose window has closed. Mention observability: emit metrics for queue depth, per-priority latency, provider error rates, and expiration drops. Optionally sketch multi-region deployment with regional message logs and provider gateways.
Candidates at Metropolis report that interviewers asked them to design both simple one-to-one notification delivery and bulk campaign fan-out in the same session, paying close attention to how the two traffic types are isolated. One interviewer specifically requested API endpoint designs, database schemas, and contracts between components. Be ready to discuss how you would handle a provider outage mid-campaign without losing or duplicating messages.