Design a Notification System for Reminders
Problem Statement
Design a system that allows users to set time-sensitive reminders with recurring intervals (hourly, daily, monthly) and reliably delivers notifications through channels like email or push notifications. The system must handle reminders that fire far in the future, respect time zones, and scale to millions of active reminders.
Interviewers ask this because it blends a distributed job scheduler with a multi-channel notification pipeline. It tests whether you can design durable timers, handle recurrence rules and time zones, smooth traffic spikes at common times, guarantee reliability with retries and idempotency, and operate at scale.
Key Requirements
Functional
- Reminder creation -- users create, update, and cancel one-time and recurring reminders (hourly, daily, monthly) in a chosen time zone
- Channel selection -- users choose one or more delivery channels (email, push) and customize the reminder message
- Delivery history -- users view upcoming reminders and past deliveries with status (sent, failed, retried)
- Reliable delivery -- users receive reminders reliably at the scheduled time with minimal delay, even during spikes or partial outages
Non-Functional
- Scalability -- support millions of active reminders with common firing times (9:00 AM, top of the hour) creating concentrated load spikes
- Reliability -- guarantee at-least-once delivery with idempotent processing; no missed reminders after system outages
- Latency -- reminders delivered within 30 seconds of scheduled time under normal conditions; catch-up within minutes after outages
- Consistency -- strong consistency for reminder state (creation, cancellation); eventual consistency for delivery status and history
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Distributed Scheduling Architecture
Reminders fire far in the future and must reliably trigger even after system restarts, deployments, and outages.
Hints to consider:
- Store reminder definitions with next-fire timestamps in a durable database, using indexes on next-fire time for efficient range queries
- Use a two-tier scheduling approach: a scanner finds reminders due in the next window and enqueues them to Kafka, and workers then process each enqueued reminder independently
- Implement catch-up logic: after an outage, scan for all past-due reminders and apply an explicit policy to each one (deliver late or skip)
- Avoid loading all reminders into memory; instead, use time-bucketed database queries that partition the scanning workload
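The scanner tier described in these hints can be sketched as follows. This is a minimal illustration, not a production design: it assumes an in-memory SQLite table standing in for the durable store, and the table name `reminders`, the `status` column, the `claim_due` function, and the id-modulo sharding are all hypothetical choices made for the example. The returned list stands in for messages that would be published to Kafka.

```python
import sqlite3


def make_db():
    """Create a toy reminder store with an index on the next-fire time."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE reminders ("
        " id INTEGER PRIMARY KEY,"
        " next_fire_at INTEGER NOT NULL,"  # epoch seconds, UTC
        " status TEXT NOT NULL DEFAULT 'scheduled')"
    )
    # The index lets the scanner run a range query instead of a full scan.
    db.execute("CREATE INDEX idx_due ON reminders (status, next_fire_at)")
    return db


def claim_due(db, now, lookahead=60, shard=0, shards=1, limit=500):
    """Claim reminders due by now + lookahead that belong to this shard.

    The query has no lower time bound, so past-due reminders left over
    from an outage are picked up as well (catch-up).
    """
    rows = db.execute(
        "SELECT id FROM reminders"
        " WHERE status = 'scheduled' AND next_fire_at <= ? AND id % ? = ?"
        " ORDER BY next_fire_at LIMIT ?",
        (now + lookahead, shards, shard, limit),
    ).fetchall()
    claimed = []
    for (reminder_id,) in rows:
        # Conditional update: only one scanner can flip the status, so a
        # reminder is enqueued once even if two scanners select it.
        cur = db.execute(
            "UPDATE reminders SET status = 'enqueued'"
            " WHERE id = ? AND status = 'scheduled'",
            (reminder_id,),
        )
        if cur.rowcount == 1:
            claimed.append(reminder_id)
    db.commit()
    return claimed  # stand-in for publishing to the work queue
```

Only one window of rows is read per call, which keeps memory use bounded regardless of how many reminders exist in total.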
2. Traffic Spike Smoothing
Many reminders cluster at common times (9:00 AM, top of the hour, first of the month), creating thundering herds.
Hints to consider:
- Jitter delivery within a configurable window (e.g., a reminder set for 9:00 AM fires between 8:59:30 and 9:00:30) to smooth spikes
- Shard the scanner by time buckets so multiple workers scan different slices of the due-reminder space in parallel
- Use rate limiting at the notification channel level to protect email and push providers from sudden load
- Pre-warm notification service connections ahead of known spike windows based on historical patterns
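Two of these hints, jitter and channel-level rate limiting, can be sketched in a few lines. The names `jitter_offset`, `jittered_fire_time`, and `TokenBucket` are hypothetical, and deriving the jitter from a hash of the reminder id is an assumption made here so that the same reminder always gets the same offset. A token bucket is one common way to rate limit; the source does not prescribe a specific algorithm.

```python
import hashlib


def jitter_offset(reminder_id: str, window_seconds: int = 60) -> int:
    """Stable offset in [-window/2, +window/2) derived from the reminder id."""
    digest = hashlib.sha256(reminder_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds - window_seconds // 2


def jittered_fire_time(scheduled_epoch: int, reminder_id: str,
                       window_seconds: int = 60) -> int:
    """Spread reminders that share a scheduled time across the window."""
    return scheduled_epoch + jitter_offset(reminder_id, window_seconds)


class TokenBucket:
    """Per-channel rate limiter: `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def try_acquire(self, now: float) -> bool:
        # Refill based on elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should delay or requeue the send
```

Passing the clock in as `now` keeps the limiter deterministic in tests; a real worker would likely pass `time.monotonic()`.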
3. Recurrence and Time Zone Handling
Recurring reminders must compute the next occurrence correctly across time zones, DST transitions, and calendar edge cases.
Hints to consider:
- Store recurrence rules together with the user's original time zone (an IANA name such as America/New_York rather than a fixed UTC offset) so the UTC firing time shifts correctly across DST transitions
- Compute the next-fire timestamp after each delivery, writing it back to the reminder record atomically
- Handle edge cases like "monthly on the 31st" by using last-day-of-month semantics for shorter months
- Support "skip next" and "pause" operations by updating the next-fire timestamp without deleting the recurrence rule
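A minimal sketch of next-occurrence computation under these hints, using Python's standard `zoneinfo` module. The `Rule` dataclass and `next_fire_utc` function are hypothetical names, and the sketch assumes the rule stores the wall-clock time and the originally requested day of month, so that a clamped month (such as February) does not permanently change the anchor day. It also assumes `last_fire_utc` is a timezone-aware datetime and that IANA time zone data is available on the host.

```python
import calendar
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class Rule:
    freq: str             # "hourly", "daily", or "monthly"
    tz_name: str          # IANA zone name, e.g. "America/New_York"
    local_time: time      # wall-clock time the user asked for
    anchor_day: int = 1   # requested day of month (monthly rules only)


def next_fire_utc(rule: Rule, last_fire_utc: datetime) -> datetime:
    """Return the next occurrence after last_fire_utc, expressed in UTC."""
    if rule.freq == "hourly":
        # Fixed elapsed interval, so plain UTC arithmetic is enough.
        return last_fire_utc + timedelta(hours=1)

    tz = ZoneInfo(rule.tz_name)
    local = last_fire_utc.astimezone(tz)
    if rule.freq == "daily":
        next_date = local.date() + timedelta(days=1)
    elif rule.freq == "monthly":
        if local.month == 12:
            year, month = local.year + 1, 1
        else:
            year, month = local.year, local.month + 1
        # Last-day-of-month semantics: clamp the 29th-31st in short months.
        last_day = calendar.monthrange(year, month)[1]
        next_date = date(year, month, min(rule.anchor_day, last_day))
    else:
        raise ValueError("unsupported frequency: %s" % rule.freq)

    # Rebuild the wall-clock time in the user's zone, then convert to UTC,
    # so the UTC instant moves when the zone's offset changes.
    next_local = datetime.combine(next_date, rule.local_time, tzinfo=tz)
    return next_local.astimezone(timezone.utc)
```

For a daily 09:00 reminder in New York, the UTC instant moves from 14:00 to 13:00 on the day DST begins, while the user still sees 9:00 AM. Wall-clock times that fall inside a DST gap or overlap need an explicit policy, which this sketch does not address.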
4. Multi-Channel Delivery with Retries
Each reminder may be delivered through multiple channels, each with different failure modes and retry characteristics.
Hints to consider:
- Fan out to each configured channel independently so a failure in email delivery does not block push notification delivery
- Use idempotency keys per (reminder_id, occurrence_number, channel) to prevent duplicate deliveries during retries
- Implement exponential backoff with configurable retry limits per channel, routing exhausted retries to a dead-letter queue
- Track delivery status per channel per occurrence, surfacing failures in the user's delivery history for transparency
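The fan-out, idempotency, and retry hints above can be sketched together. Everything here is illustrative: `idempotency_key`, `backoff_delay`, and `deliver` are hypothetical names, the `send` callable stands in for a real email or push client, and an in-memory set and list stand in for a durable deduplication store and a dead-letter queue.

```python
def idempotency_key(reminder_id, occurrence, channel):
    """One key per (reminder, occurrence, channel) delivery attempt group."""
    return "%s:%s:%s" % (reminder_id, occurrence, channel)


def backoff_delay(attempt, base=1.0, factor=2.0, cap=60.0):
    """Exponential backoff: base * factor**attempt seconds, capped."""
    return min(cap, base * factor ** attempt)


def deliver(reminder_id, occurrence, channels, send, sent_keys, dead_letters,
            max_retries=3, sleep=lambda seconds: None):
    """Deliver one occurrence to each channel independently.

    Returns a per-channel status: "sent", "retried" (succeeded after at
    least one retry), "failed", or "duplicate_skipped".
    """
    results = {}
    for channel in channels:
        key = idempotency_key(reminder_id, occurrence, channel)
        if key in sent_keys:
            # A redelivered queue message must not notify the user twice.
            results[channel] = "duplicate_skipped"
            continue
        for attempt in range(max_retries + 1):
            try:
                send(channel, key)
            except Exception:  # sketch only; real code would catch provider errors
                if attempt == max_retries:
                    dead_letters.append(key)  # retries exhausted
                    results[channel] = "failed"
                else:
                    sleep(backoff_delay(attempt))
            else:
                sent_keys.add(key)
                results[channel] = "sent" if attempt == 0 else "retried"
                break
    return results
```

Because each channel has its own loop and its own key, a failing email provider ends up in the dead-letter list without affecting the push delivery for the same occurrence. In a real worker the retries would more likely be rescheduled through the queue than run inline with `sleep`.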
Suggested Approach
Step 1: Clarify Requirements
Confirm scope with the interviewer. Ask about the number of active reminders, supported recurrence patterns, delivery channels, and acceptable delivery latency. Clarify whether notifications need rich content (images, action buttons) or only text. Establish time zone handling expectations: does the user set a time zone per reminder or rely on a global preference? Confirm whether the system should detect and handle conflicting reminders or simply deliver each one independently.