Design a Discounted Flight Alert System
Problem Statement
Design a flight price monitoring system that allows users to subscribe to specific routes (origin, destination, and travel dates) and receive email notifications within 10 minutes when flight prices drop significantly below historical averages. The system must integrate with external flight pricing APIs, handle API rate limits and failures gracefully, deduplicate notifications so users are not spammed, and scale to roughly 10 million route subscriptions across about 1 million users.
Discounted flight alert systems like Google Flights alerts or Hopper watch airfare across routes and notify subscribers when deals appear. Interviewers ask this to test your ability to integrate with unreliable external APIs, run reliable background polling workflows, meet a strict end-to-end notification SLA, and scale fanout to many subscribers per route. They want to see event-driven pipeline thinking, idempotency and deduplication strategies, rate limiting, and graceful degradation under real-world constraints.
Key Requirements
Functional
- Route subscription -- users subscribe to specific routes with origin, destination, date ranges, and optional price thresholds, and manage their notification preferences
- Timely price drop alerts -- users receive an email notification within 10 minutes when a subscribed route experiences a significant price drop relative to recent history
- Subscription management -- users can view, pause, update thresholds, or unsubscribe from existing route subscriptions at any time
- Notification deduplication -- users do not receive duplicate alerts for the same price drop event on a given subscription
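To make the functional requirements concrete, a minimal data model might look like the sketch below. The field names, the departure-window shape, and the route key format are all hypothetical. The point is that subscriptions sharing a route produce the same key, so the route can be polled once rather than once per subscriber, and that pause state and the optional price threshold are checked before any alert goes out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Route:
    """Hypothetical route shape: origin, destination, and a departure-date window."""
    origin: str          # IATA code, e.g. "JFK"
    destination: str     # IATA code, e.g. "LAX"
    depart_from: date
    depart_to: date

    def key(self) -> str:
        # Canonical key shared by every subscription on this route, so the poller
        # and the route -> subscribers index can group them together.
        return (f"{self.origin}-{self.destination}:"
                f"{self.depart_from.isoformat()}:{self.depart_to.isoformat()}")


@dataclass
class Subscription:
    subscription_id: str
    user_id: str
    route: Route
    max_price: Optional[float] = None  # optional user threshold; None = any significant drop
    paused: bool = False               # users can pause without unsubscribing

    def wants_alert(self, price: float) -> bool:
        # Paused subscriptions never alert; otherwise honor the optional price ceiling.
        if self.paused:
            return False
        return self.max_price is None or price <= self.max_price
```

Two users watching the same route and window get identical Route keys, which is what lets the polling layer work per route while fanout works per subscription.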
Non-Functional
- Scalability -- handle 10 million active route subscriptions across 1 million users, with 100K price checks per minute during peak polling cycles
- Reliability -- achieve 99.9% uptime with no lost alerts, maintaining durable processing even during external API outages or partial system failures
- Latency -- detect price drops and deliver email notifications within 10 minutes end-to-end under normal operating conditions
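- Consistency -- deliver each price drop alert effectively once per subscription: processing is at-least-once so no alert is lost, and idempotency controls ensure that at most one notification is sent per price drop event per subscription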
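A quick sizing pass over these targets helps frame the rest of the design. The sketch below is only arithmetic on the numbers stated above; the 30-day month used for the uptime budget is the one added assumption.

```python
# Back-of-envelope sizing from the stated non-functional targets (illustrative arithmetic only).
SUBSCRIPTIONS = 10_000_000
USERS = 1_000_000
PEAK_CHECKS_PER_MIN = 100_000
UPTIME = 0.999
SLA_MIN = 10

subs_per_user = SUBSCRIPTIONS / USERS                    # average watched routes per user
checks_per_sec = PEAK_CHECKS_PER_MIN / 60                # sustained outbound API call rate at peak
checks_per_sla_window = PEAK_CHECKS_PER_MIN * SLA_MIN    # upper bound on checks inside one 10-minute window
full_sweep_min = SUBSCRIPTIONS / PEAK_CHECKS_PER_MIN     # minutes to poll every subscription one by one
downtime_min_per_30d = (1 - UPTIME) * 30 * 24 * 60       # 99.9% error budget over a 30-day month

print(f"{subs_per_user:.0f} subscriptions per user")
print(f"{checks_per_sec:.0f} price checks per second")
print(f"{checks_per_sla_window:,} checks per 10-minute window")
print(f"{full_sweep_min:.0f} minutes to poll each subscription individually")
print(f"{downtime_min_per_30d:.1f} minutes of allowed downtime per 30 days")
```

The key number is the 100-minute full sweep: polling subscriptions individually cannot fit the 10-minute SLA, so many subscriptions have to collapse onto shared route queries, and at most about 1,000,000 checks fit in one window before detection and delivery take their share of the budget.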
What Interviewers Focus On
Based on real interview experiences at Datadog, these are the areas interviewers probe most deeply:
1. External Flight API Integration and Crawling Strategy
Your system depends on external pricing APIs (like Skyscanner or similar providers) that have rate limits, variable latency, and periodic outages. How you design the crawling layer determines whether you meet the 10-minute SLA.
Hints to consider:
- Implement adaptive polling frequencies: popular routes with many subscribers get checked more frequently than low-demand routes
- Use per-provider worker pools with independent circuit breakers so a single provider outage does not stall the entire polling pipeline
- Apply exponential backoff with jitter when hitting rate limits to avoid thundering herd effects on recovering APIs
- Cache recent prices with TTLs to reduce redundant API calls while maintaining freshness guarantees needed for the SLA
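The first three hints can be sketched together as below. Everything here is an assumption for illustration: the log-scaled polling policy and its defaults, the "full jitter" variant of exponential backoff, and the breaker thresholds. The clock is injectable only so the breaker can be exercised without sleeping.

```python
import math
import random
import time
from typing import Callable, Optional


def poll_interval_seconds(subscriber_count: int, base_s: float = 300.0, floor_s: float = 30.0) -> float:
    # Hypothetical adaptive policy: shrink the interval as a route's audience grows
    # by orders of magnitude, but never poll faster than floor_s.
    return max(floor_s, base_s / (1 + math.log10(max(1, subscriber_count))))


def backoff_delay(attempt: int, base_s: float = 0.5, cap_s: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    # Exponential backoff with "full jitter": a uniform random delay in
    # [0, min(cap_s, base_s * 2**attempt)], so throttled workers do not retry in lockstep.
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return (rng or random).uniform(0, ceiling)


class CircuitBreaker:
    """One instance per provider. Opens after `threshold` consecutive failures; once
    `reset_after` seconds pass it lets trial calls through (half-open). A success
    closes it again, and another failure re-opens it with a fresh timer."""

    def __init__(self, threshold: int = 5, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # open: reject until the cool-off elapses, then allow trial requests
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()
```

A worker would check its provider's breaker before each call, sleep for backoff_delay(attempt) after a rate-limit response, and skip the provider entirely while the breaker is open, marking affected routes as stale rather than blocking the rest of the pipeline.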
2. Price Drop Detection Logic
The system needs a reliable mechanism to determine whether a current price represents a meaningful discount worth alerting users about, using historical pricing context.
Hints to consider:
- Maintain a rolling window of recent prices per route (e.g., last 10 days) in a time-series or key-value store for efficient comparison
- Design the price drop detection as a separate service that accepts current price and historical data, returning a boolean or confidence score
- Consider both absolute and percentage-based thresholds, and allow users to configure sensitivity preferences
- Handle edge cases like seasonal price swings, flash sales that revert quickly, and stale historical data from API outages
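One possible shape for the detection service: compare the current fare with a baseline over the rolling window, require both a percentage and an absolute drop, and refuse to decide on thin or stale history. The thresholds are hypothetical defaults, and using the median rather than the mean is a deliberate swap for the "historical average" in the problem statement, because a median is less distorted by one-off spikes or a brief flash sale in the window. Seasonal baselines are not handled by this sketch.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class DropConfig:
    min_pct_drop: float = 0.20          # hypothetical: at least 20% below the baseline...
    min_abs_drop: float = 50.0          # ...and at least 50 currency units cheaper
    min_samples: int = 5                # refuse to judge on too little history
    max_staleness_s: float = 86_400.0   # newest sample must be under a day old


def detect_drop(current: float, history: Sequence[Tuple[float, float]], now: float,
                cfg: Optional[DropConfig] = None) -> Tuple[bool, float]:
    """Compare the current fare with the rolling window (e.g. the last 10 days).

    history holds (unix_ts, price) pairs. Returns (is_drop, fraction_below_baseline).
    """
    cfg = cfg or DropConfig()
    if len(history) < cfg.min_samples:
        return False, 0.0                           # not enough context yet
    newest_ts = max(ts for ts, _ in history)
    if now - newest_ts > cfg.max_staleness_s:
        return False, 0.0                           # window went stale, e.g. after a provider outage
    baseline = median([price for _, price in history])
    if baseline <= 0:
        return False, 0.0
    saving = baseline - current
    pct = saving / baseline
    return (pct >= cfg.min_pct_drop and saving >= cfg.min_abs_drop), pct
```

Requiring both conditions keeps a 25% drop on a very cheap fare from triggering an alert worth only a few dollars, and passing a per-subscription DropConfig is one way to expose the user-configurable sensitivity the hints mention.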
3. Notification Deduplication and Idempotency
Under at-least-once delivery semantics, retries and reprocessing can easily generate duplicate emails. Interviewers particularly care about your deduplication strategy because it directly affects user trust.
Hints to consider:
- Generate deterministic idempotency keys combining subscription ID, route, price threshold, and a time-bucketed timestamp
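- Use an atomic check-and-set operation, such as Redis SET with the NX and EX options (set only if absent, with a TTL, in a single command), so that only one worker can claim the notification slot for each unique price drop event; a bare SETNX followed by a separate EXPIRE is not atomic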
- Track the last notified price and timestamp per subscription to suppress rapid-fire alerts during price fluctuations
- Implement a cooldown period between alerts for the same subscription so users are not overwhelmed during volatile pricing windows
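The deduplication hints can be sketched as a deterministic key plus an atomic claim. The in-memory store is a single-process stand-in so the example runs without a Redis server; the key layout, the one-hour bucket, the 24-hour claim TTL, and the 6-hour cooldown are all hypothetical values.

```python
import hashlib
import time
from typing import Callable, Dict


def idempotency_key(subscription_id: str, route_key: str, threshold: float,
                    event_ts: float, bucket_s: int = 3600) -> str:
    # Deterministic: every worker or retry handling the same drop event derives the same key.
    bucket = int(event_ts // bucket_s)
    raw = f"{subscription_id}|{route_key}|{threshold:.2f}|{bucket}"
    return "notif:" + hashlib.sha256(raw.encode()).hexdigest()


class InMemoryClaimStore:
    """Stand-in for Redis. With redis-py the claim is one atomic command,
    r.set(key, "1", nx=True, ex=ttl_s), which is truthy only for the first caller."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self.clock = clock
        self._expiry: Dict[str, float] = {}

    def set_nx_ex(self, key: str, ttl_s: float) -> bool:
        now = self.clock()
        if self._expiry.get(key, float("-inf")) > now:
            return False  # already claimed and not yet expired
        self._expiry[key] = now + ttl_s
        return True


def try_claim_alert(store: InMemoryClaimStore, last_alert_ts: Dict[str, float],
                    sub_id: str, route_key: str, threshold: float,
                    event_ts: float, now: float, cooldown_s: float = 6 * 3600) -> bool:
    """Return True if this worker should send the email for this drop event."""
    prev = last_alert_ts.get(sub_id)
    if prev is not None and now - prev < cooldown_s:
        return False  # per-subscription cooldown during volatile pricing
    # Key on the event's timestamp, not the processing time, so late retries map to the same key.
    key = idempotency_key(sub_id, route_key, threshold, event_ts)
    if not store.set_nx_ex(key, ttl_s=24 * 3600):
        return False  # another worker or an earlier retry already claimed it
    last_alert_ts[sub_id] = now
    return True
```

One caveat with claiming before sending: if a worker crashes after the claim but before the email provider accepts the message, that alert is held until the claim's TTL expires. A fuller design would typically record a pending state and confirm it only after the send succeeds.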
4. Meeting the 10-Minute End-to-End SLA
The 10-minute requirement spans multiple asynchronous stages: polling, detection, matching subscriptions, and email delivery. Each stage must be designed to avoid introducing unnecessary delays.
Hints to consider:
- Use Kafka or a similar streaming platform to decouple polling from detection and notification, enabling parallel processing at each stage
- Partition event streams by route to enable parallel consumption while maintaining ordering guarantees within a single route
- Pre-compute and cache the mapping from route to subscriber list so subscription matching does not require expensive database joins during fanout
- Keep alert notifications in dedicated, higher-priority queues, separate from bulk operations such as subscription management, to prevent head-of-line blocking
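The SLA argument is easiest to check as an explicit budget. The stage names and second counts below are hypothetical placeholders, not measurements. The sketch also shows a stable route-to-partition mapping, which is what preserves per-route ordering when events are keyed by route.

```python
import hashlib
from typing import Dict

# Hypothetical per-stage latency budget in seconds (illustrative, not measured).
STAGE_BUDGET_S: Dict[str, int] = {
    "poll_interval_worst_case": 300,  # a drop can happen just after a poll
    "provider_fetch_and_retry": 60,
    "detection": 30,
    "subscriber_fanout": 60,
    "email_handoff": 90,
}
SLA_S = 600


def budget_slack(budget: Dict[str, int], sla_s: int = SLA_S) -> int:
    # Positive slack means the stages fit inside the end-to-end SLA.
    return sla_s - sum(budget.values())


def partition_for(route_key: str, num_partitions: int) -> int:
    # Stable hash: every event for a route lands on the same partition, preserving
    # per-route order. Python's built-in hash() is randomized per process for str,
    # so it would break this across producers.
    digest = hashlib.sha256(route_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

With these placeholder numbers the stages leave 60 seconds of slack. In Kafka you would normally just set the record key to the route key and let the producer's default partitioner map equal keys to the same partition; the explicit function only illustrates why the hash must be stable.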
5. Fanout Scalability for Popular Routes
When a popular route (like NYC to LAX during holiday season) triggers a price drop, the system must notify thousands of subscribers without overwhelming email providers or downstream services.
Hints to consider:
- Use two-phase fanout: first identify all affected subscriptions for a route, then batch notification jobs to avoid thundering herds on the email service
- Implement rate limiting at the email provider level using token bucket algorithms to stay within provider sending limits
- Store subscription indexes in a denormalized format (route to subscriber list) for fast lookup during fanout
- Consider staggering delivery over a short window (still within SLA) for extremely popular routes to smooth out downstream load
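The fanout hints combine into a token bucket guarding the email provider, fixed-size notification batches, and staggered batch start times. Rate, capacity, batch size, and window length are hypothetical parameters; the clock is injectable only so the bucket can be tested deterministically.

```python
import time
from typing import Callable, Iterator, List, Sequence


class TokenBucket:
    """Email-provider limiter: refills `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self, n: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


def batches(subscriber_ids: Sequence[str], size: int) -> Iterator[List[str]]:
    # Phase two of fanout: split a route's subscriber list into fixed-size notification jobs.
    for i in range(0, len(subscriber_ids), size):
        yield list(subscriber_ids[i:i + size])


def stagger_offsets(num_batches: int, window_s: float) -> List[float]:
    # Evenly spaced start offsets; every batch starts before the window ends.
    if num_batches == 0:
        return []
    step = window_s / num_batches
    return [i * step for i in range(num_batches)]
```

Each batch job would start at its offset and call try_acquire before every send, requeueing or waiting briefly when the bucket is empty, so a spike on one popular route is smoothed out instead of hitting the provider's limit all at once.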