Design a price tracking and notification service that lets users monitor Amazon product prices, subscribe to drop alerts, and view historical pricing trends. Users add products by URL or ASIN, set target-price or percentage-drop rules, and receive timely notifications through their preferred channels when conditions are met. The platform periodically fetches current prices, stores a time-series history, and renders charts showing price movement over configurable windows.
The system must handle tens of millions of tracked products, schedule and execute crawls at scale while respecting source rate limits, detect meaningful price changes without flooding users with noise from minor fluctuations, and fan out notifications to millions of subscribers for popular items. Interviewers use this problem to test your ability to design ingestion pipelines, event-driven notification workflows, high-fanout delivery, and time-series storage, with emphasis on scheduling, deduplication, idempotent alerting, and cost management.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Fetching prices for tens of millions of products requires a well-orchestrated crawl system that respects source rate limits, prioritizes frequently changing items, and handles failures gracefully.
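One way to sketch the scheduling side of this: a priority queue ordered by how much a product "deserves" a fetch, gated by a per-domain token bucket. The class and field names below are illustrative assumptions, not a real library's API.

```python
import heapq
import time

class CrawlScheduler:
    """Illustrative crawl scheduler: products with higher volatility and
    longer staleness are fetched first; a token bucket enforces a simple
    per-domain rate limit."""

    def __init__(self, domain_rate_per_sec):
        self.queue = []                       # min-heap of (-priority, product_id)
        self.rate = domain_rate_per_sec
        self.tokens = domain_rate_per_sec
        self.last_refill = time.monotonic()

    def enqueue(self, product_id, volatility, seconds_stale):
        # Higher volatility and staleness -> higher crawl priority.
        priority = volatility * seconds_stale
        heapq.heappush(self.queue, (-priority, product_id))

    def next_task(self):
        # Refill tokens based on elapsed time, then dispatch if allowed.
        now = time.monotonic()
        self.tokens = min(self.rate, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1 and self.queue:
            self.tokens -= 1
            return heapq.heappop(self.queue)[1]
        return None
```

In production the heap would be replaced by a partitioned queue (e.g. Kafka with priority tiers) and the token bucket by a shared Redis counter, but the ordering and gating logic stay the same.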
Not every price observation warrants a notification. Minor fluctuations, duplicate observations, and rounding differences must be filtered out before evaluating subscriber rules.
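A minimal noise filter might require both an absolute and a percentage threshold to be crossed before a change is treated as meaningful. The 50-cent and 1% values below are illustrative defaults, not figures from the design.

```python
def is_meaningful_change(last_price, new_price,
                         min_abs_delta=0.50, min_pct_delta=0.01):
    """Return True only when an observation differs enough from the last
    confirmed price to justify evaluating subscriber rules. Both thresholds
    must be exceeded, which filters rounding noise on cheap items and
    tiny percentage wobbles on expensive ones."""
    if last_price is None:      # first observation: record it, don't alert
        return False
    delta = abs(new_price - last_price)
    if delta == 0:              # duplicate observation
        return False
    return delta >= min_abs_delta and delta / last_price >= min_pct_delta
```

The thresholds would typically be tunable per product category, since acceptable noise on a $5 accessory differs from a $2,000 laptop.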
A popular product can have millions of subscribers. When its price drops, the system must fan out notifications without overwhelming downstream providers or creating duplicate sends.
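Duplicate sends during fan-out are usually prevented with an idempotency key per (subscriber, product, price point). A real deployment would claim the key with Redis `SET NX` plus a TTL; a plain in-memory set stands in here so the sketch stays self-contained, and the key format is an assumption.

```python
# In production: a Redis SET with NX and a TTL, shared across delivery workers.
sent_keys = set()

def dispatch_once(subscriber_id, product_id, price_cents, send_fn):
    """Send at most one notification per subscriber per price point.
    Returns True if this call performed the send, False if suppressed."""
    key = f"notif:{subscriber_id}:{product_id}:{price_cents}"
    if key in sent_keys:        # already delivered for this price point
        return False
    sent_keys.add(key)          # claim the key before calling the provider
    send_fn(subscriber_id, product_id, price_cents)
    return True
```

Claiming the key before the provider call trades a rare lost notification (crash between claim and send) for a guarantee against duplicates, which is usually the right default for alerting.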
Historical price data grows continuously and must support both fast recent-window queries for user-facing charts and efficient long-term retention for trend analysis.
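The usual pattern is to keep raw observations for a bounded window and roll older data up into coarser buckets. A sketch of the hourly rollup, assuming observations arrive as (epoch_seconds, price) pairs:

```python
from collections import defaultdict

def downsample_hourly(observations):
    """Roll raw (epoch_seconds, price) points into hourly open/low/high/close
    buckets -- the shape typically retained once the full-resolution window
    expires. Purely illustrative; a TSDB would do this with a continuous
    aggregate or a scheduled compaction job."""
    buckets = defaultdict(list)
    for ts, price in sorted(observations):
        buckets[ts // 3600 * 3600].append(price)
    return {
        hour: {"open": p[0], "low": min(p), "high": max(p), "close": p[-1]}
        for hour, p in buckets.items()
    }
```

Keeping open/low/high/close per bucket preserves enough shape to render candlestick-style charts without storing every raw point.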
Crawling millions of products is expensive in compute and network terms. Interviewers want to see intelligent prioritization that maximizes freshness within a budget.
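One way to express this prioritization: blend volatility, subscriber count, and staleness into a single score, then spend the fetch budget on the top-ranked products. The weights and the log damping on subscriber count are illustrative assumptions.

```python
import math

def crawl_priority(volatility, subscriber_count, hours_stale,
                   w_vol=0.5, w_subs=0.3, w_stale=0.2):
    """Blend signals into a crawl-priority score so a fixed fetch budget
    goes where freshness matters most. log1p damps subscriber count so a
    few mega-popular items don't starve everything else."""
    return (w_vol * volatility
            + w_subs * math.log1p(subscriber_count)
            + w_stale * hours_stale)

def pick_batch(candidates, budget):
    """candidates: (product_id, volatility, subscriber_count, hours_stale).
    Crawl only the top-k that the per-cycle budget allows."""
    ranked = sorted(candidates, key=lambda c: crawl_priority(*c[1:]), reverse=True)
    return [c[0] for c in ranked[:budget]]
```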
Confirm the number of tracked products, subscriber scale, and expected notification volume. Ask whether the system must support sources beyond Amazon and whether product data beyond price (reviews, ratings) is in scope. Clarify acceptable staleness for price data and latency targets for alert delivery. Determine whether digest-style notifications (daily summary email) are required alongside real-time alerts. Ask about compliance concerns such as data retention policies or user data deletion rights.
Sketch the major components: a product and subscription API backed by PostgreSQL for user accounts, watchlists, and alert rules. A crawl scheduler that publishes crawl tasks to a Kafka topic, consumed by a pool of fetcher workers that respect rate limits. A change detection service that compares fetched prices against the cached last-known price and publishes price-change events. A notification pipeline that expands subscriber lists, evaluates rules, deduplicates, and dispatches through channel-specific providers. A time-series store (DynamoDB or TimescaleDB) for price history. A Redis layer for the latest price cache, deduplication keys, and rate-limiting counters.
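To make the component boundaries concrete, it helps to pin down the shape of the price-change event that the change detection service publishes and the notification pipeline consumes. The field names below are assumptions for illustration.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class PriceChangeEvent:
    """Illustrative schema for the event flowing between the change
    detector and the notification expansion consumer."""
    product_id: str
    old_price_cents: int
    new_price_cents: int
    observed_at: int        # epoch seconds
    source: str             # e.g. "amazon"

event = PriceChangeEvent("B08N5WRWNW", 12999, 10999, 1700000000, "amazon")
payload = json.dumps(asdict(event))   # serialized body for the Kafka topic
```

Storing prices in integer cents avoids floating-point drift in comparisons, and keying the Kafka partition on product_id keeps all events for one product ordered.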
Walk through the path from a crawl result to a delivered alert. A fetcher worker retrieves the current price for a product and publishes the raw observation to a Kafka topic. A change detection consumer reads it, loads the last confirmed price from Redis, and compares. If the change exceeds the noise threshold, it publishes a price-change event to a separate topic. A notification expansion consumer reads this event, queries the subscriber list for the product, evaluates each subscriber's rules (target price, percentage drop), and for matching subscribers, produces individual notification messages onto channel-specific topics (email, push). A delivery worker picks up each message, checks the deduplication cache, formats the content, and calls the provider API. On success, it writes a delivery record; on failure, it retries with backoff up to a maximum before routing to the dead-letter queue.
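The rule-evaluation step in that flow can be sketched directly. The two rule kinds come from the walkthrough (target price, percentage drop); the dict shape is an assumption.

```python
def rule_matches(rule, old_price, new_price):
    """Evaluate one subscriber rule against a confirmed price change.
    Only drops trigger alerts; increases are recorded but never notified."""
    if new_price >= old_price:
        return False
    if rule["type"] == "target_price":
        # Alert once the price falls to or below the subscriber's target.
        return new_price <= rule["target"]
    if rule["type"] == "pct_drop":
        # Alert when the drop is at least the subscribed percentage.
        return (old_price - new_price) / old_price >= rule["pct"]
    return False               # unknown rule type: fail closed
```

Because this runs once per subscriber per event, it must stay cheap; anything requiring extra lookups (e.g. "lowest price in 90 days") would be precomputed into the event upstream.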
Cover crawl resilience: fetcher workers implement per-domain rate limits, exponential backoff on failures, and circuit breakers that pause crawling a source when error rates spike. Discuss storage lifecycle: raw observations are retained for 30 days at full resolution, then downsampled to hourly aggregates, and eventually to daily aggregates after one year. Address monitoring: track crawl throughput, failure rate by source, average data staleness, notification delivery rate, and duplicate suppression counts. Mention scalability: partition Kafka topics by product_id, scale fetcher and notification workers independently based on queue depth, and shard Redis by key prefix. Briefly touch on security: encrypt stored credentials, use OAuth for any authenticated source access, and rate-limit subscription creation to prevent abuse.
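The crawl-resilience piece above can be sketched as a per-source circuit breaker paired with exponential backoff. The class, thresholds, and cooldown value are illustrative assumptions.

```python
import time

class SourceCircuitBreaker:
    """After `max_failures` consecutive errors, pause crawling the source
    for `cooldown` seconds. A `now` parameter is accepted for testability;
    production code would just use the clock."""

    def __init__(self, max_failures=5, cooldown=300.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.open_until = 0.0

    def allow(self, now=None):
        return (now if now is not None else time.monotonic()) >= self.open_until

    def record_success(self):
        self.failures = 0

    def record_failure(self, now=None):
        self.failures += 1
        if self.failures >= self.max_failures:
            now = now if now is not None else time.monotonic()
            self.open_until = now + self.cooldown   # trip the breaker
            self.failures = 0

def backoff_delay(attempt, base=1.0, cap=60.0):
    # Exponential backoff with a ceiling; add jitter in production so
    # retries from many workers don't synchronize.
    return min(cap, base * (2 ** attempt))
```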