Design a Price Drop Tracker like CamelCamelCamel
Product Design · Must
Problem Statement
Design a price tracking service similar to CamelCamelCamel that monitors product prices across major e-commerce platforms like Amazon, Walmart, and Target. Users should be able to search for products, view comprehensive historical price charts, and configure alerts that notify them when prices drop to or below their desired thresholds. The system needs to track millions of products, scrape price data regularly without overwhelming retailer servers, store years of historical pricing information efficiently, and deliver timely notifications through multiple channels (email, SMS, push notifications).
Your solution should handle the challenge of collecting price data from external sources that may have rate limits or anti-scraping measures, storing time-series data cost-effectively, and processing alert conditions for millions of user-product combinations without significant delay when price updates occur.
Key Requirements
Functional
- Product price tracking -- continuously monitor and record prices for millions of products across multiple retailers
- Historical price visualization -- display price trends with charts showing data points over configurable time ranges (30 days, 90 days, 1 year, all time)
- Price drop alerts -- allow users to set target prices and receive notifications when products meet or fall below those thresholds
- Multi-channel notifications -- deliver alerts via email, SMS, browser push, and mobile app notifications
- Product search and discovery -- enable users to find products by URL, ASIN, keyword, or barcode
Non-Functional
- Scalability -- support 50M+ tracked products with 10M+ active users setting 100M+ price alerts
- Reliability -- ensure 99.9% uptime for alert delivery with no missed price drops on tracked products
- Latency -- detect price changes and send alerts within 15 minutes of price updates; serve historical charts in under 500ms
- Consistency -- eventual consistency acceptable for historical data; strong consistency required for user alert preferences
- Cost-efficiency -- minimize storage costs for years of time-series data while maintaining query performance
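A quick back-of-envelope pass over these targets helps size the design before sketching components. The numbers below assume hourly polling of every product and a ~16-byte uncompressed price point (timestamp, product id, price); both are illustrative assumptions, not figures from the requirements.

```python
# Capacity sketch under assumed hourly polling and 16 bytes per raw point.
products = 50_000_000
samples_per_day = 24           # hourly polling assumption
bytes_per_point = 16           # timestamp + product id + price, uncompressed

points_per_day = products * samples_per_day
raw_per_day_gb = points_per_day * bytes_per_point / 1e9
raw_per_year_tb = raw_per_day_gb * 365 / 1e3
updates_per_sec = points_per_day / 86_400

print(f"{points_per_day:,} points/day")            # 1,200,000,000
print(f"{raw_per_day_gb:.1f} GB/day raw")          # 19.2
print(f"{raw_per_year_tb:.1f} TB/year raw")        # 7.0
print(f"{updates_per_sec:,.0f} price updates/sec") # 13,889
```

Roughly 1.2B points and ~19 GB of raw data per day is why the storage section below leans on compression and downsampling, and ~14K updates/sec is the sustained load the alert matcher must absorb.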
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Data Collection Strategy
How you gather price data from external e-commerce sites without violating rate limits or getting blocked is crucial. Interviewers want to see you balance freshness of data against scraping costs and politeness.
Hints to consider:
- Implement tiered polling frequencies based on product popularity and price volatility
- Use distributed task queues to spread scraping load across time and IP addresses
- Consider partnering with retailers for official API access versus web scraping tradeoffs
- Design exponential backoff and circuit breaker patterns for failed scrape attempts
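The backoff and circuit-breaker hints can be combined in one per-retailer client. This is a minimal sketch, not a production scraper: the retry counts, failure threshold, and cooldown are illustrative assumptions, and `fetch` stands in for whatever HTTP layer the design uses.

```python
import random
import time

class CircuitOpenError(Exception):
    """Raised when a retailer's circuit is open and calls are refused."""

class ScrapeClient:
    """Per-retailer client: exponential backoff with jitter, plus a
    circuit breaker that stops hammering a retailer that keeps failing.
    All thresholds are illustrative, not tuned values."""

    def __init__(self, fetch, max_retries=4, base_delay=1.0,
                 failure_threshold=5, cooldown=300):
        self.fetch = fetch                    # callable doing one HTTP fetch
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.consecutive_failures = 0
        self.opened_at = None

    def get(self, url):
        # Circuit open: refuse calls until the cooldown elapses.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise CircuitOpenError(url)
            self.opened_at = None             # half-open: allow one trial call

        for attempt in range(self.max_retries):
            try:
                result = self.fetch(url)
                self.consecutive_failures = 0  # success closes the circuit
                return result
            except Exception:
                self.consecutive_failures += 1
                if self.consecutive_failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()
                    raise CircuitOpenError(url)
                # Exponential backoff with jitter so workers don't retry in sync.
                time.sleep(self.base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
        raise RuntimeError(f"exhausted retries for {url}")
```

The jitter matters at fleet scale: without it, thousands of workers that failed together retry together, recreating the load spike that caused the failures.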
2. Time-Series Storage Optimization
Storing billions of price data points efficiently while supporting fast queries for chart rendering is a key technical challenge. Raw storage of every data point becomes prohibitively expensive.
Hints to consider:
- Apply downsampling strategies where older data uses hourly/daily aggregates instead of individual points
- Use purpose-built time-series databases (InfluxDB, TimescaleDB) or columnar stores (Druid) with compression tuned for this workload
- Implement data retention policies that archive or aggregate data beyond certain age thresholds
- Consider separating hot (recent) and cold (historical) storage tiers with different technologies
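The downsampling hint can be illustrated with a small aggregation pass. This sketch assumes raw points arrive as `(epoch_seconds, price)` pairs and keeps per-day min/max/close, which preserves the shape a price chart needs while shrinking hourly data roughly 24x; a real pipeline would run this as a scheduled job or a database-native continuous aggregate.

```python
from collections import defaultdict
from datetime import datetime, timezone

def downsample_daily(points):
    """Collapse raw (epoch_seconds, price) points into one aggregate per
    UTC day, keeping min, max, and closing price for chart rendering."""
    buckets = defaultdict(list)
    for ts, price in sorted(points):          # sort so "close" is chronological
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        buckets[day].append(price)
    return {
        day: {"min": min(prices), "max": max(prices), "close": prices[-1]}
        for day, prices in buckets.items()
    }
```

Keeping min and max (not just an average) is a deliberate choice: price-drop sites care about the lowest price ever seen, and an average would erase short-lived lightning deals.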
3. Alert Processing at Scale
Processing millions of alert conditions every time prices update requires careful design to avoid overwhelming your notification systems or missing critical alerts.
Hints to consider:
- Use an inverted index: map each product to its interested users rather than scanning every user's alerts
- Implement batch processing windows to group notifications and reduce notification fatigue
- Design dead letter queues and retry logic for failed notification deliveries
- Consider rate limiting per user to prevent spam from products with rapidly fluctuating prices
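The inverted-index hint is the core of the matching engine, and a toy in-memory version makes the access pattern concrete. This is a sketch with hypothetical names (`AlertIndex`, `on_price_update`); a production version would shard by product id and keep targets sorted so a price drop becomes a range query instead of a scan.

```python
from collections import defaultdict

class AlertIndex:
    """Inverted index from product -> alerts, so a price update only touches
    the users watching that product instead of all 100M+ alerts."""

    def __init__(self):
        self.by_product = defaultdict(dict)   # product_id -> {user_id: target_price}

    def set_alert(self, user_id, product_id, target_price):
        self.by_product[product_id][user_id] = target_price

    def remove_alert(self, user_id, product_id):
        self.by_product[product_id].pop(user_id, None)

    def on_price_update(self, product_id, new_price):
        # Return users whose threshold the new price meets or undercuts.
        return [
            user_id
            for user_id, target in self.by_product.get(product_id, {}).items()
            if new_price <= target
        ]
```

With this shape, the work per price update is proportional to the watchers of one product, not to the total alert count, which is what keeps the ~15-minute latency target feasible.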
4. Handling External Dependencies
Your system depends on external retailers that may change their page structure, implement anti-bot measures, or experience downtime, requiring defensive design patterns.
Hints to consider:
- Build a parser versioning system so each retailer can have multiple HTML/JSON page structures handled side by side
- Implement anomaly detection to flag suspiciously high price changes (likely parsing errors)
- Design graceful degradation when certain retailers become unavailable
- Cache last-known prices to serve users even when fresh scrapes fail
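The anomaly-detection hint can be as simple as a ratio check before a scraped price is written. This sketch assumes a flagged price is held for re-scrape or review rather than stored; the 10x cutoff is an illustrative tunable, not a recommended value.

```python
def is_suspicious(last_price, new_price, max_ratio=10.0):
    """Flag price changes more likely to be a parser bug than a real deal.

    A jump or drop of max_ratio or more (illustrative default: 10x) is
    quarantined instead of being written and firing false alerts.
    """
    if new_price <= 0:
        return True                # scraped a missing or placeholder value
    if last_price is None:
        return False               # first observation: nothing to compare against
    ratio = max(new_price, last_price) / min(new_price, last_price)
    return ratio >= max_ratio
```

A symmetric ratio (rather than a percentage drop) catches both failure modes: a parser grabbing a per-unit price (price collapses) and one grabbing a bundle price (price explodes).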
Suggested Approach
Step 1: Clarify Requirements
Start by confirming the scope and constraints:
- How many products need tracking simultaneously? (defines scraping infrastructure)
- What's the acceptable delay between a price change and alert delivery?
- How many retailers need support initially, and how do they differ (API vs scraping)?
- What notification channels are required and their expected volume?
- How far back should historical data be retained?
- Are there budget constraints on storage and notification costs?
- Should the system detect deal patterns (lightning deals, seasonal trends)?
Step 2: High-Level Architecture
Sketch the major system components:
Core Services:
- Scraper Fleet -- distributed workers that fetch prices from retailer websites
- Price Ingestion Service -- normalizes and validates incoming price data
- Time-Series Database -- stores historical price points with efficient compression
- Alert Matching Engine -- evaluates price updates against active user alerts
- Notification Service -- dispatches alerts via multiple channels (email, SMS, push)
- API Gateway -- serves client requests for historical data and alert management
- Product Catalog -- maintains metadata about tracked products
Data Flow:
- Scraper fleet pulls prices at scheduled intervals
- Ingestion service validates and writes to time-series DB
- Price updates trigger alert matching engine
- Matched alerts queued to notification service
- Users query historical data through API gateway
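The middle of that data flow (validate, persist, match, enqueue) can be sketched as one function. The names here are stand-ins, an assumption for illustration: `store` plays the time-series DB, `index` the alert matching engine's product-to-alerts map, and `notify_q` the queue feeding the notification service.

```python
import queue

def ingest(price_event, store, index, notify_q):
    """One pass of the data flow: validate a scraped price, persist it,
    match it against alerts, and enqueue notifications."""
    product_id, price, ts = price_event
    if price <= 0:
        return                                    # reject an invalid scrape
    # Persist to the (stand-in) time-series store.
    store.setdefault(product_id, []).append((ts, price))
    # Match only the alerts watching this product, then enqueue hits.
    for user_id, target in index.get(product_id, {}).items():
        if price <= target:
            notify_q.put((user_id, product_id, price))
```

Decoupling matching from delivery via the queue is what makes the retry and dead-letter handling from the alert-processing section possible: a slow SMS provider backs up the queue, not the ingestion path.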