Design a Price Drop Tracker like CamelCamelCamel
Product Design · Must
Problem Statement
Design a price tracking service similar to CamelCamelCamel that monitors product prices across major e-commerce platforms like Amazon, Walmart, and Target. Users should be able to search for products, view comprehensive historical price charts, and configure alerts that notify them when prices drop to or below their desired thresholds. The system needs to track millions of products, scrape price data regularly without overwhelming retailer servers, store years of historical pricing information efficiently, and deliver timely notifications through multiple channels (email, SMS, push notifications).
Your solution should handle the challenge of collecting price data from external sources that may have rate limits or anti-scraping measures, storing time-series data cost-effectively, and processing alert conditions for millions of user-product combinations without significant delay when price updates occur.
Key Requirements
Functional
- Product price tracking -- continuously monitor and record prices for millions of products across multiple retailers
- Historical price visualization -- display price trends with charts showing data points over configurable time ranges (30 days, 90 days, 1 year, all time)
- Price drop alerts -- allow users to set target prices and receive notifications when products meet or fall below those thresholds
- Multi-channel notifications -- deliver alerts via email, SMS, browser push, and mobile app notifications
- Product search and discovery -- enable users to find products by URL, ASIN, keyword, or barcode
Non-Functional
- Scalability -- support 50M+ tracked products with 10M+ active users setting 100M+ price alerts
- Reliability -- ensure 99.9% uptime for alert delivery with no missed price drops on tracked products
- Latency -- detect price changes and send alerts within 15 minutes of price updates; serve historical charts in under 500ms
- Consistency -- eventual consistency acceptable for historical data; strong consistency required for user alert preferences
- Cost-efficiency -- minimize storage costs for years of time-series data while maintaining query performance
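A quick back-of-envelope pass over these targets helps size the design before sketching components. The numbers below assume hourly polling of every product and a ~16-byte uncompressed price point (timestamp, product id, price); both are illustrative assumptions, not figures from the requirements.

```python
# Capacity sketch under assumed hourly polling and 16 bytes per raw point.
products = 50_000_000
samples_per_day = 24           # hourly polling assumption
bytes_per_point = 16           # timestamp + product id + price, uncompressed

points_per_day = products * samples_per_day
raw_per_day_gb = points_per_day * bytes_per_point / 1e9
raw_per_year_tb = raw_per_day_gb * 365 / 1e3
updates_per_sec = points_per_day / 86_400

print(f"{points_per_day:,} points/day")            # 1,200,000,000
print(f"{raw_per_day_gb:.1f} GB/day raw")          # 19.2
print(f"{raw_per_year_tb:.1f} TB/year raw")        # 7.0
print(f"{updates_per_sec:,.0f} price updates/sec") # 13,889
```

Roughly 1.2B points and ~19 GB of raw data per day is why the storage section below leans on compression and downsampling, and ~14K updates/sec is the sustained load the alert matcher must absorb.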
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Data Collection Strategy
How you gather price data from external e-commerce sites without violating rate limits or getting blocked is crucial. Interviewers want to see you balance freshness of data against scraping costs and politeness.
Hints to consider:
- Implement tiered polling frequencies based on product popularity and price volatility
- Use distributed task queues to spread scraping load across time and IP addresses
- Consider partnering with retailers for official API access versus web scraping tradeoffs
- Design exponential backoff and circuit breaker patterns for failed scrape attempts
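The backoff and circuit-breaker hints can be combined in one per-retailer client. This is a minimal sketch, not a production scraper: the retry counts, failure threshold, and cooldown are illustrative assumptions, and `fetch` stands in for whatever HTTP layer the design uses.

```python
import random
import time

class CircuitOpenError(Exception):
    """Raised when a retailer's circuit is open and calls are refused."""

class ScrapeClient:
    """Per-retailer client: exponential backoff with jitter, plus a
    circuit breaker that stops hammering a retailer that keeps failing.
    All thresholds are illustrative, not tuned values."""

    def __init__(self, fetch, max_retries=4, base_delay=1.0,
                 failure_threshold=5, cooldown=300):
        self.fetch = fetch                    # callable doing one HTTP fetch
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.consecutive_failures = 0
        self.opened_at = None

    def get(self, url):
        # Circuit open: refuse calls until the cooldown elapses.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise CircuitOpenError(url)
            self.opened_at = None             # half-open: allow one trial call

        for attempt in range(self.max_retries):
            try:
                result = self.fetch(url)
                self.consecutive_failures = 0  # success closes the circuit
                return result
            except Exception:
                self.consecutive_failures += 1
                if self.consecutive_failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()
                    raise CircuitOpenError(url)
                # Exponential backoff with jitter so workers don't retry in sync.
                time.sleep(self.base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
        raise RuntimeError(f"exhausted retries for {url}")
```

The jitter matters at fleet scale: without it, thousands of workers that failed together retry together, recreating the load spike that caused the failures.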
2. Time-Series Storage Optimization
Storing billions of price data points efficiently while supporting fast queries for chart rendering is a key technical challenge. Raw storage of every data point becomes prohibitively expensive.
Hints to consider:
- Apply downsampling strategies where older data uses hourly/daily aggregates instead of individual points
- Use purpose-built time-series databases (InfluxDB, TimescaleDB) or columnar stores (Druid) with compression tuned for this workload
- Implement data retention policies that archive or aggregate data beyond certain age thresholds
- Consider separating hot (recent) and cold (historical) storage tiers with different technologies
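The downsampling hint can be illustrated with a small aggregation pass. This sketch assumes raw points arrive as `(epoch_seconds, price)` pairs and keeps per-day min/max/close, which preserves the shape a price chart needs while shrinking hourly data roughly 24x; a real pipeline would run this as a scheduled job or a database-native continuous aggregate.

```python
from collections import defaultdict
from datetime import datetime, timezone

def downsample_daily(points):
    """Collapse raw (epoch_seconds, price) points into one aggregate per
    UTC day, keeping min, max, and closing price for chart rendering."""
    buckets = defaultdict(list)
    for ts, price in sorted(points):          # sort so "close" is chronological
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        buckets[day].append(price)
    return {
        day: {"min": min(prices), "max": max(prices), "close": prices[-1]}
        for day, prices in buckets.items()
    }
```

Keeping min and max (not just an average) is a deliberate choice: price-drop sites care about the lowest price ever seen, and an average would erase short-lived lightning deals.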
3. Alert Processing at Scale
Processing millions of alert conditions every time prices update requires careful design to avoid overwhelming your notification systems or missing critical alerts.
Hints to consider:
- Use an inverted index: map each product to its interested users rather than scanning every user's alerts
- Implement batch processing windows to group notifications and reduce notification fatigue
- Design dead letter queues and retry logic for failed notification deliveries
- Consider rate limiting per user to prevent spam from products with rapidly fluctuating prices
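The inverted-index hint is the core of the matching engine, and a toy in-memory version makes the access pattern concrete. This is a sketch with hypothetical names (`AlertIndex`, `on_price_update`); a production version would shard by product id and keep targets sorted so a price drop becomes a range query instead of a scan.

```python
from collections import defaultdict

class AlertIndex:
    """Inverted index from product -> alerts, so a price update only touches
    the users watching that product instead of all 100M+ alerts."""

    def __init__(self):
        self.by_product = defaultdict(dict)   # product_id -> {user_id: target_price}

    def set_alert(self, user_id, product_id, target_price):
        self.by_product[product_id][user_id] = target_price

    def remove_alert(self, user_id, product_id):
        self.by_product[product_id].pop(user_id, None)

    def on_price_update(self, product_id, new_price):
        # Return users whose threshold the new price meets or undercuts.
        return [
            user_id
            for user_id, target in self.by_product.get(product_id, {}).items()
            if new_price <= target
        ]
```

With this shape, the work per price update is proportional to the watchers of one product, not to the total alert count, which is what keeps the ~15-minute latency target feasible.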
4. Handling External Dependencies
Your system depends on external retailers that may change their page structure, implement anti-bot measures, or experience downtime, requiring defensive design patterns.
Hints to consider:
- Build a parser versioning system so each retailer can have multiple HTML/JSON page structures handled side by side
- Implement anomaly detection to flag suspiciously high price changes (likely parsing errors)
- Design graceful degradation when certain retailers become unavailable
- Cache last-known prices to serve users even when fresh scrapes fail
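The anomaly-detection hint can be as simple as a ratio check before a scraped price is written. This sketch assumes a flagged price is held for re-scrape or review rather than stored; the 10x cutoff is an illustrative tunable, not a recommended value.

```python
def is_suspicious(last_price, new_price, max_ratio=10.0):
    """Flag price changes more likely to be a parser bug than a real deal.

    A jump or drop of max_ratio or more (illustrative default: 10x) is
    quarantined instead of being written and firing false alerts.
    """
    if new_price <= 0:
        return True                # scraped a missing or placeholder value
    if last_price is None:
        return False               # first observation: nothing to compare against
    ratio = max(new_price, last_price) / min(new_price, last_price)
    return ratio >= max_ratio
```

A symmetric ratio (rather than a percentage drop) catches both failure modes: a parser grabbing a per-unit price (price collapses) and one grabbing a bundle price (price explodes).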
Suggested Approach
Step 1: Clarify Requirements
Start by confirming the scope and constraints:
- How many products need tracking simultaneously? (defines scraping infrastructure)
- What's the acceptable delay between a price change and alert delivery?
- How many retailers need support initially, and how do they differ (API vs scraping)?
- What notification channels are required and their expected volume?
- How far back should historical data be retained?
- Are there budget constraints on storage and notification costs?
- Should the system detect deal patterns (lightning deals, seasonal trends)?
Step 2: High-Level Architecture
Sketch the major system components:
Core Services:
- Scraper Fleet -- distributed workers that fetch prices from retailer websites
- Price Ingestion Service -- normalizes and validates incoming price data
- Time-Series Database -- stores historical price points with efficient compression
- Alert Matching Engine -- evaluates price updates against active user alerts
- Notification Service -- dispatches alerts via multiple channels (email, SMS, push)
- API Gateway -- serves client requests for historical data and alert management
- Product Catalog -- maintains metadata about tracked products
Data Flow:
- Scraper fleet pulls prices at scheduled intervals
- Ingestion service validates and writes to time-series DB
- Price updates trigger alert matching engine
- Matched alerts queued to notification service
- Users query historical data through API gateway
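The middle of that data flow (validate, persist, match, enqueue) can be sketched as one function. The names here are stand-ins, an assumption for illustration: `store` plays the time-series DB, `index` the alert matching engine's product-to-alerts map, and `notify_q` the queue feeding the notification service.

```python
import queue

def ingest(price_event, store, index, notify_q):
    """One pass of the data flow: validate a scraped price, persist it,
    match it against alerts, and enqueue notifications."""
    product_id, price, ts = price_event
    if price <= 0:
        return                                    # reject an invalid scrape
    # Persist to the (stand-in) time-series store.
    store.setdefault(product_id, []).append((ts, price))
    # Match only the alerts watching this product, then enqueue hits.
    for user_id, target in index.get(product_id, {}).items():
        if price <= target:
            notify_q.put((user_id, product_id, price))
```

Decoupling matching from delivery via the queue is what makes the retry and dead-letter handling from the alert-processing section possible: a slow SMS provider backs up the queue, not the ingestion path.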