Practice/Google/Design a Trending Hashtags System
Design a Trending Hashtags System
System DesignMust
Problem Statement
Design a system that identifies the top-K trending hashtags across a social media platform in near real time. The system ingests a firehose of posts containing hashtags, aggregates their frequency over sliding time windows (for example, the last 1 hour and the last 24 hours), and surfaces the most popular ones on a trending page.
The core challenge is performing streaming aggregation at massive scale — potentially hundreds of thousands of posts per second — while maintaining accurate or near-accurate counts over sliding time windows. Unlike batch processing, you need results updated continuously, not computed once an hour.
Hot hashtags present a write contention problem: during major events, a single hashtag can appear in tens of thousands of posts per second. Your design must handle this skew without creating bottlenecks.
Key Requirements
Functional
- Hashtag extraction and counting -- Parse hashtags from incoming posts and maintain frequency counts over configurable time windows.
- Top-K retrieval -- Return the K most popular hashtags for a given time window (last 1 hour, last 24 hours) with their approximate counts.
- Geo and category filtering -- Support filtering trending hashtags by geographic region or content category.
- Historical trends -- Allow querying what was trending at a specific point in the past.
Non-Functional
- Scalability -- Handle 500,000+ posts per second at peak, each potentially containing multiple hashtags.
- Freshness -- Trending results reflect activity from the last few minutes; updates visible within 30 seconds.
- Accuracy -- Approximate counts are acceptable (within 5% of true count) if it reduces infrastructure cost.
- Availability -- The trending page must remain available even if parts of the pipeline lag behind.
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Streaming Aggregation and Sliding Windows
Maintaining accurate counts over a continuously sliding time window is fundamentally different from fixed batch intervals.
Hints to consider:
- Consider tumbling windows (non-overlapping fixed intervals) as building blocks — aggregate counts per 1-minute tumbling window, then combine the last 60 windows for the hourly view
- How does Apache Flink's windowing model help you implement sliding window semantics efficiently?
- What happens when events arrive out of order or late — how do you handle watermarks?
- Think about the memory required to track per-hashtag counts across many small windows
2. Hot Hashtag Contention
A viral hashtag during a global event can dominate the write load.
Hints to consider:
- Pre-aggregate counts locally on each Flink task before merging globally, reducing the fan-in
- Consider the Count-Min Sketch data structure for approximate frequency estimation with bounded memory
- How do you shard the aggregation by hashtag hash and then merge top-K results across shards?
- What is the tradeoff between exact counts and approximate algorithms like Space-Saving or Lossy Counting?
3. Top-K Computation
Maintaining a global top-K across distributed aggregators requires careful coordination.
Hints to consider:
- Each aggregation shard maintains a local min-heap of its top-K hashtags; a merger combines them
- Why must each shard track more than K candidates to avoid missing globally trending hashtags?
- How frequently do you recompute the global top-K — on every update or periodically?
- Consider storing the merged top-K result in Redis for fast retrieval by the API layer