Design a Trending Hashtags System
Problem Statement
Design a highly scalable system that tracks and computes the top K trending hashtags (e.g., top 30-50) across different time windows (last 5/15/30/60 minutes or custom intervals) for billions of users, with support for filtering by geography (local/global) and categories (food, sports, politics).
This problem tests your ability to design high-throughput streaming systems, compute top-K over sliding windows, and handle skewed traffic (hot hashtags) at global scale. You must balance accuracy, freshness, and cost while supporting filters (geo/category) and user-specified windows.
Key Requirements
Functional
- Multi-window trending -- users view the top K (e.g., 30-50) hashtags for a chosen time window (last 1/5/15/60 minutes or up to 12 hours)
- Geo and category filters -- users filter trending hashtags by geography (global, country, city) and by category (food, sports, politics)
- Custom time intervals -- users specify a custom time interval within allowed bounds and receive a ranked list quickly
- Near-real-time freshness -- trends reflect changes within a target latency of a few seconds
Non-Functional
- Scalability -- handle billions of users and 100M+ daily posts creating a write-heavy stream of hashtag events
- Reliability -- tolerate datacenter failures and stream processing restarts without losing trending accuracy
- Latency -- serve trending queries with sub-100ms latency from pre-computed results
- Consistency -- eventual consistency with bounded freshness; rankings converge within seconds of actual trend changes
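A quick back-of-envelope check turns the scalability requirement into a per-second rate. The sketch below is illustrative only: the 100M posts per day comes from the requirements, while two hashtags per post and a 10x peak-to-average ratio are assumptions, not figures from the problem statement.

```python
SECONDS_PER_DAY = 86_400


def ingest_rates(daily_posts: int, hashtags_per_post: float, peak_to_avg: float):
    """Return (avg posts/s, avg hashtag events/s, peak hashtag events/s)."""
    avg_posts = daily_posts / SECONDS_PER_DAY
    avg_events = avg_posts * hashtags_per_post
    return avg_posts, avg_events, avg_events * peak_to_avg


# 100M posts/day is from the requirements; the other two inputs are assumptions.
posts, events, peak = ingest_rates(100_000_000, hashtags_per_post=2, peak_to_avg=10)
print(f"{posts:,.0f} posts/s, {events:,.0f} events/s avg, {peak:,.0f} events/s peak")
```

This prints "1,157 posts/s, 2,315 events/s avg, 23,148 events/s peak". Under these assumptions the stated floor averages only about 1,157 posts per second, so headroom for the "+", traffic spikes, and hot-hashtag skew matters more than the average.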
Interview Reports from Hello Interview
36 reports from candidates. Most recently asked at LinkedIn in early October 2025.
Also commonly asked at: Meta, Google, Atlassian.
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. High-Volume Event Ingestion and Partitioning
Billions of users creating posts with hashtags generate an enormous write-heavy stream. Interviewers want to see how you partition and pre-aggregate without creating bottlenecks.
Hints to consider:
- Use Kafka partitioned by hashtag (or hash of hashtag + geo) for ordered, parallel processing
- Implement local aggregation at the ingestion layer: count hashtags per time bucket before publishing downstream
- Apply batching and compression to reduce network overhead from high-volume producers
- Use per-partition pre-aggregation in stream processors to avoid global coordination
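The ingestion hints can be sketched in a few lines of Python, assuming one-minute buckets, a hypothetical 64-partition topic, and case-insensitive hashtags. `partition_for` and `LocalAggregator` are illustrative names, not a Kafka API: in practice the producer would set the record key (for example `hashtag|geo`) and let the Kafka client pick the partition. The explicit hash here only shows why every update for one (hashtag, geo) pair lands on the same partition.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple

NUM_PARTITIONS = 64   # hypothetical partition count for the hashtag-events topic
BUCKET_SECONDS = 60   # hypothetical pre-aggregation bucket size

Row = Tuple[int, str, str, int]  # (bucket_start_ts, hashtag, geo, count)


def partition_for(hashtag: str, geo: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a (hashtag, geo) key to a partition with a stable hash."""
    key = f"{hashtag}|{geo}".encode("utf-8")
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_partitions


class LocalAggregator:
    """Counts hashtag events per time bucket in memory before publishing downstream."""

    def __init__(self, bucket_seconds: int = BUCKET_SECONDS) -> None:
        self.bucket_seconds = bucket_seconds
        self.counts: Counter = Counter()

    def record(self, hashtag: str, geo: str, event_ts: float) -> None:
        # Align the event to the start of its bucket; lowercasing assumes
        # hashtags are case-insensitive.
        bucket = int(event_ts // self.bucket_seconds) * self.bucket_seconds
        self.counts[(bucket, hashtag.lower(), geo)] += 1

    def flush(self) -> Dict[int, List[Row]]:
        """Group pre-aggregated rows by target partition, then reset the buffer."""
        batches: Dict[int, List[Row]] = {}
        for (bucket, tag, geo), n in self.counts.items():
            batches.setdefault(partition_for(tag, geo), []).append((bucket, tag, geo, n))
        self.counts.clear()
        return batches
```

Flushing every few seconds collapses many raw events for the same tag into one row per (bucket, hashtag, geo), and each partition's rows can go out as a single batched, compressed produce request.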
2. Hot Hashtag Handling
Trending topics by definition create extreme skew where a few hashtags receive the majority of updates. Interviewers probe how you avoid hot shards and single-point contention.
Hints to consider:
- Use virtual sharding: split hot hashtag counters across N sub-keys and merge periodically
- Apply local aggregation with tumbling micro-windows (e.g., 5 seconds) before updating global state
- Use probabilistic data structures (Count-Min Sketch) for approximate counting of the long tail while maintaining exact counts for top-K candidates
- Detect hot hashtags dynamically and apply increased parallelism for their processing
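The Count-Min Sketch hint can be sketched as follows, assuming one instance per partition and micro-window. The sketch absorbs every hashtag in fixed memory, and a small candidate map (`TopKCandidates`, a hypothetical helper) keeps its own counter for up to k tags. A newcomer is admitted only when its sketch estimate beats the weakest candidate. Candidate counts are exact only from admission onward; before that they inherit the sketch's possible overestimate, which is the accuracy-for-memory trade the hint describes. Width 2048 and depth 4 are arbitrary illustrative values.

```python
import hashlib
from typing import Dict, List, Tuple


class CountMinSketch:
    """Fixed-memory approximate counter; estimates can overcount but never undercount."""

    def __init__(self, width: int = 2048, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Derive a different hash per row by prefixing the row number.
        digest = hashlib.blake2b(f"{row}:{item}".encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item: str, count: int = 1) -> int:
        """Increment the item's cell in every row and return the updated estimate."""
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count
        return self.estimate(item)

    def estimate(self, item: str) -> int:
        # The minimum across rows is the least collision-inflated count.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


class TopKCandidates:
    """Keeps up to k candidate hashtags with their own counters; the sketch covers the tail."""

    def __init__(self, k: int, sketch: CountMinSketch) -> None:
        self.k = k
        self.sketch = sketch
        self.counts: Dict[str, int] = {}

    def add(self, tag: str, count: int = 1) -> None:
        estimate = self.sketch.add(tag, count)
        if tag in self.counts:
            # Already a candidate: count exactly from here on.
            self.counts[tag] += count
        elif len(self.counts) < self.k:
            self.counts[tag] = estimate
        else:
            weakest = min(self.counts, key=self.counts.__getitem__)
            if estimate > self.counts[weakest]:
                # Evict the weakest candidate; seed the newcomer with its sketch estimate.
                del self.counts[weakest]
                self.counts[tag] = estimate

    def top(self) -> List[Tuple[str, int]]:
        # Highest count first; ties broken alphabetically.
        return sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Memory stays bounded by the sketch table plus k map entries no matter how many distinct hashtags appear in the long tail.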
3. Sliding Window Computation
Maintaining accurate counts across sliding windows of different durations is algorithmically challenging. Interviewers expect a practical design.
Hints to consider:
- Use tumbling sub-windows (e.g., 1-minute buckets) and combine them to approximate sliding windows
- For "last 60 minutes," maintain 60 one-minute buckets, sum all 60 (or only the most recent N for a shorter N-minute window), and evict buckets once they are older than 60 minutes
- Implement hierarchical aggregation: 1-minute buckets roll up to 5-minute, which roll up to hourly
- Handle late data with allowed lateness in stream processing; accept that very late events may not affect already-served trends
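The bucketed approach can be sketched in Python, assuming one-minute buckets keyed by epoch minute, a 60-minute retention limit, and that events older than the retention limit are simply dropped (a simplified stand-in for allowed lateness). `MinuteBucketWindow` is a hypothetical name; a real stream processor would keep this state per partition and per (geo, category).

```python
import heapq
from collections import Counter
from typing import Dict, List, Tuple


class MinuteBucketWindow:
    """Per-minute hashtag counts; any window up to max_minutes is a sum of recent buckets."""

    def __init__(self, max_minutes: int = 60) -> None:
        self.max_minutes = max_minutes
        self.buckets: Dict[int, Counter] = {}

    def add(self, hashtag: str, event_ts: float, now_ts: float, count: int = 1) -> bool:
        minute, current = int(event_ts // 60), int(now_ts // 60)
        if minute <= current - self.max_minutes:
            return False  # Too late: its bucket has already aged out of every window.
        self.buckets.setdefault(minute, Counter())[hashtag] += count
        return True

    def evict(self, now_ts: float) -> None:
        """Drop buckets that no longer fall inside the largest supported window."""
        current = int(now_ts // 60)
        for minute in [m for m in self.buckets if m <= current - self.max_minutes]:
            del self.buckets[minute]

    def top_k(self, now_ts: float, window_minutes: int, k: int) -> List[Tuple[str, int]]:
        """Sum the most recent `window_minutes` buckets (current minute included)."""
        current = int(now_ts // 60)
        window_minutes = min(window_minutes, self.max_minutes)
        total: Counter = Counter()
        for minute in range(current - window_minutes + 1, current + 1):
            total.update(self.buckets.get(minute, Counter()))
        # Highest count first; ties broken alphabetically for a stable order.
        return heapq.nsmallest(k, total.items(), key=lambda kv: (-kv[1], kv[0]))
```

For windows up to 12 hours the same idea applies one level up: sum five one-minute buckets into a five-minute bucket and twelve of those into an hour, so a 12-hour query touches 12 hourly buckets plus finer buckets for the partial hour instead of 720 minute buckets.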
4. Pre-Computed Serving Layer
Computing trends on read by scanning raw logs will not meet real-time SLAs. Interviewers expect pre-computation and caching.
Hints to consider:
- Materialize top-K results per (window, geo, category) combination in Redis sorted sets, updated every few seconds
- Cache trending results at multiple layers (application cache, CDN) with TTLs matching the update frequency
- Serve common queries (global top-50 last hour) from pre-warmed cache entries that the pipeline overwrites on every update instead of letting them expire, so they never go cold during normal operation
- Support less common queries (specific city, custom time range) with on-demand aggregation from pre-computed sub-window data
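The serving path can be sketched as an in-memory stand-in for the Redis and application-cache layers, assuming a hypothetical key scheme `trending:{window}:{geo}:{category}`, a 5-second TTL, and a `compute` callback (also hypothetical) that performs the on-demand aggregation from sub-window data. Common keys are published by the pipeline and pinned so they never go cold; uncommon keys are computed on a miss and cached for one TTL. In a Redis deployment, each published ranking would live in a sorted set under the same key.

```python
import time
from typing import Callable, Dict, List, Set, Tuple

Ranking = List[Tuple[str, int]]


def cache_key(window: str, geo: str, category: str) -> str:
    # Hypothetical key layout, e.g. "trending:60m:global:all".
    return f"trending:{window}:{geo}:{category}"


class TrendingServingCache:
    def __init__(
        self,
        compute: Callable[[str, str, str], Ranking],
        ttl_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.compute = compute          # on-demand aggregation for uncommon queries
        self.ttl = ttl_seconds          # chosen to match the pipeline's update interval
        self.clock = clock
        self.entries: Dict[str, Tuple[float, Ranking]] = {}
        self.pinned: Set[str] = set()   # pre-warmed keys that never expire

    def publish(self, window: str, geo: str, category: str,
                ranking: Ranking, pin: bool = False) -> None:
        """Called by the stream job every few seconds with a fresh top-K list."""
        key = cache_key(window, geo, category)
        self.entries[key] = (self.clock(), ranking)
        if pin:
            self.pinned.add(key)

    def get(self, window: str, geo: str, category: str) -> Ranking:
        key = cache_key(window, geo, category)
        entry = self.entries.get(key)
        if entry is not None:
            stored_at, ranking = entry
            if key in self.pinned or self.clock() - stored_at < self.ttl:
                return ranking
        # Miss or expired: aggregate on demand, then cache for one TTL.
        ranking = self.compute(window, geo, category)
        self.entries[key] = (self.clock(), ranking)
        return ranking
```

Injecting `clock` keeps the TTL logic testable without sleeping; in production the default monotonic clock would be used.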