Design a Trending Hashtags System

System Design | Must

Problem Statement

Design a highly scalable system that tracks and computes the top K trending hashtags (e.g., top 30-50) across different time windows (last 5/15/30/60 minutes or custom intervals) for billions of users, with support for filtering by geography (local/global) and categories (food, sports, politics).

This problem tests your ability to design high-throughput streaming systems, compute top-K over sliding windows, and handle skewed traffic (hot hashtags) at global scale. You need to balance accuracy, freshness, and cost while supporting filters (geo/category) and user-specified windows.

Key Requirements

Functional

Multi-window trending -- users view the top K (e.g., 30-50) hashtags for a chosen time window (last 1/5/15/60 minutes or up to 12 hours)
Geo and category filters -- users filter trending hashtags by geography (global, country, city) and by category (food, sports, politics)
Custom time intervals -- users specify a custom time interval within allowed bounds and receive a ranked list quickly
Near-real-time freshness -- trends reflect changes within a target latency of a few seconds

Non-Functional

Scalability -- handle billions of users and 100M+ daily posts creating a write-heavy stream of hashtag events
Reliability -- tolerate datacenter failures and stream processing restarts without losing trending accuracy
Latency -- serve trending queries with sub-100ms latency from pre-computed results
Consistency -- eventual consistency with bounded freshness; rankings converge within seconds of actual trend changes

Interview Reports from Hello Interview

36 reports from candidates. Most recently asked at LinkedIn in early October 2025.

Also commonly asked at: Meta, Google, Atlassian.

What Interviewers Focus On

Based on real interview experiences, these are the areas interviewers probe most deeply:

1. High-Volume Event Ingestion and Partitioning

Billions of users creating posts with hashtags generate an enormous write-heavy stream. Interviewers want to see how you partition and pre-aggregate without creating bottlenecks.

Hints to consider:

Partition the Kafka topic by hashtag (or by a hash of hashtag + geo) so events for a given key stay ordered within one partition while partitions are processed in parallel
Implement local aggregation at the ingestion layer: count hashtags per time bucket before publishing downstream
Apply batching and compression to reduce network overhead from high-volume producers
Use per-partition pre-aggregation in stream processors to avoid global coordination
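
The local-aggregation and partitioning hints above can be sketched as follows. This is a minimal illustration, not a production ingestion path: the `HashtagEvent` shape, the 60-second bucket width, the lowercase normalization, and the `partition_for` helper are all assumptions made for the example, and a real Kafka producer would apply its own partitioner.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple

BUCKET_SECONDS = 60  # assumed bucket width for this sketch


class HashtagEvent(NamedTuple):
    """Hypothetical event shape: one hashtag occurrence in one post."""
    hashtag: str
    geo: str
    timestamp: float  # event time, epoch seconds


def pre_aggregate(events: Iterable[HashtagEvent]) -> Dict[Tuple[int, str, str], int]:
    """Collapse raw events into one count per (bucket_start, geo, hashtag).

    Publishing these partial counts downstream instead of raw events
    reduces message volume, most of all for hot hashtags.
    """
    counts: Counter = Counter()
    for event in events:
        # Align the event time to the start of its bucket.
        bucket_start = int(event.timestamp // BUCKET_SECONDS) * BUCKET_SECONDS
        # Lowercasing is an assumed normalization, not a stated requirement.
        counts[(bucket_start, event.geo, event.hashtag.lower())] += 1
    return dict(counts)


def partition_for(hashtag: str, geo: str, num_partitions: int) -> int:
    """Pick a stable partition from a hash of geo + hashtag (illustrative)."""
    digest = hashlib.md5(f"{geo}:{hashtag}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Because every record for a given (geo, hashtag) pair maps to the same partition, a downstream processor can sum the partial counts without coordinating with other partitions.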

2. Hot Hashtag Handling

Trending topics by definition create extreme skew where a few hashtags receive the majority of updates. Interviewers probe how you avoid hot shards and single-point contention.

Hints to consider:

Use virtual sharding: split hot hashtag counters across N sub-keys and merge periodically
Apply local aggregation with tumbling micro-windows (e.g., 5 seconds) before updating global state
Use probabilistic data structures (Count-Min Sketch) for approximate counting of the long tail while maintaining exact counts for top-K candidates
Detect hot hashtags dynamically and apply increased parallelism for their processing
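
As a rough sketch of the virtual-sharding hint above, the class below spreads writes for a hot hashtag across several sub-keys and sums them on read. The in-memory dictionary stands in for whatever key-value store holds the counters, and the `ShardedCounter` name, the `mark_hot` call, and the `hashtag#i` key format are hypothetical choices for illustration. How a hashtag gets flagged as hot is left out.

```python
import random
from collections import defaultdict
from typing import Dict, Set


class ShardedCounter:
    """Counter that splits hot keys across num_shards sub-keys."""

    def __init__(self, num_shards: int = 8) -> None:
        self.num_shards = num_shards
        self.store: Dict[str, int] = defaultdict(int)  # stand-in for a KV store
        self.hot: Set[str] = set()

    def mark_hot(self, hashtag: str) -> None:
        """Flag a hashtag so later writes are spread across sub-keys."""
        self.hot.add(hashtag)

    def _write_key(self, hashtag: str) -> str:
        # Hot hashtags pick a random sub-key; others always use sub-key 0.
        shard = random.randrange(self.num_shards) if hashtag in self.hot else 0
        return f"{hashtag}#{shard}"

    def incr(self, hashtag: str, amount: int = 1) -> None:
        self.store[self._write_key(hashtag)] += amount

    def total(self, hashtag: str) -> int:
        """Merge step: sum every sub-key, whether or not the tag is hot now."""
        return sum(
            self.store.get(f"{hashtag}#{shard}", 0)
            for shard in range(self.num_shards)
        )
```

Reads always sum all sub-keys, so a hashtag can be marked hot partway through a window without losing the counts already written to sub-key 0. The trade-off is that each read touches `num_shards` keys, which is why the merge usually runs periodically rather than on every query.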

3. Sliding Window Computation

Maintaining accurate counts across sliding windows of different durations is algorithmically challenging. Interviewers expect a practical design.

Hints to consider:

Use tumbling sub-windows (e.g., 1-minute buckets) and combine them to approximate sliding windows
For "last 60 minutes," maintain 60 one-minute buckets and sum the most recent ones, evicting expired buckets
Implement hierarchical aggregation: 1-minute buckets roll up to 5-minute, which roll up to hourly
Handle late data with allowed lateness in stream processing; accept that very late events may not affect already-served trends
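
The bucket-based approach above can be sketched in a few lines. This is a simplified single-process illustration, assuming one-minute buckets and a 60-bucket retention; the `BucketedWindow` class and its method names are made up for the example, and a stream processor would keep this state per key and per partition.

```python
from collections import Counter
from typing import Dict, Optional


class BucketedWindow:
    """Approximates sliding windows by summing fixed-width tumbling buckets."""

    def __init__(self, bucket_seconds: int = 60, max_buckets: int = 60) -> None:
        self.bucket_seconds = bucket_seconds
        self.max_buckets = max_buckets
        self.buckets: Dict[int, Counter] = {}  # bucket id -> hashtag counts
        self.latest: Optional[int] = None      # newest bucket id seen so far

    def add(self, hashtag: str, timestamp: float, amount: int = 1) -> bool:
        """Record an event; returns False if it is too late to be counted."""
        bucket_id = int(timestamp // self.bucket_seconds)
        if self.latest is None or bucket_id > self.latest:
            self.latest = bucket_id
            # Evict buckets that fell out of the retention range.
            cutoff = self.latest - self.max_buckets
            for expired in [b for b in self.buckets if b <= cutoff]:
                del self.buckets[expired]
        if bucket_id <= self.latest - self.max_buckets:
            return False  # older than retention: dropped, as with very late data
        self.buckets.setdefault(bucket_id, Counter())[hashtag] += amount
        return True

    def counts(self, window_buckets: int) -> Counter:
        """Sum the most recent window_buckets buckets, including the current one."""
        total: Counter = Counter()
        if self.latest is None:
            return total
        oldest = self.latest - window_buckets + 1
        for bucket_id, bucket in self.buckets.items():
            if oldest <= bucket_id <= self.latest:
                total.update(bucket)
        return total
```

For example, `counts(5)` gives an approximate "last 5 minutes" view and `counts(60)` an approximate "last 60 minutes" view from the same state. The result is approximate because the oldest bucket is either fully included or fully excluded, so the window edge is accurate only to one bucket width.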

4. Pre-Computed Serving Layer

Computing trends on read by scanning raw logs will not meet real-time SLAs. Interviewers expect pre-computation and caching.

Hints to consider:

Materialize top-K results per (window, geo, category) combination in Redis sorted sets, updated every few seconds
Cache trending results at multiple layers (application cache, CDN) with TTLs matching the update frequency
Serve common queries (global top-50 last hour) from pre-warmed caches that are overwritten on each update rather than left to expire during normal operation
Support less common queries (specific city, custom time range) with on-demand aggregation from pre-computed sub-window data
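
One way to picture the two serving paths above is the sketch below: a materialized top-K list per (window, geo, category) for common queries, and an on-demand merge of per-minute buckets for custom ranges. The `TrendingStore` class is an in-memory stand-in for the Redis sorted sets mentioned in the hints, and all names and key shapes here are assumptions for illustration.

```python
import heapq
from collections import Counter
from typing import Dict, List, Mapping, Tuple

Ranked = List[Tuple[str, int]]


def top_k(counts: Mapping[str, int], k: int) -> Ranked:
    """Return the k highest counts, ties broken alphabetically by hashtag."""
    return heapq.nsmallest(k, counts.items(), key=lambda kv: (-kv[1], kv[0]))


class TrendingStore:
    """In-memory stand-in for materialized top-K results."""

    def __init__(self) -> None:
        self.materialized: Dict[Tuple[str, str, str], Ranked] = {}

    def publish(self, window: str, geo: str, category: str,
                counts: Mapping[str, int], k: int = 50) -> None:
        # Overwrite the previous ranking; called every few seconds by the pipeline.
        self.materialized[(window, geo, category)] = top_k(counts, k)

    def get(self, window: str, geo: str, category: str, k: int) -> Ranked:
        # Read path: a dictionary lookup and a slice, no aggregation.
        return self.materialized.get((window, geo, category), [])[:k]


def custom_range_top_k(minute_buckets: Mapping[int, Counter],
                       start_minute: int, end_minute: int, k: int) -> Ranked:
    """On-demand path: merge per-minute buckets for an inclusive custom range."""
    merged: Counter = Counter()
    for minute in range(start_minute, end_minute + 1):
        merged.update(minute_buckets.get(minute, Counter()))
    return top_k(merged, k)
```

The on-demand path costs one merge per bucket in the requested range, which is one reason custom intervals are usually bounded. Note also that merging truncated top-K lists from several partitions is exact only when each hashtag's full count lives on a single partition; otherwise the merge should start from counts, as `custom_range_top_k` does.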
