Design a Top-K URL System for LinkedIn Posts
System Design | Must
Problem Statement
Design a system that tracks and displays the top K most shared URLs/articles from LinkedIn posts across different time windows (5 minutes, 1 hour, 1 day). The system should handle high scale (10M+ articles/day) with rolling time windows and eventual consistency.
Users see which articles are trending right now and can explore them without worrying about duplicate links with different tracking parameters or shorteners. The system must handle skew from viral links, event-time vs processing-time correctness, late/out-of-order data, and the trade-offs between exactness, freshness, and cost.
Key Requirements
Functional
- Rolling window leaderboards -- users view the top K most shared URLs for rolling windows of 5 minutes, 1 hour, and 1 day
- URL deduplication -- rankings consolidate the same article even when links differ by tracking parameters, redirects, or shorteners
- Near-real-time freshness -- leaderboard updates reflect sharing activity within approximately one minute
- Scoped rankings -- support global and optionally scoped leaderboards (by region or language) with the same freshness guarantees
Non-Functional
- Scalability -- handle 10M+ unique articles per day with sharing event throughput in the hundreds of thousands per second
- Reliability -- tolerate component failures without losing share events or corrupting rankings
- Latency -- serve leaderboard queries with sub-100ms p99 latency from cached/pre-computed results
- Consistency -- eventual consistency up to one minute is acceptable; rankings should converge correctly
Interview Reports from Hello Interview
10 reports from candidates. Most recently asked at LinkedIn in early January 2026.
Also commonly asked at: Amazon.
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Streaming Aggregation with Rolling Windows
Interviewers want to see how you maintain accurate counts across overlapping time windows with event-time semantics.
Hints to consider:
- Use Flink with event-time processing, watermarks, and sliding/tumbling windows to maintain per-bucket counts
- Implement bounded-lateness handling to accept late events within a configurable grace period
- Use hierarchical aggregation: compute 5-minute windows first, then combine for 1-hour and 1-day windows
- Emit top-K snapshots at regular intervals rather than recomputing on every event
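The hierarchical-aggregation hint above can be sketched in a few lines: keep per-URL counts in 5-minute event-time buckets, then merge the buckets covering a larger window on demand. This is a minimal in-memory sketch (class and method names are illustrative, not from any specific framework); a Flink job would hold the same per-bucket state in keyed windows and emit snapshots instead of serving queries directly.

```python
import heapq
from collections import Counter, defaultdict

def bucket_of(event_ts: int, bucket_secs: int = 300) -> int:
    """Floor an event timestamp (seconds) to its 5-minute bucket start."""
    return event_ts - (event_ts % bucket_secs)

class HierarchicalTopK:
    """Per-URL counts in 5-minute buckets; larger windows (1 hour, 1 day)
    are answered by merging the buckets they cover."""

    def __init__(self, bucket_secs: int = 300):
        self.bucket_secs = bucket_secs
        self.buckets: dict[int, Counter] = defaultdict(Counter)

    def record(self, url: str, event_ts: int) -> None:
        # Late events within the grace period simply land in their
        # (older) event-time bucket and are picked up on the next merge.
        self.buckets[bucket_of(event_ts, self.bucket_secs)][url] += 1

    def top_k(self, now_ts: int, window_secs: int, k: int) -> list[tuple[str, int]]:
        # Merge every bucket whose start falls inside the rolling window.
        start = bucket_of(now_ts - window_secs, self.bucket_secs)
        merged: Counter = Counter()
        for bucket_start, counts in self.buckets.items():
            if bucket_start >= start:
                merged.update(counts)
        return heapq.nlargest(k, merged.items(), key=lambda kv: kv[1])
```

In production the merge would not run on every read; the pipeline would emit a top-K snapshot per window on a timer, which is what the serving layer below consumes.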
2. URL Canonicalization and Deduplication
The same article can appear with different tracking parameters, URL shorteners, or redirect chains. Interviewers expect a strategy to consolidate these.
Hints to consider:
- Strip tracking parameters (utm_source, utm_medium, etc.) during URL normalization at ingestion time
- Resolve URL shorteners and redirects asynchronously, caching the mapping from short URL to canonical URL
- Use a URL fingerprint (hash of normalized URL) as the aggregation key rather than the raw URL string
- Handle edge cases like paywall URLs, AMP links, and mobile-specific URLs that point to the same content
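A normalization pass like the one described can be sketched with the standard library. The tracking-parameter list and the http-to-https upgrade are assumptions for illustration; a real system would maintain a curated parameter denylist and resolve shorteners/redirects asynchronously before fingerprinting.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed denylist of tracking parameters; real lists are larger and curated.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "gclid", "fbclid"}

def canonicalize(url: str) -> str:
    """Normalize a URL so variants of the same article collapse to one key."""
    parts = urlsplit(url.strip())
    scheme = "https" if parts.scheme == "http" else parts.scheme  # assumed upgrade
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/") or "/"
    # Drop tracking params, sort the rest so parameter order doesn't matter.
    query = urlencode(sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_PARAMS))
    return urlunsplit((scheme, host, path, query, ""))  # fragment discarded

def url_fingerprint(url: str) -> str:
    """Compact hash of the normalized URL, used as the aggregation key."""
    return hashlib.sha256(canonicalize(url).encode()).hexdigest()[:16]
```

Fingerprinting the normalized form (rather than keying on the raw string) keeps aggregation keys fixed-width and makes the dedup decision a pure function of the URL.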
3. Hot Key Handling for Viral Content
Viral URLs receive disproportionate share counts that can overwhelm a single counter or partition.
Hints to consider:
- Shard counters for hot URLs across multiple sub-keys and merge periodically
- Use consistent hashing to distribute share events and micro-batch updates to avoid lock contention
- Detect hot keys dynamically using per-partition metrics and increase shard factor automatically
- Separate the "top 10" that changes infrequently from the long tail that churns constantly
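The sharded-counter idea can be shown with a small in-memory sketch. The shard factor and the dict-backed store are assumptions; in practice each (url, shard) sub-key would live on a different partition of the counter store, and the shard factor would grow when hot-key detection fires.

```python
import random
from collections import defaultdict

class ShardedCounter:
    """Spread increments for a hot URL across N sub-keys so no single
    counter (or store partition) absorbs the full write rate; reads
    merge the shards back into one total."""

    def __init__(self, shards: int = 16):
        self.shards = shards
        # Stand-in for a partitioned counter store keyed by (url, shard).
        self.store: dict[tuple[str, int], int] = defaultdict(int)

    def incr(self, url: str, by: int = 1) -> None:
        # Random shard choice spreads contention; a real writer might
        # instead hash its worker ID to keep increments shard-local.
        shard = random.randrange(self.shards)
        self.store[(url, shard)] += by

    def total(self, url: str) -> int:
        # Reads pay an N-way merge; periodic roll-ups amortize this.
        return sum(self.store[(url, s)] for s in range(self.shards))
```

The read-side merge is why shard factors are raised only for detected hot keys: the long tail stays unsharded and cheap to read.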
4. Pre-Computed Serving Layer
Computing rankings on read by scanning raw events is not viable at this scale. Interviewers look for a materialized serving layer.
Hints to consider:
- Store pre-computed top-K sorted sets in Redis per window/scope combination, updated every few seconds by the streaming pipeline
- Use versioned keys and atomic swaps to ensure readers always see a consistent snapshot
- Cache leaderboard responses at the CDN and application level with short TTLs (30-60 seconds)
- Serve deeper rankings (beyond top-100) from a secondary store with relaxed latency requirements
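The versioned-key atomic-swap pattern can be sketched against an in-memory key-value store; the key scheme (`topk:{scope}:{window}`) is an illustrative assumption. With Redis, the same effect comes from writing the snapshot to a fresh sorted-set key and then `RENAME`-ing it over the pointer (or updating a pointer key), so readers never see a half-written leaderboard.

```python
import itertools

class LeaderboardStore:
    """The pipeline writes each new top-K snapshot under a fresh versioned
    key, then flips a pointer key in one step; readers follow the pointer
    and always get a complete, consistent snapshot."""

    def __init__(self):
        self._kv: dict = {}
        self._version = itertools.count(1)

    def publish(self, scope: str, window: str, topk: list[tuple[str, int]]) -> None:
        v = next(self._version)
        data_key = f"topk:{scope}:{window}:v{v}"
        self._kv[data_key] = list(topk)                         # write snapshot fully
        self._kv[f"topk:{scope}:{window}:current"] = data_key   # then flip pointer

    def read(self, scope: str, window: str) -> list[tuple[str, int]]:
        ptr = self._kv.get(f"topk:{scope}:{window}:current")
        return list(self._kv.get(ptr, []))
```

Stale versioned keys would carry a short TTL so in-flight readers finish before cleanup, which is also what makes 30-60 second CDN caching safe on top.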