LinkedIn's "Design Top K Search Words" interview question focuses on building a scalable system to track and rank the most frequent search queries in real time. It draws on heavy-hitters algorithms and stream processing to handle high-volume web data.
Design a service that continuously ingests user search queries from LinkedIn's platform and returns the top K most searched words or phrases over a recent time window (e.g., the last hour or day). The system must support high throughput (millions of queries per second), low-latency reads (<100ms), and real-time updates while using bounded memory. The problem touches ranking (frequency/count scores), data engineering (aggregation pipelines), web/backend (API serving), machine learning (optional learned ranking), and stream processing/system design (distributed heavy hitters).[1][2][6][7]
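The bounded-memory requirement is what makes an exact hash-map counter infeasible at this scale, and it is why heavy-hitters sketches come up. As one illustrative approach (not the only acceptable answer), a Count-Min Sketch can approximate per-query counts in fixed memory, with a candidate map plus heap extracting the top K. The class and function names below are hypothetical, and the width/depth parameters are placeholder choices:

```python
import hashlib
import heapq

class CountMinSketch:
    """Approximate frequency counts in O(width * depth) memory.

    Estimates only ever overcount (hash collisions add, never subtract),
    so min-over-rows bounds the error. Parameters here are illustrative.
    """
    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One independent-ish hash per row, derived by salting with the row index.
        for i in range(self.depth):
            digest = hashlib.md5(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.width

    def add(self, item):
        for row, col in enumerate(self._columns(item)):
            self.table[row][col] += 1

    def estimate(self, item):
        # Minimum across rows: the row least inflated by collisions.
        return min(self.table[row][col] for row, col in enumerate(self._columns(item)))

def top_k(stream, k, sketch):
    """Consume a stream of query strings; return the k highest estimated counts."""
    candidates = {}
    for query in stream:
        sketch.add(query)
        candidates[query] = sketch.estimate(query)
    return heapq.nlargest(k, candidates.items(), key=lambda kv: kv[1])
```

In a real deployment the candidate map itself must be bounded (e.g., keep only queries whose estimate exceeds a threshold, or use Space-Saving instead); the sketch above is the in-interview starting point, not a production design.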
No verbatim examples are published by official LinkedIn sources, but standard formulations of the problem use inputs and outputs like the following:[1][2]
Stream Input (continuous events):
search: "machine learning", timestamp: 2026-02-02T04:00:00Z, user_id: 123
search: "system design", timestamp: 2026-02-02T04:00:01Z, user_id: 456
search: "machine learning", timestamp: 2026-02-02T04:00:02Z, user_id: 789
... (millions more)
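Because results are scoped to a recent window, the aggregation layer typically buckets events by time and evicts expired buckets as the window slides. A minimal sketch of that idea, assuming per-minute buckets and a class name of my own choosing:

```python
from collections import Counter, deque

class SlidingWindowCounter:
    """Per-minute count buckets for a sliding window.

    Bucket granularity (one minute) and the window length are assumptions;
    production systems tune both against memory and freshness needs.
    """
    def __init__(self, window_minutes=60):
        self.window = window_minutes
        self.buckets = deque()  # entries: (minute_epoch, Counter of query -> count)

    def record(self, query, ts_minute):
        # Open a new bucket when the event's minute advances.
        if not self.buckets or self.buckets[-1][0] != ts_minute:
            self.buckets.append((ts_minute, Counter()))
        self.buckets[-1][1][query] += 1
        # Evict buckets that have fallen out of the window.
        while self.buckets and self.buckets[0][0] <= ts_minute - self.window:
            self.buckets.popleft()

    def counts(self):
        """Merge live buckets into one Counter for the current window."""
        total = Counter()
        for _, bucket in self.buckets:
            total.update(bucket)
        return total
```

Merging all buckets on every read is O(buckets x distinct queries); a serving layer would instead maintain a running total incrementally, subtracting each bucket as it expires.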
API Output (JSON):
```json
{
  "top_k": [
    {"query": "machine learning", "count": 150000, "rank": 1},
    {"query": "system design", "count": 120000, "rank": 2},
    {"query": "data engineering", "count": 80000, "rank": 3}
  ],
  "window": "1h",
  "as_of": "2026-02-02T04:23:00Z"
}
```
The example shows K=3 over a 1-hour window.[2]