Design Typeahead Search
Problem Statement
Build a system that provides instant search suggestions as users type characters into a search box. Think of how major platforms like Amazon, Netflix, or Twitter offer completion hints before you finish typing -- this system should deliver the same experience at global scale.
Your design must handle hundreds of millions of daily active users typing queries across thousands of geographic locations. Each keystroke triggers a new request, creating massive read volume with strict latency requirements. The system should return the top 10 most popular matches for any prefix within 100 milliseconds at the 99th percentile. Focus on delivering consistent, non-personalized suggestions based purely on query popularity, with graceful handling of network issues and missing matches.
Key Requirements
Functional
- Prefix-based suggestion retrieval -- return top-K ranked suggestions for any character sequence users type, updating with each keystroke
- Popularity-based ranking -- order suggestions by aggregate frequency across all users, not personalized to individual behavior
- Typo tolerance -- handle minor misspellings for longer queries (4+ characters) to improve suggestion quality
- Multi-language support -- correctly tokenize and match queries in different languages and character sets
- Suggestion selection -- allow users to click a suggestion to execute a full search or navigate to a result page
Non-Functional
- Scalability -- support 500 million daily active users generating 50 billion keystroke requests per day
- Reliability -- maintain 99.95% availability with automatic failover across regions
- Latency -- deliver suggestions in under 100ms at p99, with p50 under 30ms
- Consistency -- eventual consistency is acceptable; suggestions may lag popularity changes by minutes
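The stated numbers imply a concrete request rate worth computing up front. The sketch below derives average and peak QPS from the requirements; the 3x peak-to-average factor is an assumption, not part of the spec:

```python
# Back-of-envelope load estimate from the stated requirements.
DAU = 500_000_000                   # daily active users
REQUESTS_PER_DAY = 50_000_000_000   # keystroke requests per day
SECONDS_PER_DAY = 86_400

avg_qps = REQUESTS_PER_DAY / SECONDS_PER_DAY    # ~579K QPS average
peak_qps = avg_qps * 3                          # assumed 3x diurnal peak
requests_per_user = REQUESTS_PER_DAY / DAU      # 100 keystrokes/user/day

print(f"avg QPS ~{avg_qps:,.0f}, peak QPS ~{peak_qps:,.0f}, "
      f"{requests_per_user:.0f} requests/user/day")
```

Roughly 580K average QPS (well over a million at peak) against a 100ms p99 budget is the number that justifies nearly every design decision that follows, especially the caching tiers.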
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Data Structure and Indexing Strategy
The core challenge is choosing how to store and retrieve prefix matches efficiently. Your indexing approach determines query latency, storage cost, and how quickly you can incorporate new popular terms.
Hints to consider:
- Trie structures enable O(k) prefix lookups where k is query length, but memory usage grows with total stored characters and becomes substantial at large vocabulary sizes
- Inverted indexes with edge n-grams balance space efficiency with query speed for moderate-length prefixes
- Pre-computing top-K results for common short prefixes (1-3 characters) eliminates query-time computation entirely
- Consider how you'll handle the long tail of rare queries versus hot prefixes like "a", "the", or trending topics
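The first and third hints above combine naturally: cache the top-K list at every trie node so a lookup walks the prefix and returns immediately, with no subtree traversal at query time. A minimal sketch (class names are illustrative; it assumes each query is inserted once with its aggregate count):

```python
class TrieNode:
    __slots__ = ("children", "top_k")

    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.top_k = []      # cached (count, query) pairs, best first

class SuggestTrie:
    """Trie with top-K suggestions pre-computed at every node, making a
    prefix lookup O(len(prefix)) regardless of subtree size."""

    def __init__(self, k=10):
        self.root = TrieNode()
        self.k = k

    def insert(self, query, count):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            # Maintain this node's cached top-K (K is small, so sorting is fine).
            node.top_k.append((count, query))
            node.top_k.sort(key=lambda pair: -pair[0])
            del node.top_k[self.k:]

    def suggest(self, prefix):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []   # no matches for this prefix
        return [query for _, query in node.top_k]
```

The trade-off is write amplification: every insert touches one node per character, which is why the write path below rebuilds the index offline rather than updating it per search.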
2. Caching Architecture and Hot Key Management
Short prefixes like single letters generate massive duplicate traffic. Without intelligent caching, the same query hits your data store thousands of times per second, causing both cost and latency problems.
Hints to consider:
- Multi-tier caching with L1 (CDN edge), L2 (regional Redis), and L3 (distributed cache near data stores) reduces backend load by orders of magnitude
- Request coalescing or collapsing prevents thundering herds when cache entries expire under heavy load
- Static pre-computed responses for ultra-hot prefixes eliminate all backend queries for your highest-volume traffic
- TTL strategy must balance freshness with hit rate -- longer TTLs for stable prefixes, shorter for trending terms
3. Write Path and Popularity Computation
Suggestions must reflect current popularity, but updating the index on every search would be prohibitively expensive. You need an efficient pipeline to aggregate query volume and refresh rankings.
Hints to consider:
- Stream processing with windowed aggregation computes rolling popularity metrics without full re-indexing
- Separate fast-changing trending terms from stable long-term popular queries using different update frequencies
- Lambda architecture with batch baseline and real-time delta allows fresh suggestions without rebuilding the entire index
- Consider using sampling or probabilistic counting for high-volume aggregation to reduce write amplification
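The probabilistic-counting hint typically means a Count-Min Sketch: fixed memory per aggregation window, at the cost of bounded over-estimation. A minimal sketch under that assumption (width/depth values are illustrative):

```python
import hashlib

class CountMinSketch:
    """Fixed-memory approximate counter for a high-volume query stream.
    Estimates only ever over-count; error shrinks as width grows."""

    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One independent hash per row, derived by salting blake2b.
        for row in range(self.depth):
            h = hashlib.blake2b(item.encode(), digest_size=8,
                                salt=str(row).encode()).digest()
            yield row, int.from_bytes(h, "big") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Minimum across rows discards most collision noise.
        return min(self.table[row][col] for row, col in self._cells(item))
```

A stream job can feed one sketch per time window, then merge windows and promote any query whose estimate crosses a threshold into the real-time delta index of the lambda architecture.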
4. Geographic Distribution and Tail Latency Control
With users worldwide and sub-100ms requirements, network distance dominates your latency budget. Regional deployment and smart request routing are essential, not optional.
Hints to consider:
- Deploy read replicas in 10-15 geographic regions to keep users within 50ms of their nearest data center
- DNS-based routing or anycast directs users to the closest healthy region automatically
- Payload size matters -- returning only necessary fields and using compression can save 20-30ms on slow mobile connections
- Circuit breakers and aggressive timeouts (50-75ms) with cached fallbacks prevent stragglers from degrading user experience
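The last hint is worth sketching, since the combination of a breaker and a cached fallback is what keeps the p99 under budget when a region degrades. A simplified single-threaded sketch (class name and thresholds are illustrative; real deployments would add a proper half-open probe and per-dependency state):

```python
import time

class CircuitBreaker:
    """Trip after consecutive failures and serve a cached fallback while
    open, so a slow backend cannot drag tail latency past the budget."""

    def __init__(self, failure_threshold=5, reset_after=10.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after     # seconds before retrying backend
        self.failures = 0
        self.opened_at = None              # monotonic time the breaker tripped
        self.fallback_cache = {}           # prefix -> last good suggestions

    def call(self, prefix, fetch):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Open: fail fast with stale-but-instant suggestions.
                return self.fallback_cache.get(prefix, [])
            self.opened_at = None          # half-open: try the backend again
        try:
            result = fetch(prefix)         # fetch should enforce its own timeout
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
                self.failures = 0
            return self.fallback_cache.get(prefix, [])
        self.failures = 0
        self.fallback_cache[prefix] = result
        return result
```

Note that serving slightly stale suggestions is an easy trade here: the non-functional requirements already accept minutes of staleness, so a fallback cache degrades quality far less than a 2-second hang degrades experience.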
Suggested Approach
Step 1: Clarify Requirements
Start by confirming scope and constraints with your interviewer. Ask about daily active users, queries per user, expected growth rate, and whether suggestions should be global or regional. Clarify if the system handles only web traffic or includes mobile apps, and whether offline caching is expected. Confirm the definition of "real-time" -- how stale can suggestions be? Ask if personalization or user context (location, history) should influence results, or if this is purely popularity-based. Finally, establish success metrics: what latency percentiles matter most, and what availability target must you hit?