Design Typeahead Search
Problem Statement
Build a system that provides instant search suggestions as users type characters into a search box. Think of how major platforms like Amazon, Netflix, or Twitter offer completion hints before you finish typing -- this system should deliver the same experience at global scale.
Your design must handle hundreds of millions of daily active users typing queries across thousands of geographic locations. Each keystroke triggers a new request, creating massive read volume with strict latency requirements. The system should return the top 10 most popular matches for any prefix within 100 milliseconds at the 99th percentile. Focus on delivering consistent, non-personalized suggestions based purely on query popularity, with graceful handling of network issues and missing matches.
Key Requirements
Functional
- Prefix-based suggestion retrieval -- return top-K ranked suggestions for any character sequence users type, updating with each keystroke
- Popularity-based ranking -- order suggestions by aggregate frequency across all users, not personalized to individual behavior
- Typo tolerance -- handle minor misspellings for longer queries (4+ characters) to improve suggestion quality
- Multi-language support -- correctly tokenize and match queries in different languages and character sets
- Suggestion selection -- allow users to click a suggestion to execute a full search or navigate to a result page
Non-Functional
- Scalability -- support 500 million daily active users generating 50 billion keystroke requests per day
- Reliability -- maintain 99.95% availability with automatic failover across regions
- Latency -- deliver suggestions in under 100ms at p99, with p50 under 30ms
- Consistency -- eventual consistency is acceptable; suggestions may lag popularity changes by minutes
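The stated numbers imply a concrete request rate worth computing up front. The sketch below derives average and peak QPS from the requirements; the 3x peak-to-average factor is an assumption, not part of the spec:

```python
# Back-of-envelope load estimate from the stated requirements.
DAU = 500_000_000                   # daily active users
REQUESTS_PER_DAY = 50_000_000_000   # keystroke requests per day
SECONDS_PER_DAY = 86_400

avg_qps = REQUESTS_PER_DAY / SECONDS_PER_DAY    # ~579K QPS average
peak_qps = avg_qps * 3                          # assumed 3x diurnal peak
requests_per_user = REQUESTS_PER_DAY / DAU      # 100 keystrokes/user/day

print(f"avg QPS ~{avg_qps:,.0f}, peak QPS ~{peak_qps:,.0f}, "
      f"{requests_per_user:.0f} requests/user/day")
```

Roughly 580K average QPS (well over a million at peak) against a 100ms p99 budget is the number that justifies nearly every design decision that follows, especially the caching tiers.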
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Data Structure and Indexing Strategy
The core challenge is choosing how to store and retrieve prefix matches efficiently. Your indexing approach determines query latency, storage cost, and how quickly you can incorporate new popular terms.
Hints to consider:
- Trie structures enable O(k) prefix lookups where k is query length, but memory usage grows with total stored characters and becomes substantial at large vocabulary sizes
- Inverted indexes with edge n-grams balance space efficiency with query speed for moderate-length prefixes
- Pre-computing top-K results for common short prefixes (1-3 characters) eliminates query-time computation entirely
- Consider how you'll handle the long tail of rare queries versus hot prefixes like "a", "the", or trending topics
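The first and third hints above combine naturally: cache the top-K list at every trie node so a lookup walks the prefix and returns immediately, with no subtree traversal at query time. A minimal sketch (class names are illustrative; it assumes each query is inserted once with its aggregate count):

```python
class TrieNode:
    __slots__ = ("children", "top_k")

    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.top_k = []      # cached (count, query) pairs, best first

class SuggestTrie:
    """Trie with top-K suggestions pre-computed at every node, making a
    prefix lookup O(len(prefix)) regardless of subtree size."""

    def __init__(self, k=10):
        self.root = TrieNode()
        self.k = k

    def insert(self, query, count):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            # Maintain this node's cached top-K (K is small, so sorting is fine).
            node.top_k.append((count, query))
            node.top_k.sort(key=lambda pair: -pair[0])
            del node.top_k[self.k:]

    def suggest(self, prefix):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []   # no matches for this prefix
        return [query for _, query in node.top_k]
```

The trade-off is write amplification: every insert touches one node per character, which is why the write path below rebuilds the index offline rather than updating it per search.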
2. Caching Architecture and Hot Key Management
Short prefixes like single letters generate massive duplicate traffic. Without intelligent caching, the same query hits your data store thousands of times per second, causing both cost and latency problems.
Hints to consider:
- Multi-tier caching with L1 (CDN edge), L2 (regional Redis), and L3 (distributed cache near data stores) reduces backend load by orders of magnitude
- Request coalescing or collapsing prevents thundering herds when cache entries expire under heavy load
- Static pre-computed responses for ultra-hot prefixes eliminate all backend queries for your highest-volume traffic
- TTL strategy must balance freshness with hit rate -- longer TTLs for stable prefixes, shorter for trending terms
3. Write Path and Popularity Computation
Suggestions must reflect current popularity, but updating the index on every search would be prohibitively expensive. You need an efficient pipeline to aggregate query volume and refresh rankings.
Hints to consider:
- Stream processing with windowed aggregation computes rolling popularity metrics without full re-indexing
- Separate fast-changing trending terms from stable long-term popular queries using different update frequencies
- Lambda architecture with batch baseline and real-time delta allows fresh suggestions without rebuilding the entire index
- Consider using sampling or probabilistic counting for high-volume aggregation to reduce write amplification
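The probabilistic-counting hint typically means a Count-Min Sketch: fixed memory per aggregation window, at the cost of bounded over-estimation. A minimal sketch under that assumption (width/depth values are illustrative):

```python
import hashlib

class CountMinSketch:
    """Fixed-memory approximate counter for a high-volume query stream.
    Estimates only ever over-count; error shrinks as width grows."""

    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One independent hash per row, derived by salting blake2b.
        for row in range(self.depth):
            h = hashlib.blake2b(item.encode(), digest_size=8,
                                salt=str(row).encode()).digest()
            yield row, int.from_bytes(h, "big") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Minimum across rows discards most collision noise.
        return min(self.table[row][col] for row, col in self._cells(item))
```

A stream job can feed one sketch per time window, then merge windows and promote any query whose estimate crosses a threshold into the real-time delta index of the lambda architecture.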
4. Geographic Distribution and Tail Latency Control
With users worldwide and sub-100ms requirements, network distance dominates your latency budget. Regional deployment and smart request routing are essential, not optional.
Hints to consider:
- Deploy read replicas in 10-15 geographic regions to keep users within 50ms of their nearest data center
- DNS-based routing or anycast directs users to the closest healthy region automatically
- Payload size matters -- returning only necessary fields and using compression can save 20-30ms on slow mobile connections
- Circuit breakers and aggressive timeouts (50-75ms) with cached fallbacks prevent stragglers from degrading user experience
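The last hint is worth sketching, since the combination of a breaker and a cached fallback is what keeps the p99 under budget when a region degrades. A simplified single-threaded sketch (class name and thresholds are illustrative; real deployments would add a proper half-open probe and per-dependency state):

```python
import time

class CircuitBreaker:
    """Trip after consecutive failures and serve a cached fallback while
    open, so a slow backend cannot drag tail latency past the budget."""

    def __init__(self, failure_threshold=5, reset_after=10.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after     # seconds before retrying backend
        self.failures = 0
        self.opened_at = None              # monotonic time the breaker tripped
        self.fallback_cache = {}           # prefix -> last good suggestions

    def call(self, prefix, fetch):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Open: fail fast with stale-but-instant suggestions.
                return self.fallback_cache.get(prefix, [])
            self.opened_at = None          # half-open: try the backend again
        try:
            result = fetch(prefix)         # fetch should enforce its own timeout
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
                self.failures = 0
            return self.fallback_cache.get(prefix, [])
        self.failures = 0
        self.fallback_cache[prefix] = result
        return result
```

Note that serving slightly stale suggestions is an easy trade here: the non-functional requirements already accept minutes of staleness, so a fallback cache degrades quality far less than a 2-second hang degrades experience.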
Suggested Approach
Step 1: Clarify Requirements
Start by confirming scope and constraints with your interviewer. Ask about daily active users, queries per user, expected growth rate, and whether suggestions should be global or regional. Clarify if the system handles only web traffic or includes mobile apps, and whether offline caching is expected. Confirm the definition of "real-time" -- how stale can suggestions be? Ask if personalization or user context (location, history) should influence results, or if this is purely popularity-based. Finally, establish success metrics: what latency percentiles matter most, and what availability target must you hit?