Design an Auto-Complete Feature for a Store's Search Bar
System Design · Must
Problem Statement
Design an auto-complete search feature for an e-commerce store that provides real-time suggestions as users type queries into the search bar. The system must return relevant suggestions within 100-200ms of each keystroke, support millions of concurrent users during peak shopping hours, and rank suggestions by a combination of popularity, recency, and user personalization.
At Amazon, interviewers ask this to evaluate whether you can build a low-latency, high-throughput read path that serves prefix-matched suggestions while continuously incorporating new search trends and product launches. The challenge lies in balancing index freshness with query latency, handling the bursty nature of keystroke traffic, and providing personalized results without adding latency to the critical path.
Key Requirements
Functional
- Prefix-based suggestions -- as users type each character, display a ranked list of 5-10 suggested queries or products matching the prefix
- Popularity-weighted ranking -- suggestions are ranked by search frequency, click-through rate, and optionally user-specific signals
- Trending and fresh content -- newly launched products and trending queries appear in suggestions within minutes
- Spell correction and fuzzy matching -- handle common misspellings and suggest corrections alongside prefix matches
Non-Functional
- Scalability -- handle millions of concurrent users, each generating 5-10 suggestion requests per search session
- Reliability -- maintain 99.9% availability; gracefully degrade by showing cached suggestions if the ranking service is slow
- Latency -- return suggestions within 100ms at p99 to feel instantaneous to users
- Consistency -- eventual consistency acceptable; new products or trending queries visible within 5 minutes
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Trie-Based Index Design
The core data structure for prefix matching is a trie or its space-optimized variant. Interviewers want to see if you understand how to build and serve prefix lookups efficiently at scale.
Hints to consider:
- Use a compressed trie (radix tree) to store suggestion terms, with each node holding aggregate popularity scores
- Pre-compute and store the top-K suggestions at each trie node to avoid traversal on every query
- Shard the trie by prefix range (a-f, g-m, etc.) across multiple servers for horizontal scaling
- Alternatively, use sorted sets in Redis with lexicographic range queries for a simpler operational model
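The first two hints can be combined into a single structure: a trie whose nodes carry a precomputed top-K list, so a query is an O(len(prefix)) walk with no subtree traversal. This is a minimal single-node sketch (the names `SuggestionTrie` and `suggest` are illustrative, not a real library API); a production version would use a compressed radix tree and shard by prefix range as described above.

```python
import heapq

class TrieNode:
    __slots__ = ("children", "top_k")

    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.top_k = []      # list of (score, phrase), kept at size K

class SuggestionTrie:
    def __init__(self, k: int = 5):
        self.root = TrieNode()
        self.k = k

    def insert(self, phrase: str, score: int) -> None:
        """Insert a phrase, updating the top-K cache at every node on its path."""
        node = self.root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
            self._update_top_k(node, phrase, score)

    def _update_top_k(self, node: TrieNode, phrase: str, score: int) -> None:
        # Drop any stale entry for the same phrase, then keep only the K best.
        node.top_k = [(s, p) for s, p in node.top_k if p != phrase]
        node.top_k.append((score, phrase))
        node.top_k = heapq.nlargest(self.k, node.top_k)

    def suggest(self, prefix: str) -> list[str]:
        """O(len(prefix)) lookup -- the ranked answers are precomputed."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [p for _, p in node.top_k]
```

The trade-off is write amplification: every insert touches every node on the phrase's path, which is why this structure pairs naturally with periodic batch rebuilds rather than per-query updates.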
2. Index Freshness and Update Pipeline
The suggestion index must incorporate trending queries and new products without rebuilding from scratch. Interviewers probe your update strategy.
Hints to consider:
- Use a streaming pipeline: log search queries and click events to Kafka, consume with aggregation workers that update popularity counts
- Rebuild suggestion indexes periodically (every 5 minutes) from aggregated data, and swap atomically (blue-green deployment of index)
- Maintain a hot buffer for very recent trending queries that gets merged with the main index at query time
- Use count-min sketch or approximate counters for real-time popularity tracking to avoid exact counting overhead
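To make the count-min sketch hint concrete, here is a minimal sketch of the data structure: a fixed-size table of counters indexed by several independent hashes, giving approximate frequencies in constant memory. The width/depth values are illustrative defaults, not tuned recommendations.

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counter: fixed memory, can only overestimate."""

    def __init__(self, width: int = 2048, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _hashes(self, item: str):
        # One salted hash per row; blake2b's salt gives independent functions.
        for i in range(self.depth):
            h = hashlib.blake2b(item.encode(), digest_size=8,
                                salt=str(i).encode()).digest()
            yield int.from_bytes(h, "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row, idx in enumerate(self._hashes(item)):
            self.table[row][idx] += count

    def estimate(self, item: str) -> int:
        # True count <= estimate; hash collisions can only inflate a cell.
        return min(self.table[row][idx]
                   for row, idx in enumerate(self._hashes(item)))
```

An aggregation worker would feed each logged query into `add` and periodically flush the heaviest hitters into the suggestion index, avoiding a per-query exact counter in the hot path.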
3. Client-Side Optimization and Debouncing
Every keystroke potentially triggers a server request. Interviewers expect strategies to reduce unnecessary load.
Hints to consider:
- Implement client-side debouncing (wait 50-100ms after last keystroke before sending request) to batch rapid typing
- Cache recent prefix results on the client; if the user types "sho" and already has results for "sh", filter locally first
- Use HTTP caching headers for common prefixes so CDN or browser cache serves repeat queries
- Abort in-flight requests when a new keystroke arrives to prevent stale suggestions from rendering
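The debounce-and-abort pattern above can be sketched with asyncio task cancellation: each keystroke cancels any pending or in-flight request, and only a pause in typing lets one through. `fetch` here is a hypothetical async callable standing in for the real suggestion request; in a browser the equivalent is a timer plus `AbortController`.

```python
import asyncio

class DebouncedSearch:
    """Waits for a pause in typing, then allows at most one in-flight request."""

    def __init__(self, fetch, delay: float = 0.08):
        self.fetch = fetch    # async fn: prefix -> suggestions (placeholder)
        self.delay = delay    # debounce window, ~80ms
        self._task = None

    def on_keystroke(self, prefix: str):
        # A new keystroke cancels any pending or in-flight request,
        # so stale suggestions can never render over fresh ones.
        if self._task and not self._task.done():
            self._task.cancel()
        self._task = asyncio.ensure_future(self._debounced(prefix))
        return self._task

    async def _debounced(self, prefix: str):
        await asyncio.sleep(self.delay)   # survive the pause = send the request
        return await self.fetch(prefix)
```

With three rapid keystrokes "s", "sh", "sho", only the final prefix ever reaches the server, cutting request volume roughly in proportion to typing speed.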
4. Personalization Without Latency Impact
Adding user-specific signals improves relevance but must not slow down the critical path. Interviewers look for offline-online hybrid approaches.
Hints to consider:
- Pre-compute per-user suggestion boosts (based on purchase history, recent searches) offline and store in a fast cache keyed by user_id
- At query time, fetch the user's boost profile from cache and re-rank the generic top-K suggestions
- Fall back to generic (non-personalized) suggestions if the personalization cache is unavailable
- Use A/B testing to measure the lift from personalization and justify the added complexity
Suggested Approach
Step 1: Clarify Requirements
Confirm scope with your interviewer:
- Suggestion source -- query completions, product names, category names, or a mix?
- Expected query volume and concurrency
- Latency targets, and whether personalization is required
- Supported languages, and whether fuzzy matching or spell correction is in scope
- How fresh suggestions need to be