For a full example answer with detailed architecture diagrams and deep dives, see our Design Yelp guide. While the Yelp guide focuses on business discovery, many of the same patterns around search indexing, location-scoped data, and read-heavy caching strategies apply directly to a product catalog system.
Also review the Search, Caching, and Databases building blocks for background on faceted search, multi-tier caching, and data modeling for hierarchical taxonomies.
Design a product catalog system that allows users to browse products by categories and subcategories, similar to Instacart's product discovery experience. Users pick a location or store, navigate a hierarchical category tree, filter and sort items, and view product details including price, packaging, and real-time availability.
The system is extremely read-heavy: category pages are viewed millions of times per day, but product details (prices, inventory) change frequently per store. The core challenges are designing a hierarchical taxonomy that works across diverse retailers, keeping product information synchronized in near real-time as prices and stock levels change throughout the day, and delivering sub-second browsing experiences despite the massive dataset. You must handle location-scoped data where every price and availability figure is specific to a particular store, and design caching and indexing strategies that stay fresh without full rebuilds on every update.
Based on real interview experiences, these are the areas interviewers probe most deeply:
The hierarchical category structure and how you represent products with store-specific attributes is fundamental. A poor data model leads to expensive queries and difficult maintenance.
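One way to sketch the data model this area describes is a materialized-path category table plus a separate product-store table, so ancestor queries become prefix matches and store-specific attributes stay out of the shared product record. A minimal sketch using SQLite as a stand-in for PostgreSQL; all table and column names are assumptions:

```python
import sqlite3

# Illustrative schema (names are assumptions): categories carry a
# materialized path so "all products under an ancestor" is a prefix match,
# and store-specific price/availability live in a separate table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    path TEXT NOT NULL           -- e.g. '/produce/fruit/berries/'
);
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    brand TEXT,
    category_id INTEGER REFERENCES categories(id)
);
CREATE TABLE product_store (
    product_id INTEGER REFERENCES products(id),
    store_id INTEGER NOT NULL,
    price_cents INTEGER NOT NULL,
    in_stock INTEGER NOT NULL,   -- 0 or 1
    PRIMARY KEY (product_id, store_id)
);
CREATE INDEX idx_categories_path ON categories(path);
""")

conn.execute("INSERT INTO categories VALUES (1, 'Produce', '/produce/')")
conn.execute("INSERT INTO categories VALUES (2, 'Berries', '/produce/fruit/berries/')")
conn.execute("INSERT INTO products VALUES (10, 'Strawberries 1lb', 'BrandX', 2)")
conn.execute("INSERT INTO product_store VALUES (10, 501, 499, 1)")

# All in-stock products anywhere under /produce/ for store 501:
rows = conn.execute("""
    SELECT p.name, ps.price_cents
    FROM products p
    JOIN categories c ON c.id = p.category_id
    JOIN product_store ps ON ps.product_id = p.id
    WHERE c.path LIKE '/produce/%' AND ps.store_id = 501 AND ps.in_stock = 1
""").fetchall()
print(rows)  # [('Strawberries 1lb', 499)]
```

The prefix match on `path` is what makes "everything under Produce" cheap without recursive queries; the trade-off is that moving a category requires rewriting descendant paths.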
Category pages are accessed millions of times but product details change frequently. Your caching layers determine whether the system can meet latency targets cost-effectively.
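The differing change rates suggest differing TTLs: the category tree can be cached for a long time, while store-specific prices need a short window. A minimal cache-aside sketch with an in-memory stand-in for Redis; class and function names are assumptions:

```python
import time

# In-memory stand-in for Redis with per-key TTLs (names are assumptions).
class TTLCache:
    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

cache = TTLCache()
CATEGORY_TTL = 3600   # category tree changes rarely
PRICE_TTL = 60        # prices tolerate roughly a minute of staleness

def get_price(store_id, product_id, load_from_db):
    key = f"price:{store_id}:{product_id}"
    price = cache.get(key)
    if price is None:                      # miss: hit the database, then populate
        price = load_from_db(store_id, product_id)
        cache.set(key, price, PRICE_TTL)
    return price

calls = []
def fake_db(store_id, product_id):
    calls.append((store_id, product_id))
    return 499

p1 = get_price(501, 10, fake_db)   # miss -> database
p2 = get_price(501, 10, fake_db)   # hit -> no database call
print(p1, p2, len(calls))          # 499 499 1
```

Short price TTLs bound staleness even if event-driven invalidation (covered later) misses an update, which is a common belt-and-suspenders choice.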
Retailers push updates through various mechanisms at different frequencies. Stale data causes abandoned carts and eroded customer trust.
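The usual answer to heterogeneous retailer feeds is per-retailer adapters that map every raw payload into one canonical event before it reaches Kafka. A hypothetical sketch; the retailer IDs, field names, and event shape are all assumptions:

```python
from dataclasses import dataclass

# Canonical event every adapter emits (shape is an assumption).
@dataclass(frozen=True)
class PriceEvent:
    retailer_id: str
    store_id: str
    sku: str
    price_cents: int
    in_stock: bool

def normalize(retailer_id, raw):
    # Per-retailer adapters; raw field names are hypothetical.
    if retailer_id == "retailer_a":   # sends dollars and a boolean flag
        return PriceEvent(retailer_id, raw["store"], raw["sku"],
                          round(raw["price_usd"] * 100), raw["available"])
    if retailer_id == "retailer_b":   # sends cents and an on-hand quantity
        return PriceEvent(retailer_id, raw["location_id"], raw["item_code"],
                          raw["cents"], raw["qty"] > 0)
    raise ValueError(f"no adapter for {retailer_id}")

e = normalize("retailer_a",
              {"store": "501", "sku": "STRAW-1LB",
               "price_usd": 4.99, "available": True})
print(e.price_cents, e.in_stock)  # 499 True
```

Keeping the canonical event small and immutable makes downstream consumers (search indexer, cache invalidator) independent of any single retailer's feed format.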
Users expect instant results when combining multiple filters over categories containing tens of thousands of products. Poor indexing makes this impossible.
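Combined filters map naturally onto a search engine's boolean filter clause, with facet counts computed in the same round trip. A sketch of the request body a category page might build, expressed as a plain dict in the Elasticsearch query-DSL style; every field name here is an assumption:

```python
import json

# Builds a hypothetical Elasticsearch-style request: all filters go in a
# bool/filter clause (cacheable, no scoring), and aggregations return
# facet counts alongside the hits.
def category_page_query(category_path, store_id, dietary_tags, max_price_cents):
    filters = [
        {"term": {"category_path": category_path}},
        {"term": {"store_id": store_id}},
        {"term": {"in_stock": True}},
        {"range": {"price_cents": {"lte": max_price_cents}}},
    ]
    filters += [{"term": {"dietary_tags": t}} for t in dietary_tags]
    return {
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "brands": {"terms": {"field": "brand"}},
            "price_tiers": {"terms": {"field": "price_tier"}},
        },
        "sort": [{"popularity": "desc"}, {"product_id": "asc"}],
        "size": 24,
    }

q = category_page_query("/produce/fruit/berries/", "501", ["organic"], 1000)
print(json.dumps(q, indent=2))
```

Putting every condition in `filter` rather than `must` lets the engine cache the individual clauses, which matters when the same store and category filters repeat across millions of requests.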
Ask about the number of retailers, products per store, and total store count. Confirm whether all stores from a retailer share the same catalog or each location has a unique assortment. Determine acceptable staleness for different data types -- can prices lag by minutes, or must they be real-time? Clarify whether personalization (recommendations, recently viewed) is in scope. Understand peak load patterns and whether traffic is concentrated in specific regions or timezones.
Sketch the major components: an Ingestion Layer that normalizes retailer feeds into a canonical product model via Kafka, a Product Service backed by PostgreSQL for master product data, an Inventory Service that tracks store-specific pricing and availability, a Search Service powered by Elasticsearch for category browsing and filtering, a multi-tier Cache Layer (Redis for application-level caching, a CDN for static assets), and API servers that orchestrate client requests. Show two data flows: the read path (a user browses a category, the Search Service returns product IDs, and the Inventory Service hydrates them with store-specific price and availability from cache), and the write path (a retailer sends an update, the Ingestion Layer processes it and writes to the database, and events propagate to Elasticsearch and Redis for cache invalidation).
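The read path above can be sketched as a small orchestration function: search returns ranked product IDs, then a hydration step attaches fresh store-specific data before responding. All services are faked in-memory here and the names are assumptions:

```python
# Read-path sketch: the Search Service returns product IDs in ranked order,
# and hydration attaches store-specific price/availability from the cache.
def browse_category(category_path, store_id, search, price_cache):
    product_ids = search(category_path, store_id)       # Search Service
    page = []
    for pid in product_ids:
        price, in_stock = price_cache[(store_id, pid)]  # Inventory via cache
        page.append({"product_id": pid,
                     "price_cents": price,
                     "in_stock": in_stock})
    return page

# In-memory stand-ins for the real services:
fake_search = lambda path, store: [10, 11]
fake_cache = {("501", 10): (499, True), ("501", 11): (299, False)}

page = browse_category("/produce/", "501", fake_search, fake_cache)
print(page)
```

Separating "which products, in what order" (search) from "what do they cost right now" (hydration) is what lets the ranked list be cached aggressively while prices stay fresh.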
Walk through the search index design. Each Elasticsearch document represents a product-store combination containing: product ID, name, brand, category path array (enabling ancestor queries), store-specific price and availability as nested fields, and pre-computed filter attributes (dietary tags, price tier bucket). Explain the partitioning strategy: shard by store region to keep hot data co-located. Show how a category page query translates to an Elasticsearch filter query with faceted aggregations, returning product IDs that are hydrated with fresh price data from the Redis cache. Discuss how cursor-based pagination works: each page response includes a sort key that the next request uses as a starting point, ensuring stable results even as prices and inventory change between page loads.
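The cursor mechanics can be shown in miniature over an in-memory list standing in for the index (Elasticsearch's `search_after` works analogously): each page returns the last row's sort key, and the next request resumes strictly past it, so a row shifting on an earlier page never duplicates or skips results. Field names are assumptions:

```python
# Cursor pagination sketch over rows pre-sorted by (popularity desc,
# product_id asc). The cursor is the last row's sort key; the next page
# takes only rows strictly after it in sort order.
def fetch_page(rows, page_size, after=None):
    if after is not None:
        rows = [r for r in rows
                if (-r["popularity"], r["product_id"]) > (-after[0], after[1])]
    page = rows[:page_size]
    cursor = (page[-1]["popularity"], page[-1]["product_id"]) if page else None
    return page, cursor

index = sorted(
    [{"product_id": i, "popularity": p}
     for i, p in [(10, 9), (11, 9), (12, 7), (13, 5)]],
    key=lambda r: (-r["popularity"], r["product_id"]),
)

page1, cur = fetch_page(index, 2)
page2, _ = fetch_page(index, 2, after=cur)
print([r["product_id"] for r in page1])  # [10, 11]
print([r["product_id"] for r in page2])  # [12, 13]
```

Note the tiebreaker on `product_id`: without a unique trailing sort key, rows with equal popularity could be returned in different orders across requests and the cursor would be ambiguous.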
Cover cache invalidation: when a price update arrives via Kafka, the consumer updates the Elasticsearch document and invalidates the specific product's Redis cache entry; category page caches are invalidated only if the price change affects the current sort order or filter results. Discuss monitoring: track cache hit rates per store, Elasticsearch query latencies by category depth, inventory staleness metrics, and ingestion pipeline lag. Address failure scenarios: if Elasticsearch is degraded, fall back to PostgreSQL queries with reduced filter support and show a staleness indicator to users. Mention consistency around checkout: re-validate price and availability at order submission time against the authoritative database, not the cache. Touch on how the system evolves to support features like personalized ranking and real-time deal notifications.
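The invalidation rule above can be sketched with in-memory stand-ins for the Kafka consumer, Redis, and Elasticsearch (all names and key formats are assumptions): a price event always refreshes the product document and evicts that product's cache entry, but the cached category page is evicted only when the change could alter what the page shows.

```python
# Invalidation sketch: dicts stand in for Elasticsearch documents and Redis.
def handle_price_event(event, es_docs, redis, sort_or_filter_affected):
    doc_id = (event["store_id"], event["product_id"])
    # Always refresh the search document and evict the product's price entry.
    es_docs[doc_id] = {**es_docs.get(doc_id, {}),
                       "price_cents": event["price_cents"]}
    redis.pop(f"price:{event['store_id']}:{event['product_id']}", None)
    # Evict the category-page cache only if ordering or filters could change.
    if sort_or_filter_affected(event):
        redis.pop(f"page:{event['store_id']}:{event['category_path']}", None)

es_docs = {}
redis = {"price:501:10": 450, "page:501:/produce/": ["cached page"]}
event = {"store_id": "501", "product_id": 10, "price_cents": 499,
         "category_path": "/produce/"}

handle_price_event(event, es_docs, redis, lambda e: False)
print("price:501:10" in redis)    # False (evicted)
print("page:501:/produce/" in redis)  # True (sort order unaffected)
```

The `sort_or_filter_affected` predicate is the judgment call worth discussing: a conservative version (always evict the page) is simpler but can thrash hot category caches during bulk price updates.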
"Design the Instacart product catalog page. You can get products by categories. Each category has subcategories."