Design a large-scale product catalogue platform that aggregates grocery and household items from multiple retailers and lets customers browse, filter, and search products within their chosen store. The system must handle millions of SKUs across thousands of store locations, where each store carries its own prices, stock levels, and product assortment that change throughout the day.
The core challenges include maintaining a hierarchical category taxonomy that works across diverse retailers, keeping product data synchronized in near real-time as prices and inventory fluctuate, and delivering sub-second browsing experiences despite the massive dataset. Category pages are accessed far more frequently than products are updated, creating a read-heavy traffic pattern that demands aggressive caching and indexing. At the same time, users must always see accurate information scoped to their specific store to prevent checkout failures and eroded trust.
Based on real interview experiences, these are the areas interviewers probe most deeply:
How you represent the category hierarchy and products with store-specific attributes is fundamental. A poor model leads to expensive queries and difficult maintenance.
Hints to consider:
Category pages see millions of daily hits, but underlying data changes frequently. Your caching layers determine whether the system stays fast and cost-effective.
Hints to consider:
Retailers push updates through varying mechanisms and frequencies. Stale prices lead to abandoned carts and customer dissatisfaction.
Hints to consider:
Users expect instant results when applying multiple filters to categories with tens of thousands of products. Poorly designed indexing makes this impossible.
Hints to consider:
Start by understanding the scope. Ask about the number of retailers, products, and stores in the system. Confirm whether all stores from a retailer share the same catalogue or each has a unique assortment. Determine acceptable staleness for different data types. Clarify whether the system must support personalization or only anonymous browsing. Understand peak load patterns and geographic distribution.
Sketch the major components: an ingestion layer that normalizes retailer feeds into a canonical product model, a search and indexing service (Elasticsearch or similar) that powers category browsing and filtering, a multi-tier caching layer with Redis for application-level caching and a CDN for static content, and API servers that orchestrate requests. Include a separate product information service for master data distinct from store-specific pricing and inventory. Show event streams via Kafka that propagate updates from ingestion to search indices and caches. Illustrate both the read path (user browses a category) and the write path (retailer updates prices).
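To make the split between master data and store-specific data concrete, the normalization step in the ingestion layer might look roughly like the sketch below. The feed field names (`sku`, `title`, `category`, `store`, `price`, `qty`), the `>`-separated category string, and the `retailer:sku` ID scheme are all assumptions made for illustration; real retailer feeds will differ.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Product:
    """Master data shared by every store that carries the item."""
    product_id: str
    name: str
    brand: str
    category_path: tuple[str, ...]


@dataclass(frozen=True)
class StoreOffer:
    """Store-scoped data that changes throughout the day."""
    product_id: str
    store_id: str
    price: Decimal
    in_stock: bool


def normalize(raw: dict, retailer_id: str) -> tuple[Product, StoreOffer]:
    """Map one row of a hypothetical retailer feed onto the canonical model."""
    # Prefix IDs with the retailer so SKUs from different retailers never collide.
    product_id = f"{retailer_id}:{raw['sku']}"
    product = Product(
        product_id=product_id,
        name=raw["title"].strip(),
        brand=raw.get("brand", "").strip(),
        category_path=tuple(part.strip() for part in raw["category"].split(">")),
    )
    offer = StoreOffer(
        product_id=product_id,
        store_id=f"{retailer_id}:{raw['store']}",
        # Decimal avoids binary floating-point rounding on prices.
        price=Decimal(str(raw["price"])),
        in_stock=int(raw.get("qty", 0)) > 0,
    )
    return product, offer
```

Keeping `Product` and `StoreOffer` separate means a price change only rewrites the small, store-scoped record, while the master record stays untouched.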
Walk through the search document structure in detail. Each document contains the product ID, name, brand, a category-path array that enables ancestor queries, store-specific nested objects with price and availability, and pre-computed filter attributes. Explain the partitioning strategy: shard by store location or by geographic region so that hot data stays co-located. Show how category page queries translate into index scans with filters, and distinguish the common case (popular category, frequent filters) from long-tail queries. Address why cursor-based pagination outperforms offset-based pagination for large result sets: an offset query must still collect and discard every preceding result, whereas a cursor resumes from the last sort key seen.
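The category-path array and the cursor contract can be illustrated with a small in-memory sketch. It assumes each document stores every ancestor path (for example `dairy` and `dairy/milk`), so that a single exact match on the array finds all products under a category, and it assumes results are sorted by `(price, product_id)`. The documents and the `browse` function are hypothetical; a real search engine would seek into a sorted index instead of sorting in memory, so this shows the semantics, not the performance.

```python
from typing import Optional

# Hypothetical documents, already scoped to a single store for brevity.
DOCS = [
    {"product_id": "p1", "name": "Whole Milk", "price": 1.99,
     "category_path": ["dairy", "dairy/milk"]},
    {"product_id": "p2", "name": "Oat Milk", "price": 2.49,
     "category_path": ["dairy", "dairy/milk"]},
    {"product_id": "p3", "name": "Cheddar", "price": 3.50,
     "category_path": ["dairy", "dairy/cheese"]},
    {"product_id": "p4", "name": "Sponge", "price": 0.99,
     "category_path": ["household", "household/cleaning"]},
]


def sort_key(doc: dict) -> tuple:
    # product_id breaks ties so the ordering is total and the cursor is unambiguous.
    return (doc["price"], doc["product_id"])


def browse(docs: list, category: str, page_size: int,
           cursor: Optional[tuple] = None):
    """Return one page of products under `category` and the next cursor.

    The cursor is the sort key of the last item on the previous page;
    None as the returned cursor means there are no further pages.
    """
    # Ancestor query: the category matches if it appears anywhere in the path array.
    matches = sorted((d for d in docs if category in d["category_path"]), key=sort_key)
    if cursor is not None:
        # Resume strictly after the last item seen, regardless of how deep we are.
        matches = [d for d in matches if sort_key(d) > cursor]
    page = matches[:page_size]
    next_cursor = sort_key(page[-1]) if len(matches) > page_size else None
    return page, next_cursor
```

A client would call `browse(DOCS, "dairy", 2)` for the first page and pass the returned cursor back to fetch the next one. Because the cursor encodes a position in the sort order instead of a row count, items inserted or removed earlier in the list do not shift later pages.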
Cover cache invalidation: when a price update arrives, identify which cache keys need invalidation and how to prevent thundering herd problems. Discuss monitoring: track cache hit rates, query latencies by category depth, and staleness metrics. Address failure scenarios: what happens if the search cluster degrades (fall back to database queries with reduced filter support) or retailer feeds stop updating (show a last-known-good timestamp). Mention consistency when users add items to their cart while prices are updating, and how to handle version conflicts. Touch on how the system would evolve to support personalized ranking or real-time deal notifications.
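One way to reason about invalidation and the thundering herd together is the in-process sketch below. It assumes category pages are cached under keys of the form `cat:{store_id}:{category}`, that a price update carries the product's list of ancestor categories, and that a per-key lock plus a jittered TTL is an acceptable mitigation. The class, key scheme, and `loader` callback are hypothetical; a production system would typically apply the same idea against a shared cache such as Redis.

```python
import random
import threading
import time
from collections import defaultdict


class CategoryPageCache:
    def __init__(self, loader, ttl_seconds: float = 60.0):
        self._loader = loader            # hypothetical callback: (store_id, category) -> page
        self._ttl = ttl_seconds
        self._entries = {}               # key -> (expires_at, value)
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()   # protects creation of per-key locks

    @staticmethod
    def key(store_id: str, category: str) -> str:
        return f"cat:{store_id}:{category}"

    def _fresh(self, key: str):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry
        return None

    def get(self, store_id: str, category: str):
        key = self.key(store_id, category)
        entry = self._fresh(key)
        if entry is not None:
            return entry[1]
        with self._guard:
            lock = self._locks[key]
        # Only one caller per key rebuilds the page; the rest wait on the lock.
        with lock:
            entry = self._fresh(key)     # re-check: another thread may have refilled it
            if entry is not None:
                return entry[1]
            value = self._loader(store_id, category)
            # Jitter spreads expiries so keys filled together do not expire together.
            expires_at = time.monotonic() + self._ttl * random.uniform(0.9, 1.1)
            self._entries[key] = (expires_at, value)
            return value

    def invalidate_for_price_update(self, store_id: str, category_path: list) -> None:
        """Drop every category page in this store that could list the updated product."""
        for category in category_path:
            self._entries.pop(self.key(store_id, category), None)
```

The invalidation is scoped to one store and to the product's ancestor categories only, so a price change in one location leaves every other store's cached pages intact.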