Design the search and recommendation backend that powers the Uber Eats consumer app. Your system must let 100 M+ monthly users, anywhere in the world, discover in under a minute restaurants and grocery stores that can actually deliver to them.

The input to the system is a search request that always contains a delivery address (lat/lng) and may contain a query string (“pizza”, “milk”, “McDonald’s”), dietary or cuisine filters, price range, etc. The output is a ranked list of stores (restaurants, grocers, convenience, liquor, etc.) that (a) deliver to that address, (b) are open and have capacity right now, (c) satisfy any text/filter constraints, and (d) are ordered by expected user satisfaction, conversion, and business value.

The same backend also powers the “home feed” when no query is typed; in that case the system must recommend stores and items based on the user’s historical orders, real-time context (time of day, day of week, weather), and collaborative signals from similar users.

The design must support:
(1) 50 k QPS of read traffic with P99 end-to-end latency under 300 ms;
(2) 10 M+ active merchants and 1 B+ searchable items (dishes/SKUs) worldwide;
(3) real-time freshness: menu/SKU price and availability, delivery ETA, promo status, and acceptance rate can change every minute, and those changes must be reflected in results;
(4) multi-vertical expansion: restaurants, grocery, retail, alcohol, and prescriptions, each with different catalog sizes and relevance signals;
(5) multi-objective ranking: relevance, personal preference, estimated delivery time, delivery fee, rating, popularity, sponsorship, and legal compliance (e.g., age-restricted items);
(6) A/B and ML agility: data scientists must be able to launch new embedding models, ranking features, and autocompletion strategies weekly without a full re-index.

You do not need to design payments, cart, or driver dispatch, but your service must consume live ETA and capacity signals from those systems.
Outline the high-level architecture, data models, indexing strategy, retrieval and ranking pipeline, sharding scheme, and how you would roll out a new two-tower neural ranking model that uses real-time embeddings for both users and stores.
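
As a starting point for the data-model part of the answer, the input/output contract described above can be sketched as follows. This is only an illustration under stated assumptions, not a prescribed design: the names `SearchRequest`, `StoreCandidate`, `is_eligible`, and `search` are hypothetical, the deliverability, open, and capacity flags are assumed to be precomputed by upstream systems, the text match is a naive substring check standing in for real retrieval, and `score` stands in for whatever the ranking model produces.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SearchRequest:
    # The delivery address is always present.
    lat: float
    lng: float
    # query is None for the "home feed" case (nothing typed).
    query: Optional[str] = None
    cuisine_filters: List[str] = field(default_factory=list)
    max_price_tier: Optional[int] = None


@dataclass
class StoreCandidate:
    store_id: str
    name: str
    cuisines: List[str]
    price_tier: int
    # Assumed to be resolved upstream (geo lookup, hours, live capacity feed).
    delivers_to_address: bool
    is_open: bool
    has_capacity: bool
    # Assumed output of a ranking model; higher is better.
    score: float


def is_eligible(req: SearchRequest, store: StoreCandidate) -> bool:
    """Apply constraints (a)-(c): deliverable, open with capacity, text/filters."""
    if not (store.delivers_to_address and store.is_open and store.has_capacity):
        return False
    if req.query:
        q = req.query.lower()
        # Placeholder text match: substring of the name or exact cuisine tag.
        in_name = q in store.name.lower()
        in_cuisine = q in (c.lower() for c in store.cuisines)
        if not (in_name or in_cuisine):
            return False
    if req.cuisine_filters and not set(req.cuisine_filters) & set(store.cuisines):
        return False
    if req.max_price_tier is not None and store.price_tier > req.max_price_tier:
        return False
    return True


def search(req: SearchRequest, candidates: List[StoreCandidate]) -> List[StoreCandidate]:
    """Filter hard constraints first, then order by score, i.e. constraint (d)."""
    eligible = [s for s in candidates if is_eligible(req, s)]
    return sorted(eligible, key=lambda s: s.score, reverse=True)
```

Separating hard eligibility from scoring mirrors the split in the requirements: constraints (a) through (c) are boolean filters, while (d) is an ordering that can change with each model launch without touching the filters.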