Design a photo search application
System Design · Optional
Problem Statement
You are designing a mobile photo application that lets users search their personal photo libraries using natural language keywords like "mountains," "birthday party," or "dog at the beach." The app stores photos taken or uploaded by users, automatically analyzes each image using machine learning models to generate descriptive tags and labels, and indexes those tags so that keyword searches return relevant results in milliseconds. Users expect accurate, fast results even when their libraries contain tens of thousands of photos.
The ML tagging pipeline runs asynchronously — when a user uploads a photo, it is stored immediately, but tag generation happens in the background via a computer vision model that may take seconds to complete. The system must handle this eventual consistency gracefully, ensuring that recently uploaded photos appear in search results once processing finishes without requiring the user to refresh. Photos are private by default, and search results must respect access control lists so that users only see their own images unless specific albums are shared.
At scale, the platform serves millions of users, each with thousands of photos, generating billions of stored images and an enormous volume of tags. The search infrastructure must support both exact keyword matches and semantic similarity queries, ranking results by relevance factors like tag confidence scores, recency, and user engagement patterns.
Key Requirements
Functional
- Photo upload and storage -- Users upload photos from their mobile device; the system stores the original image durably and generates thumbnails for fast browsing
- Automated ML tagging -- An asynchronous computer vision pipeline analyzes each uploaded photo and produces descriptive tags with confidence scores
- Keyword search with ranking -- Users search their photo library by keywords; the system returns results ranked by tag relevance, confidence, and recency
- Access-controlled results -- Search results enforce per-user and per-album permissions so users only see photos they own or that have been explicitly shared with them
Non-Functional
- Scalability -- Support millions of users with thousands of photos each, totaling billions of images and tens of billions of tag entries
- Latency -- Search queries return results within 200 milliseconds; photo uploads complete within 2 seconds, and tagging finishes asynchronously within 30 seconds of upload
- Availability -- The upload and search paths maintain 99.9% availability independently, so a tagging pipeline backlog does not block uploads or searches
- Consistency -- Tags become searchable within seconds of pipeline completion; users accept that very recently uploaded photos may not yet appear in search results
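A rough capacity estimate helps anchor these targets before you start drawing components. The sketch below is only illustrative: the user count, photos per user, tags per photo, average image size, and daily upload rate are all assumed values picked to land inside the "millions of users, billions of images" range from the problem statement, not figures given by the interviewer.

```python
# Back-of-envelope sizing. Every input below is an illustrative assumption.
USERS = 2_500_000                  # assumed; "millions of users"
PHOTOS_PER_USER = 2_000            # assumed; "thousands of photos each"
TAGS_PER_PHOTO = 8                 # assumed average number of tags kept per photo
AVG_ORIGINAL_BYTES = 3 * 1024**2   # assumed 3 MiB per original image
UPLOADS_PER_USER_PER_DAY = 2       # assumed average upload rate


def estimate_capacity(users, photos_per_user, tags_per_photo,
                      avg_original_bytes, uploads_per_user_per_day):
    """Return rough totals for photos, tag entries, storage, and upload rate."""
    photos = users * photos_per_user
    return {
        "photos": photos,
        "tag_entries": photos * tags_per_photo,
        "original_storage_pib": photos * avg_original_bytes / 1024**5,
        "avg_uploads_per_second": users * uploads_per_user_per_day / 86_400,
    }


sizing = estimate_capacity(USERS, PHOTOS_PER_USER, TAGS_PER_PHOTO,
                           AVG_ORIGINAL_BYTES, UPLOADS_PER_USER_PER_DAY)
for name, value in sizing.items():
    print(f"{name}: {value:,.2f}")
```

Under these assumptions the totals come out to 5 billion photos and 40 billion tag entries. Peak upload traffic is typically several times the average, so the tagging workers should be sized against the peak rather than the mean.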
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. ML Tagging Pipeline Design
The asynchronous pipeline that processes uploaded photos through computer vision models is the backbone of the search experience. Interviewers want to see how you design it for throughput, reliability, and graceful degradation.
Hints to consider:
- Think about how you decouple the upload path from the tagging path — the user should get a success response before tagging begins
- Consider how you handle tagging failures: retry policies, dead-letter queues, and whether you surface untagged photos differently in the UI
- Plan for model versioning: when you deploy an improved vision model, decide whether to re-tag the entire photo corpus or only new uploads
- Decide how you scale the GPU-intensive inference workers independently from the rest of the system
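The first two hints can be sketched with an in-memory queue standing in for a real message broker. This is a minimal sketch, not a production design: `upload`, `run_worker`, `TagJob`, and the `MAX_ATTEMPTS` retry budget are hypothetical names and values, and `infer` is a placeholder for whatever vision model you would call.

```python
import queue
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # assumed retry budget before a job is dead-lettered


@dataclass
class TagJob:
    photo_id: str
    attempts: int = 0


tag_queue = queue.Queue()  # stands in for a durable message queue
dead_letters = []          # jobs that exhausted their retries
photo_store = {}           # photo_id -> raw bytes (stands in for blob storage)
tags_by_photo = {}         # photo_id -> list of (tag, confidence)


def upload(photo_id, data):
    """Store the photo, enqueue a tagging job, and return without waiting."""
    photo_store[photo_id] = data
    tag_queue.put(TagJob(photo_id))
    return "stored"


def run_worker(infer):
    """Drain the queue; retry failed jobs, then move them to the dead-letter list."""
    while not tag_queue.empty():
        job = tag_queue.get()
        job.attempts += 1
        try:
            tags_by_photo[job.photo_id] = infer(photo_store[job.photo_id])
        except Exception:
            if job.attempts < MAX_ATTEMPTS:
                tag_queue.put(job)        # retry later
            else:
                dead_letters.append(job)  # surface for inspection or UI flagging
```

Because `upload` returns as soon as the job is enqueued, a tagging backlog only delays searchability and never blocks the upload path. Photos that end up in `dead_letters` are the ones you might surface as untagged in the UI.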
2. Search Indexing Strategy
With billions of photos and tens of billions of tags, the search index must support fast per-user keyword lookups. Interviewers probe your choice of index structure, sharding strategy, and how you keep the index consistent with the tagging pipeline.
Hints to consider:
- Consider using Elasticsearch with an inverted index where documents represent photos and fields include tags, confidence scores, and owner ID
- Think about how you shard the index — by user ID for query locality or by photo ID for write distribution — and the trade-offs of each
- Plan how newly generated tags flow into the search index: direct write from the tagging worker versus an intermediate event stream
- Address how you handle tag updates when a photo is re-tagged with an improved model — in-place update versus append-and-merge
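To make the sharding and re-tagging hints concrete, here is a toy inverted index sharded by user ID, with in-place updates when a photo is re-tagged. It is a sketch under stated assumptions, not how Elasticsearch works internally: the class `UserShardedIndex`, the CRC32-based shard choice, and ranking by confidence with recency as a tie-breaker are all illustrative choices.

```python
import zlib
from collections import defaultdict


class UserShardedIndex:
    """Toy inverted index: (user_id, tag) -> {photo_id: (confidence, uploaded_at)}."""

    def __init__(self, num_shards=4):
        self.shards = [defaultdict(dict) for _ in range(num_shards)]
        self.photo_tags = {}  # photo_id -> set of tags, needed for in-place re-tagging

    def _shard(self, user_id):
        # Stable hash so one user's postings always land on the same shard.
        return self.shards[zlib.crc32(user_id.encode("utf-8")) % len(self.shards)]

    def index(self, user_id, photo_id, uploaded_at, tags):
        """Write tags for a photo, replacing any postings from an earlier model run."""
        shard = self._shard(user_id)
        for old_tag in self.photo_tags.get(photo_id, ()):
            shard[(user_id, old_tag)].pop(photo_id, None)
        for tag, confidence in tags.items():
            shard[(user_id, tag)][photo_id] = (confidence, uploaded_at)
        self.photo_tags[photo_id] = set(tags)

    def search(self, user_id, keyword, limit=10):
        """Single-shard lookup, ranked by confidence, then by recency."""
        postings = self._shard(user_id).get((user_id, keyword), {})
        ranked = sorted(postings.items(), key=lambda kv: kv[1], reverse=True)
        return [photo_id for photo_id, _ in ranked[:limit]]
```

Sharding by user ID means a search touches exactly one shard, at the cost of uneven shard sizes when a few users own very large libraries. Sharding by photo ID spreads writes evenly but forces every query to fan out to all shards.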
3. ACL-Aware Search
Search results must respect ownership and sharing permissions without adding prohibitive latency. Interviewers look for a filtering strategy that scales with the number of users and shared albums.
Hints to consider:
- Consider embedding the owner ID and shared-with user IDs directly in the search index document so permission filtering happens at query time
- Think about the trade-off between pre-filtering (restrict the search space before ranking) and post-filtering (rank first, then remove unauthorized results)
- Plan for shared albums where a user gains access to thousands of photos from another user, and consider how this affects the index structure
- Address permission revocation: when a user is removed from a shared album, how quickly do those photos disappear from their search results
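The difference between pre-filtering and post-filtering is easiest to see on a tiny example. The following sketch assumes each index document carries `owner_id` and a `shared_with` set, as suggested in the first hint; `PhotoDoc`, `can_view`, and both search functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PhotoDoc:
    photo_id: str
    owner_id: str
    tags: dict                                     # tag -> confidence
    shared_with: set = field(default_factory=set)  # user IDs granted access


def can_view(doc, user_id):
    return user_id == doc.owner_id or user_id in doc.shared_with


def search_prefilter(docs, user_id, keyword, limit):
    """Restrict to visible photos first, then rank and truncate."""
    visible = [d for d in docs if can_view(d, user_id) and keyword in d.tags]
    ranked = sorted(visible, key=lambda d: d.tags[keyword], reverse=True)
    return [d.photo_id for d in ranked[:limit]]


def search_postfilter(docs, user_id, keyword, limit):
    """Rank and truncate first, then drop results the user may not see."""
    matches = [d for d in docs if keyword in d.tags]
    top = sorted(matches, key=lambda d: d.tags[keyword], reverse=True)[:limit]
    return [d.photo_id for d in top if can_view(d, user_id)]
```

Post-filtering can return fewer results than requested, because unauthorized photos consume slots in the top `limit` before they are removed. Pre-filtering always fills the page when enough visible matches exist, and with permissions stored on the document, revoking access is a single update to `shared_with`.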
4. Storage and Delivery for Large Photo Libraries
Storing billions of high-resolution images and serving them to mobile clients requires careful blob storage and CDN design. Interviewers expect you to optimize for both storage cost and retrieval speed.
Hints to consider:
- Think about generating and storing multiple resolutions (thumbnail, preview, full) at upload time to avoid on-the-fly resizing during search result rendering
- Consider using content-addressable storage with deduplication for users who upload the same photo multiple times
- Plan your CDN strategy for serving thumbnails in search results versus full-resolution images in the detail view
- Address lifecycle policies for storage cost optimization — move infrequently accessed originals to cheaper cold storage tiers
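Content-addressable storage with deduplication can be sketched in a few lines. This is a minimal in-memory sketch, assuming SHA-256 as the content hash and a made-up key layout for derived renditions; `ContentAddressedStore` and `variant_key` are hypothetical names, and a real system would write to blob storage rather than a dictionary.

```python
import hashlib


class ContentAddressedStore:
    """Stores each distinct byte string once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}       # content hash -> bytes
        self.ref_counts = {}  # content hash -> number of uploads pointing at it

    def put(self, data):
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blobs:
            self.blobs[key] = data  # first upload of this content
        self.ref_counts[key] = self.ref_counts.get(key, 0) + 1
        return key

    def get(self, key):
        return self.blobs[key]


def variant_key(content_hash, variant):
    """Object key for a rendition such as 'thumbnail', 'preview', or 'full' (assumed layout)."""
    return f"{variant}/{content_hash[:2]}/{content_hash}"
```

Uploading the same bytes twice returns the same key and stores one blob, while the reference count tells you when it is safe to delete. Because the key is derived from the content, renditions are immutable and can be cached aggressively at the CDN.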