Design an Image Uploader
Problem Statement
Design a system that collects Street View imagery from cameras mounted on a fleet of taxis operating across a large metropolitan area. Each vehicle continuously captures geotagged photographs and uploads them over cellular networks to a central cloud platform for processing, stitching, and eventual serving on a mapping product.
The key challenge is reliably ingesting large binary blobs (each image is several megabytes) over unreliable mobile connections. Vehicles move through tunnels, dead zones, and congested cell towers, so uploads frequently stall or disconnect. The system must support resumable uploads so that partially transferred images are not lost and bandwidth is not wasted re-transmitting already-received bytes.
Beyond ingestion, the platform must deduplicate images (the same intersection may be photographed by dozens of taxis daily), route each image through a multi-step processing pipeline (metadata extraction, quality scoring, blurring of faces and license plates), and store the final assets for downstream consumption by the Street View rendering service.
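The multi-step pipeline described above can be modeled as an ordered list of stage functions that each annotate an image record as it passes through. This is only a shape sketch; the stage names, the record fields, and the `stages_done` tracking field are hypothetical placeholders, not a prescribed schema.

```python
# Hypothetical stage functions; each takes and returns an image-record dict.
def extract_metadata(record):
    # In a real system: parse EXIF/GPS data from the image bytes.
    record["metadata_extracted"] = True
    return record

def score_quality(record):
    # In a real system: run a blur/exposure model; placeholder score here.
    record["quality"] = 0.8
    return record

def blur_privacy(record):
    # In a real system: detect and blur faces and license plates.
    record["privacy_blurred"] = True
    return record

PIPELINE = [extract_metadata, score_quality, blur_privacy]

def process(record):
    """Route one accepted image through every pipeline stage in order,
    recording which stages have completed (useful for status tracking)."""
    for stage in PIPELINE:
        record = stage(record)
        record.setdefault("stages_done", []).append(stage.__name__)
    return record
```

In practice each stage would be a separate service fed by a queue, with the `stages_done` list living in a metadata store so the operations dashboard can show per-image progress.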
Key Requirements
Functional
- Resumable upload -- Allow vehicles to pause and resume an upload from where it left off after a network interruption
- Deduplication -- Detect and discard duplicate or near-duplicate images of the same location captured within a short time span
- Processing pipeline -- Route each accepted image through metadata extraction, quality filtering, and privacy blurring stages
- Status tracking -- Provide an operations dashboard showing per-vehicle upload progress and pipeline stage for each image
Non-Functional
- Scalability -- Support tens of thousands of vehicles uploading concurrently, each producing hundreds of images per hour
- Latency -- An image should reach the processing pipeline within minutes of capture, not hours
- Durability -- No image should be lost once the server has acknowledged receipt, even during infrastructure failures
- Bandwidth efficiency -- Minimize redundant data transfer over metered cellular connections
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Resumable Upload Protocol
Interviewers want to understand how you handle partial uploads gracefully without re-sending the entire file.
Hints to consider:
- How does the client know which byte offset to resume from after reconnecting?
- Where do you store partially uploaded chunks — directly in S3 with multipart upload, or in a temporary staging area?
- How do you handle the case where the server has received bytes the client thinks were not acknowledged?
- What metadata do you track per upload session (upload ID, expected size, checksum)?
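The questions above can be sketched as the server side of a minimal resumable-upload protocol. The `UploadSession` class and its methods are hypothetical, and the buffer is held in memory for brevity; a real deployment would stage chunks in object storage (e.g. S3 multipart upload). Note how `append_chunk` is idempotent over overlapping resends, which answers the "server received bytes the client thinks were unacknowledged" case.

```python
import hashlib
import uuid

class UploadSession:
    """Server-side state for one resumable upload (illustrative sketch)."""

    def __init__(self, expected_size, expected_sha256):
        self.upload_id = str(uuid.uuid4())   # returned to the client on session creation
        self.expected_size = expected_size
        self.expected_sha256 = expected_sha256
        self.buffer = bytearray()

    @property
    def offset(self):
        """Committed byte offset; the client queries this to resume."""
        return len(self.buffer)

    def append_chunk(self, offset, chunk):
        """Accept a chunk starting at `offset`. Overlapping resends are
        tolerated: only the bytes past the committed offset are kept."""
        if offset > self.offset:
            raise ValueError("gap in upload: resume from %d" % self.offset)
        self.buffer.extend(chunk[self.offset - offset:])
        return self.offset  # new committed offset, acknowledged to the client

    def finish(self):
        """Verify size and checksum before acknowledging durability."""
        if self.offset != self.expected_size:
            raise ValueError("incomplete upload")
        if hashlib.sha256(bytes(self.buffer)).hexdigest() != self.expected_sha256:
            raise ValueError("checksum mismatch")
        return bytes(self.buffer)
```

A resuming client first asks the server for the session's committed `offset`, then uploads from there, so already-received bytes are never re-sent in full.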
2. Edge-to-Cloud Data Flow
The path from a camera on a moving vehicle to your cloud storage involves several unreliable hops.
Hints to consider:
- Should the on-vehicle agent buffer images locally and batch uploads, or stream them individually?
- How do you prioritize which images to upload first when bandwidth is limited?
- What role does an API Gateway play in authenticating, rate-limiting, and routing upload traffic?
- How do you handle clock skew between the vehicle's GPS timestamp and server time?
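One way to answer the prioritization hint: the on-vehicle agent buffers captured images in a priority queue and drains it highest-priority first whenever connectivity allows. The scoring policy below (quality score plus coverage staleness) is an illustrative assumption, not a prescribed formula.

```python
import heapq
import itertools

class UploadQueue:
    """On-vehicle upload queue: yields the most valuable image first
    when bandwidth is limited (illustrative sketch)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # FIFO tie-break for equal priorities

    def add(self, image_path, quality_score, staleness_days):
        # Assumption: prefer high-quality shots of locations whose last
        # coverage is stale. Negated because heapq is a min-heap.
        priority = -(quality_score + 0.1 * staleness_days)
        heapq.heappush(self._heap, (priority, next(self._counter), image_path))

    def next_to_upload(self):
        """Pop the highest-priority image, or None when the buffer is drained."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

Buffering locally also gives the agent a natural place to drop low-value images outright when the disk fills during a long dead zone.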
3. Deduplication Strategy
Multiple taxis photograph the same locations repeatedly, and storing every copy is wasteful.
Hints to consider:
- Can you use perceptual hashing to detect near-duplicate images rather than exact byte matching?
- At what stage in the pipeline do you deduplicate — before full upload, after, or during processing?
- How do you index images by geolocation and time to efficiently find candidate duplicates?
- What is the tradeoff between aggressive deduplication (risk losing unique angles) and permissive deduplication (wasted storage)?
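To make the perceptual-hashing hint concrete, here is a difference-hash (dHash) sketch: compare each pixel to its right-hand neighbor on a downscaled grayscale grid and pack the comparisons into a 64-bit fingerprint, then treat images whose fingerprints differ in few bits as near-duplicates. The function assumes downscaling to a 9x8 grid happens upstream, and the 10-bit threshold is an illustrative assumption to be tuned against real data.

```python
def dhash(pixels):
    """Difference hash over a 9x8 grayscale grid (8 rows of 9 pixels,
    produced by an assumed upstream resize). Returns a 64-bit int."""
    bits = 0
    for row in pixels:                          # 8 rows
        for left, right in zip(row, row[1:]):   # 8 neighbor comparisons per row
            bits = (bits << 1) | (1 if left < right else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

def is_near_duplicate(hash_a, hash_b, threshold=10):
    # Assumption: <=10 differing bits out of 64 counts as near-duplicate.
    return hamming(hash_a, hash_b) <= threshold
```

Because dHash captures gradient structure rather than raw bytes, it survives re-encoding and small exposure changes; indexing fingerprints alongside a geohash-plus-time key keeps the candidate set for each comparison small.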