Practice/OpenAI/URL Shortener
URL Shortener
System DesignOptional
Problem Statement
Design a URL shortening service like Bit.ly or TinyURL that converts long URLs into shorter aliases and redirects users to the original URL when the short link is visited.
This is a classic system design question -- it is accessible enough for junior candidates yet offers significant depth for senior discussions around ID generation, hashing strategies, caching, database design, and analytics at scale.
Key Requirements
Functional
- Shorten URL -- given a long URL, generate a unique short URL (e.g.,
https://short.ly/abc123)
- Redirect -- when a user visits the short URL, redirect (301/302) to the original long URL
- Custom aliases -- optionally allow users to choose a custom short code
- Expiration -- short URLs can have an optional TTL after which they expire
- Analytics -- track click counts, referrer, geographic location, and timestamp per short URL
Non-Functional
- High read throughput -- redirects are the dominant operation (100:1 read-to-write ratio)
- Low latency -- redirects must complete in under 10ms (p99)
- High availability -- the redirect path must be highly available (99.99%)
- Scalability -- handle billions of short URLs and millions of redirects per day
- Durability -- short URL mappings must not be lost
What Interviewers Focus On
This question appears simple on the surface, but interviewers use it to probe depth on several core system design topics.
1. Short Code Generation (Core Design Decision)
How do you generate a unique, compact short code for each URL? This is where interviewers spend the most time.
Hints to consider:
- Base62 encoding of auto-increment ID: use a counter (database sequence or distributed ID generator), encode the numeric ID in base62 (a-z, A-Z, 0-9) -- 7 characters gives 62^7 = ~3.5 trillion unique codes
- Hash-based: MD5/SHA256 of the long URL, take the first 7 characters -- risk of collisions, need collision handling
- Pre-generated key store: generate random keys in advance, assign from a pool -- avoids runtime collision checks
- Trade-offs: counter-based is predictable (sequential codes leak information) vs. hash-based risks collisions vs. pre-generated adds a key management component
- Custom aliases: check for uniqueness against existing codes, reserve the namespace
2. Database Design and Read Optimization
The URL mapping table is the core data model, and reads dominate writes heavily.
Hints to consider:
- Schema:
short_code (PK), long_url, user_id, created_at, expires_at, click_count
- Read-optimized: short_code is the primary key, direct lookup by key -- O(1) in any key-value store
- Database choice: relational (PostgreSQL) for ACID guarantees, or NoSQL (DynamoDB, Cassandra) for horizontal scale
- Sharding: shard by short_code hash for even distribution; range-based sharding on short_code if using base62 counter
- Write path: insert new mapping; read path: lookup by short_code -- these have very different access patterns
3. Caching Strategy
With a 100:1 read-to-write ratio and many popular links getting millions of clicks, caching is critical.
Hints to consider:
- Cache the short_code -> long_url mapping in Redis/Memcached
- Cache hit rate will be very high -- popular URLs follow a power-law distribution (80/20 rule)
- Cache eviction: LRU works well since popular links are accessed repeatedly
- Cache-aside pattern: check cache first, on miss read from DB and populate cache
- Cache invalidation: on URL deletion or expiration, evict from cache
- Capacity estimation: 1 billion URLs x ~200 bytes each = ~200GB -- fits in a distributed cache cluster
4. Redirect: 301 vs. 302
A small but important detail that interviewers look for.
Hints to consider:
- 301 (Permanent Redirect): browser caches the redirect -- fewer requests hit your server, but you lose analytics visibility
- 302 (Temporary Redirect): browser does not cache -- every click hits your server, enabling accurate analytics
- Most URL shorteners use 302 to maintain analytics accuracy
- Trade-off: 301 reduces server load but sacrifices click tracking; 302 enables analytics but increases traffic