Design a URL Shortener
System Design · Must
Problem Statement
Design a URL shortening service similar to TinyURL or Bitly that converts long URLs into compact, shareable short links and redirects visitors back to the original destination. The system must handle billions of redirects per day while maintaining sub-10ms redirect latency and supporting link management features like analytics, expiration, and custom aliases.
The problem appears straightforward but exercises several core distributed systems skills simultaneously. You need globally unique short code generation without coordination bottlenecks, an extremely read-heavy serving path (redirect traffic can be 100x to 1000x higher than creation traffic), multi-region deployment for low-latency redirects worldwide, and a click analytics pipeline that captures every redirect event without adding latency to the user-facing path. The system must also handle abuse scenarios like spam links, phishing URLs, and link enumeration attacks.
Consider that a small number of viral links can receive millions of clicks within minutes, creating extreme hot-key scenarios in your cache and storage layers. Your design must gracefully handle this skewed access pattern without degrading the experience for other links.
Key Requirements
Functional
- Short link creation -- Convert a long URL into a unique short code, with optional support for custom vanity aliases chosen by the user
- Redirect resolution -- Resolve a short code to the original URL and issue an HTTP 301 or 302 redirect with minimal latency
- Link management -- Authenticated users can view their links, update destinations, set expiration dates, and disable or delete links
- Click analytics -- Track click counts, referrer sources, geographic distribution, and time-series data for each link
- Abuse prevention -- Detect and block malicious URLs at creation time and flag suspicious links that receive anomalous traffic patterns
Non-Functional
- Scalability -- Support 100 million link creations per month and 10 billion redirects per month with horizontal scaling
- Latency -- Redirect resolution in under 10ms at p99 for cached links and under 50ms for cache misses
- Availability -- 99.99% uptime for the redirect path; degraded analytics or creation is acceptable during partial outages
- Durability -- Zero data loss for link mappings; analytics data can tolerate minor loss during infrastructure failures
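Before diving into design, it helps to translate these monthly targets into steady-state and peak request rates. The sketch below is a back-of-envelope estimate assuming a 30-day month and a hypothetical 10x peak-to-average factor (the peak multiplier is an assumption, not part of the stated requirements):

```python
# Back-of-envelope capacity estimation from the stated requirements.
SECONDS_PER_MONTH = 30 * 24 * 3600  # ~2.59M seconds, assuming a 30-day month

creations_per_month = 100_000_000        # 100M link creations/month (requirement)
redirects_per_month = 10_000_000_000     # 10B redirects/month (requirement)

write_qps = creations_per_month / SECONDS_PER_MONTH        # ~39 writes/s average
read_qps = redirects_per_month / SECONDS_PER_MONTH         # ~3,858 reads/s average
peak_read_qps = read_qps * 10                              # assumed 10x peak factor
read_write_ratio = redirects_per_month / creations_per_month  # 100:1

print(f"avg writes/s ~ {write_qps:.0f}, avg reads/s ~ {read_qps:.0f}, "
      f"peak reads/s ~ {peak_read_qps:.0f}, read:write = {read_write_ratio:.0f}:1")
```

The 100:1 read-to-write ratio is the number that should drive every subsequent decision: the serving path is overwhelmingly read-dominated.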
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Short Code Generation Strategy
Generating globally unique short codes without a single coordinating counter is the first design decision interviewers expect you to justify. The choice affects write scalability, code predictability (security), and collision probability.
Hints to consider:
- Compare base62-encoded auto-increment IDs (simple but predictable and requires coordination) versus random generation with collision checking
- Consider pre-generating batches of unique codes and assigning ranges to application servers to avoid real-time coordination
- Evaluate hashing the long URL (e.g., MD5 or SHA-256 truncated to 7 characters) with collision resolution by appending a counter to the input and rehashing
- Discuss code length trade-offs: 7 characters in base62 yields 3.5 trillion combinations, far exceeding realistic usage
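To make the base62 trade-off concrete, here is a minimal sketch of the two main generation strategies from the hints above: encoding an auto-increment ID (compact but predictable) versus cryptographically random codes (unpredictable but requiring a collision check against storage). The alphabet ordering is a common convention, not a standard:

```python
import secrets

# 0-9, A-Z, a-z: 62 symbols; the ordering here is one common convention.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def encode_base62(n: int) -> str:
    """Encode a non-negative integer (e.g. an auto-increment ID) in base62.

    Predictable: sequential IDs yield guessable codes, enabling enumeration.
    """
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

def random_code(length: int = 7) -> str:
    """Generate an unpredictable code; the caller must check storage for
    collisions and retry (rare at 7 chars: 62^7 ~ 3.5 trillion combinations)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Note that `encode_base62` of the largest 7-character value, 62^7 - 1, is `"zzzzzzz"`, confirming the 3.5 trillion figure; the batch-assignment approach from the hints simply hands each application server a disjoint range of integers to feed through `encode_base62`.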
2. Read-Heavy Serving Architecture
Redirect traffic dominates the workload by orders of magnitude. Every millisecond of redirect latency matters because users are waiting for their destination page. Interviewers want to see aggressive caching and edge optimization.
Hints to consider:
- Deploy a multi-tier cache: CDN edge nodes for the most popular links, regional Redis clusters for warm cache, and DynamoDB or similar for the persistent store
- Use HTTP 301 (permanent) redirects for immutable links so browsers and CDNs cache the mapping, reducing backend load
- For mutable links (where the destination can change), use 302 (temporary) redirects and shorter cache TTLs
- Implement request coalescing at the cache layer to prevent thundering herd on cache misses for viral links
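The request-coalescing idea in the last hint can be sketched in a few lines: when many concurrent requests miss the cache for the same short code, only one "leader" calls the backing store while the rest wait for its result. This is an illustrative single-process sketch (the `Coalescer` name and `loader` callback are hypothetical), analogous to Go's singleflight pattern:

```python
import threading

class Coalescer:
    """Collapse concurrent cache-miss lookups for the same short code
    into a single call to the backing store (thundering-herd protection)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # short code -> Event carrying the result

    def get(self, code, loader):
        """Return loader(code), but run loader at most once per code
        across all concurrently in-flight requests."""
        with self._lock:
            ev = self._inflight.get(code)
            if ev is None:
                ev = threading.Event()
                ev.result = None
                self._inflight[code] = ev
                is_leader = True
            else:
                is_leader = False
        if is_leader:
            try:
                ev.result = loader(code)  # single backend/DB lookup
            finally:
                with self._lock:
                    del self._inflight[code]
                ev.set()  # wake every waiter with the shared result
            return ev.result
        ev.wait()
        return ev.result
```

Followers piggyback on the leader's result, so a viral link that misses the regional cache produces one database read instead of thousands.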
3. Analytics Pipeline Design
Capturing click data for every redirect without adding latency to the redirect path requires fully decoupled asynchronous processing. Interviewers probe how you balance completeness, latency, and cost.
Hints to consider:
- Log click events asynchronously after issuing the redirect response -- fire-and-forget to a local buffer or Kafka producer
- Use Kafka as the durable event stream, with consumers aggregating counts into time-bucketed summaries in a columnar store like ClickHouse
- Pre-aggregate common metrics (total clicks, daily counts) in Redis counters for real-time dashboard queries
- Batch-process detailed breakdowns (referrer, geo, device) on a slower cadence to reduce compute costs
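The fire-and-forget pattern from the first hint can be sketched with a bounded in-process buffer standing in for a Kafka producer (the function names and event schema here are illustrative assumptions). The key properties are that the redirect path never blocks and that overload sheds analytics events rather than user latency, which matches the stated durability requirement:

```python
import queue
import time

# Bounded local buffer standing in for an async Kafka producer's send queue.
click_queue = queue.Queue(maxsize=10_000)

def record_click(code: str, referrer: str) -> bool:
    """Called after the redirect response is issued; never blocks.
    Returns False (event dropped) when the buffer is full -- analytics
    tolerates minor loss, the redirect path must not slow down."""
    event = {"code": code, "referrer": referrer, "ts": time.time()}
    try:
        click_queue.put_nowait(event)
        return True
    except queue.Full:
        return False

def drain_batch(max_events: int = 500) -> list:
    """Background consumer: drain buffered events in batches. In production
    each batch would be published to Kafka for downstream aggregation."""
    batch = []
    while len(batch) < max_events:
        try:
            batch.append(click_queue.get_nowait())
        except queue.Empty:
            break
    return batch
```

A background thread would call `drain_batch` in a loop; downstream Kafka consumers then roll events up into the time-bucketed ClickHouse summaries and Redis counters described above.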
4. Handling Hot Keys and Viral Links
A single viral link shared on social media can receive millions of clicks per minute, creating extreme pressure on a single cache key and its backing storage entry. Naive architectures will buckle under this load.
Hints to consider:
- Replicate hot cache entries across multiple Redis nodes and load-balance reads across replicas
- Use CDN edge caching aggressively for links identified as viral based on recent click rate
- Implement local in-process caching (e.g., Caffeine or Guava) on application servers to absorb repeated lookups before hitting Redis
- Rate-limit analytics event ingestion per link to prevent a single viral link from overwhelming the Kafka pipeline
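The in-process cache from the third hint can be sketched as a small LRU map with per-entry TTL; this is roughly what Caffeine or Guava provide on the JVM, shown here as an illustrative Python stand-in (class name and defaults are assumptions). Each application server answers repeated lookups for a viral code from local memory for a few seconds, so only one Redis fetch per server per TTL window reaches the shared cache:

```python
import time
from collections import OrderedDict

class LocalTTLCache:
    """Tiny in-process LRU cache with per-entry TTL, absorbing hot-key
    lookups before they reach Redis. A short TTL bounds staleness for
    mutable links while still flattening viral-link read spikes."""

    def __init__(self, max_size: int = 10_000, ttl: float = 5.0):
        self.max_size = max_size
        self.ttl = ttl
        self._data = OrderedDict()  # short code -> (url, expires_at)

    def get(self, code: str):
        item = self._data.get(code)
        if item is None:
            return None
        url, expires_at = item
        if time.monotonic() > expires_at:
            del self._data[code]       # lazy expiry on read
            return None
        self._data.move_to_end(code)   # mark as recently used
        return url

    def put(self, code: str, url: str) -> None:
        self._data[code] = (url, time.monotonic() + self.ttl)
        self._data.move_to_end(code)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least-recently used
```

With a 5-second TTL, a link receiving a million clicks per minute across 50 servers generates on the order of 600 Redis reads per minute instead of a million, while updates to a mutable link propagate within one TTL window.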