Design a URL Shortener
System Design · Must
Problem Statement
Design a URL shortening service similar to TinyURL or Bitly that converts long URLs into compact, shareable short links and redirects visitors back to the original destination. The system must handle billions of redirects per day while maintaining sub-10ms redirect latency and supporting link management features like analytics, expiration, and custom aliases.
The problem appears straightforward but exercises several core distributed systems skills simultaneously. You need globally unique short code generation without coordination bottlenecks, an extremely read-heavy serving path (redirect traffic can be 100x to 1000x higher than creation traffic), multi-region deployment for low-latency redirects worldwide, and a click analytics pipeline that captures every redirect event without adding latency to the user-facing path. The system must also handle abuse scenarios like spam links, phishing URLs, and link enumeration attacks.
Consider that a small number of viral links can receive millions of clicks within minutes, creating extreme hot-key scenarios in your cache and storage layers. Your design must gracefully handle this skewed access pattern without degrading the experience for other links.
Key Requirements
Functional
- Short link creation -- Convert a long URL into a unique short code, with optional support for custom vanity aliases chosen by the user
- Redirect resolution -- Resolve a short code to the original URL and issue an HTTP 301 or 302 redirect with minimal latency
- Link management -- Authenticated users can view their links, update destinations, set expiration dates, and disable or delete links
- Click analytics -- Track click counts, referrer sources, geographic distribution, and time-series data for each link
- Abuse prevention -- Detect and block malicious URLs at creation time and flag suspicious links that receive anomalous traffic patterns
Non-Functional
- Scalability -- Support 100 million link creations per month and 10 billion redirects per month with horizontal scaling
- Latency -- Redirect resolution in under 10ms at p99 for cached links and under 50ms for cache misses
- Availability -- 99.99% uptime for the redirect path; degraded analytics or creation is acceptable during partial outages
- Durability -- Zero data loss for link mappings; analytics data can tolerate minor loss during infrastructure failures
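Before diving into design, it helps to translate these monthly targets into steady-state and peak request rates. The sketch below is a back-of-envelope estimate assuming a 30-day month and a hypothetical 10x peak-to-average factor (the peak multiplier is an assumption, not part of the stated requirements):

```python
# Back-of-envelope capacity estimation from the stated requirements.
SECONDS_PER_MONTH = 30 * 24 * 3600  # ~2.59M seconds, assuming a 30-day month

creations_per_month = 100_000_000        # 100M link creations/month (requirement)
redirects_per_month = 10_000_000_000     # 10B redirects/month (requirement)

write_qps = creations_per_month / SECONDS_PER_MONTH        # ~39 writes/s average
read_qps = redirects_per_month / SECONDS_PER_MONTH         # ~3,858 reads/s average
peak_read_qps = read_qps * 10                              # assumed 10x peak factor
read_write_ratio = redirects_per_month / creations_per_month  # 100:1

print(f"avg writes/s ~ {write_qps:.0f}, avg reads/s ~ {read_qps:.0f}, "
      f"peak reads/s ~ {peak_read_qps:.0f}, read:write = {read_write_ratio:.0f}:1")
```

The 100:1 read-to-write ratio is the number that should drive every subsequent decision: the serving path is overwhelmingly read-dominated.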
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Short Code Generation Strategy
Generating globally unique short codes without a single coordinating counter is the first design decision interviewers expect you to justify. The choice affects write scalability, code predictability (security), and collision probability.
Hints to consider:
- Compare base62-encoded auto-increment IDs (simple but predictable and requires coordination) versus random generation with collision checking
- Consider pre-generating batches of unique codes and assigning ranges to application servers to avoid real-time coordination
- Evaluate hashing the long URL (e.g., MD5 or SHA-256 truncated to 7 characters) with collision resolution by appending a counter to the input and rehashing
- Discuss code length trade-offs: 7 characters in base62 yields 3.5 trillion combinations, far exceeding realistic usage
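To make the base62 trade-off concrete, here is a minimal sketch of the two main generation strategies from the hints above: encoding an auto-increment ID (compact but predictable) versus cryptographically random codes (unpredictable but requiring a collision check against storage). The alphabet ordering is a common convention, not a standard:

```python
import secrets

# 0-9, A-Z, a-z: 62 symbols; the ordering here is one common convention.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def encode_base62(n: int) -> str:
    """Encode a non-negative integer (e.g. an auto-increment ID) in base62.

    Predictable: sequential IDs yield guessable codes, enabling enumeration.
    """
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

def random_code(length: int = 7) -> str:
    """Generate an unpredictable code; the caller must check storage for
    collisions and retry (rare at 7 chars: 62^7 ~ 3.5 trillion combinations)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Note that `encode_base62` of the largest 7-character value, 62^7 - 1, is `"zzzzzzz"`, confirming the 3.5 trillion figure; the batch-assignment approach from the hints simply hands each application server a disjoint range of integers to feed through `encode_base62`.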
2. Read-Heavy Serving Architecture
Redirect traffic dominates the workload by orders of magnitude. Every millisecond of redirect latency matters because users are waiting for their destination page. Interviewers want to see aggressive caching and edge optimization.
Hints to consider:
- Deploy a multi-tier cache: CDN edge nodes for the most popular links, regional Redis clusters for warm cache, and DynamoDB or similar for the persistent store
- Use HTTP 301 (permanent) redirects for immutable links so browsers and CDNs cache the mapping, reducing backend load
- For mutable links (where the destination can change), use 302 (temporary) redirects and shorter cache TTLs
- Implement request coalescing at the cache layer to prevent thundering herd on cache misses for viral links
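The request-coalescing idea in the last hint can be sketched in a few lines: when many concurrent requests miss the cache for the same short code, only one "leader" calls the backing store while the rest wait for its result. This is an illustrative single-process sketch (the `Coalescer` name and `loader` callback are hypothetical), analogous to Go's singleflight pattern:

```python
import threading

class Coalescer:
    """Collapse concurrent cache-miss lookups for the same short code
    into a single call to the backing store (thundering-herd protection)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # short code -> Event carrying the result

    def get(self, code, loader):
        """Return loader(code), but run loader at most once per code
        across all concurrently in-flight requests."""
        with self._lock:
            ev = self._inflight.get(code)
            if ev is None:
                ev = threading.Event()
                ev.result = None
                self._inflight[code] = ev
                is_leader = True
            else:
                is_leader = False
        if is_leader:
            try:
                ev.result = loader(code)  # single backend/DB lookup
            finally:
                with self._lock:
                    del self._inflight[code]
                ev.set()  # wake every waiter with the shared result
            return ev.result
        ev.wait()
        return ev.result
```

Followers piggyback on the leader's result, so a viral link that misses the regional cache produces one database read instead of thousands.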
3. Analytics Pipeline Design
Capturing click data for every redirect without adding latency to the redirect path requires fully decoupled asynchronous processing. Interviewers probe how you balance completeness, latency, and cost.
Hints to consider:
- Log click events asynchronously after issuing the redirect response -- fire-and-forget to a local buffer or Kafka producer
- Use Kafka as the durable event stream, with consumers aggregating counts into time-bucketed summaries in a columnar store like ClickHouse
- Pre-aggregate common metrics (total clicks, daily counts) in Redis counters for real-time dashboard queries
- Batch-process detailed breakdowns (referrer, geo, device) on a slower cadence to reduce compute costs
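The fire-and-forget pattern from the first hint can be sketched with a bounded in-process buffer standing in for a Kafka producer (the function names and event schema here are illustrative assumptions). The key properties are that the redirect path never blocks and that overload sheds analytics events rather than user latency, which matches the stated durability requirement:

```python
import queue
import time

# Bounded local buffer standing in for an async Kafka producer's send queue.
click_queue = queue.Queue(maxsize=10_000)

def record_click(code: str, referrer: str) -> bool:
    """Called after the redirect response is issued; never blocks.
    Returns False (event dropped) when the buffer is full -- analytics
    tolerates minor loss, the redirect path must not slow down."""
    event = {"code": code, "referrer": referrer, "ts": time.time()}
    try:
        click_queue.put_nowait(event)
        return True
    except queue.Full:
        return False

def drain_batch(max_events: int = 500) -> list:
    """Background consumer: drain buffered events in batches. In production
    each batch would be published to Kafka for downstream aggregation."""
    batch = []
    while len(batch) < max_events:
        try:
            batch.append(click_queue.get_nowait())
        except queue.Empty:
            break
    return batch
```

A background thread would call `drain_batch` in a loop; downstream Kafka consumers then roll events up into the time-bucketed ClickHouse summaries and Redis counters described above.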
4. Handling Hot Keys and Viral Links
A single viral link shared on social media can receive millions of clicks per minute, creating extreme pressure on a single cache key and its backing storage entry. Naive architectures will buckle under this load.
Hints to consider:
- Replicate hot cache entries across multiple Redis nodes and load-balance reads across replicas
- Use CDN edge caching aggressively for links identified as viral based on recent click rate
- Implement local in-process caching (e.g., Caffeine or Guava) on application servers to absorb repeated lookups before hitting Redis
- Rate-limit analytics event ingestion per link to prevent a single viral link from overwhelming the Kafka pipeline
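The in-process cache from the third hint can be sketched as a small LRU map with per-entry TTL; this is roughly what Caffeine or Guava provide on the JVM, shown here as an illustrative Python stand-in (class name and defaults are assumptions). Each application server answers repeated lookups for a viral code from local memory for a few seconds, so only one Redis fetch per server per TTL window reaches the shared cache:

```python
import time
from collections import OrderedDict

class LocalTTLCache:
    """Tiny in-process LRU cache with per-entry TTL, absorbing hot-key
    lookups before they reach Redis. A short TTL bounds staleness for
    mutable links while still flattening viral-link read spikes."""

    def __init__(self, max_size: int = 10_000, ttl: float = 5.0):
        self.max_size = max_size
        self.ttl = ttl
        self._data = OrderedDict()  # short code -> (url, expires_at)

    def get(self, code: str):
        item = self._data.get(code)
        if item is None:
            return None
        url, expires_at = item
        if time.monotonic() > expires_at:
            del self._data[code]       # lazy expiry on read
            return None
        self._data.move_to_end(code)   # mark as recently used
        return url

    def put(self, code: str, url: str) -> None:
        self._data[code] = (url, time.monotonic() + self.ttl)
        self._data.move_to_end(code)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least-recently used
```

With a 5-second TTL, a link receiving a million clicks per minute across 50 servers generates on the order of 600 Redis reads per minute instead of a million, while updates to a mutable link propagate within one TTL window.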