Design a URL shortening service similar to TinyURL or Bitly that converts long URLs into compact, shareable links and redirects visitors to the original destination. The system must generate globally unique short codes without collisions, handle extremely read-heavy traffic (every short link click triggers a redirect), and capture basic analytics on link usage.
The technical challenges go beyond simple key-value storage. Short code generation must be fast and collision-free across distributed nodes. Redirect latency must be minimal since every millisecond adds to user-perceived page load time. Analytics ingestion must not slow down the redirect path. And the system must gracefully handle abuse (spam links, phishing) while scaling to billions of stored URLs and millions of redirects per day.
The service issues short links (e.g., sho.rt/Ab3Cd) with optional custom aliases and expiration dates. Based on real interview experiences, these are the areas interviewers probe most deeply:
Generating short, unique codes at high throughput across distributed servers is the signature challenge. Naive approaches like hashing or global counters each have drawbacks. Interviewers expect you to reason about trade-offs and pick a pragmatic strategy.
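One pragmatic strategy is to hand each server a block of sequential IDs and base-62 encode them, which is collision-free by construction. A minimal sketch of the encoding (the alphabet ordering is an arbitrary choice; any fixed 62-character permutation works):

```python
# Base-62 encode/decode for turning a numeric ID into a short code.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)       # peel off the least-significant base-62 digit
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode_base62(code: str) -> int:
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Seven base-62 characters cover 62^7 ≈ 3.5 trillion codes, comfortably beyond billions of stored URLs.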
Redirect traffic vastly exceeds write traffic, often by 100:1 or more. Serving every redirect from the primary database wastes resources and cannot meet latency targets at scale.
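The standard answer is a cache-aside read path in front of the primary datastore. A sketch, with plain dicts standing in for Redis and DynamoDB (class and method names here are illustrative, not from the original text):

```python
# Cache-aside lookup: check the cache first, fall back to the primary
# store on a miss, then populate the cache for subsequent requests.
from typing import Optional

class RedirectResolver:
    def __init__(self, store: dict, cache: dict):
        self.store = store   # stand-in for the primary datastore (e.g., DynamoDB)
        self.cache = cache   # stand-in for Redis

    def resolve(self, code: str) -> Optional[str]:
        url = self.cache.get(code)
        if url is not None:
            return url                # cache hit: no datastore round trip
        url = self.store.get(code)    # cache miss: read the primary store
        if url is not None:
            self.cache[code] = url    # populate cache for future hits
        return url
```

With a 100:1 read skew and a hot-link access pattern, most redirects never touch the datastore.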
Recording click metadata (timestamp, referrer, country, device) on every redirect must not add latency to the user-facing path. Interviewers look for asynchronous, decoupled analytics ingestion.
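The decoupling can be sketched with a bounded in-process queue standing in for the message broker: the redirect path only enqueues (and drops on overflow rather than block), while a background worker does the slow work.

```python
# Fire-and-forget click logging: the redirect path enqueues an event
# and returns immediately; a background thread drains the queue.
import queue
import threading
import time

events: queue.Queue = queue.Queue(maxsize=10_000)
processed = []

def record_click(code: str, referrer: str) -> None:
    try:
        events.put_nowait({"code": code, "referrer": referrer, "ts": time.time()})
    except queue.Full:
        pass  # drop the event rather than slow the user-facing redirect

def worker() -> None:
    while True:
        event = events.get()
        if event is None:        # sentinel tells the worker to stop
            break
        processed.append(event)  # real system: publish to Kafka here

t = threading.Thread(target=worker, daemon=True)
t.start()
record_click("Ab3Cd", "news.example.com")
events.put(None)
t.join()
```

Dropping events under backpressure is a deliberate trade-off: analytics is best-effort, redirect latency is not.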
URL shorteners are magnets for spam and phishing because they obscure the destination. Interviewers expect proactive measures to prevent misuse without degrading the creation experience.
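A lightweight first line of defense is checking the destination host against a blocklist at creation time, before the slower reputation checks. A sketch (the blocklist entries are placeholders):

```python
# Reject link creation when the destination host matches a blocklist,
# including subdomains of blocked domains.
from urllib.parse import urlparse

BLOCKED_DOMAINS = {"evil.example", "phish.example"}  # placeholder entries

def is_allowed(long_url: str) -> bool:
    host = (urlparse(long_url).hostname or "").lower()
    return not any(
        host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS
    )
```

This runs in microseconds on the creation path; asynchronous scanning (e.g., against a safe-browsing feed) can revoke links after the fact.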
Confirm the expected read-to-write ratio and total link volume. Ask whether custom aliases are required and whether links have a default expiration. Clarify analytics depth: just total clicks, or a full breakdown by time, geography, and referrer? Establish whether multi-region deployment is needed to meet global latency targets. Verify authentication requirements for link management.
Sketch core components: a Link Creation Service that generates short codes and writes to the datastore, a Redirect Service that resolves codes and returns HTTP redirects from cache, a Link Management API for CRUD operations on user links, and an Analytics Pipeline that consumes click events from Kafka, aggregates them, and stores results for dashboard queries. Use DynamoDB or a similar key-value store for the code-to-URL mapping (high availability, predictable latency, conditional writes for uniqueness). Place Redis in front for redirect caching. Use a CDN or edge deployment for the redirect service to minimize latency globally.
Walk through the creation flow. The user submits a long URL. The Link Creation Service claims the next ID from its pre-allocated range (each server is assigned a non-overlapping block of 10,000 IDs from a coordination service). It base-62 encodes the ID to produce the short code, writes the mapping to DynamoDB with a conditional check to prevent overwriting, and populates the Redis cache. For redirects, the Redirect Service receives a request for the short code, looks up Redis first, and on a hit returns an HTTP 302 with the destination URL. On a cache miss, it reads from DynamoDB, populates the cache, and redirects. After returning the response, it asynchronously publishes a click event to Kafka with the short code, timestamp, referrer, user agent, and IP-derived location.
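The creation flow above can be sketched end to end; the coordination-service call and DynamoDB conditional write are stubbed with in-memory stand-ins, and the class and method names are illustrative:

```python
# Creation flow: claim an ID from a pre-allocated block, base-62
# encode it, and write the mapping with a uniqueness check.
import itertools

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

class LinkCreationService:
    BLOCK_SIZE = 10_000
    _next_block = itertools.count()  # stand-in for the coordination service

    def __init__(self, store: dict):
        self.store = store           # stand-in for DynamoDB
        self._refill_range()

    def _refill_range(self) -> None:
        start = next(self._next_block) * self.BLOCK_SIZE
        self.ids = iter(range(start, start + self.BLOCK_SIZE))

    def shorten(self, long_url: str) -> str:
        try:
            link_id = next(self.ids)
        except StopIteration:        # block exhausted: claim a new one
            self._refill_range()
            link_id = next(self.ids)
        code = encode_base62(link_id)
        if code in self.store:       # stand-in for a conditional write
            raise RuntimeError(f"unexpected collision on {code}")
        self.store[code] = long_url
        return code
```

Because ID blocks never overlap, servers generate codes without coordinating on every write; the conditional write is a belt-and-suspenders guard, not the uniqueness mechanism.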
Cover analytics processing: Kafka consumers aggregate click events into per-link counters in a time-series store, producing hourly and daily rollups for dashboard queries. Discuss cache invalidation: when a user updates or deletes a link, the management API invalidates the Redis entry and publishes an event for edge cache purging. Address multi-region: use DynamoDB global tables for active-active replication with regional Redis caches to serve redirects locally. Explain monitoring: track cache hit rate, redirect latency percentiles, creation throughput, and Kafka consumer lag. Touch on disaster recovery: DynamoDB handles replication and backup natively; Redis is ephemeral and can be rebuilt from the datastore on failure.
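The hourly rollups can be sketched as a consumer that buckets raw click events by link and hour (the event shape, a dict with `code` and a Unix `ts`, is an assumption for illustration):

```python
# Aggregate raw click events into per-link hourly counters,
# the shape a dashboard query would read.
from collections import Counter
from datetime import datetime, timezone

def hourly_rollup(events: list) -> Counter:
    counts: Counter = Counter()
    for ev in events:
        hour = datetime.fromtimestamp(
            ev["ts"], tz=timezone.utc
        ).strftime("%Y-%m-%dT%H:00")         # truncate timestamp to the hour
        counts[(ev["code"], hour)] += 1
    return counts
```

Daily rollups fall out of the same pattern with a coarser key, and both can be written incrementally as the Kafka consumer advances.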