Design a URL shortening service similar to TinyURL or Bitly that allows users to convert long URLs into short, shareable links and manage their shortened URLs. A user pastes a long URL, receives a compact code like https://sho.rt/Ab3Cd, and anyone visiting that short link is immediately redirected to the original address.
The service must handle creating short links, redirecting users with sub-50ms latency at the edge, managing links through an authenticated dashboard (view, disable, delete, update destination), and delivering basic analytics including total clicks, time-series trends, referrer sources, and geographic breakdowns. While the problem appears straightforward, it exercises core distributed systems skills: globally unique ID generation without collisions, extreme read-path scaling for redirects, edge-level serving via CDN, write-heavy analytics event capture, abuse prevention, and thoughtful data modeling.
At Zscaler scale, interviewers use this question to evaluate whether you can define crisp requirements, estimate scale realistically, select the right storage and caching strategy, and make pragmatic trade-offs around availability, consistency, and cost.
Based on real interview experiences, these are the areas interviewers probe most deeply:
The ID generation strategy determines much of the system's scalability and correctness. A naive single auto-increment counter creates contention and a single point of failure.
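One common scheme is base62-encoding IDs drawn from pre-allocated per-node counter ranges, so no two nodes can mint the same code. A minimal sketch; `RangeAllocator` here is an in-process stand-in for a coordination service (e.g. ZooKeeper or a database sequence) that hands out disjoint blocks:

```python
# Base62 alphabet: URL-safe, no padding or special characters.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

class RangeAllocator:
    """Hands each node a disjoint block of IDs so nodes never collide.

    A real deployment would persist `next_block` in a strongly
    consistent store; this in-memory version only illustrates the idea.
    """
    def __init__(self, block_size: int = 1000):
        self.block_size = block_size
        self.next_block = 0

    def allocate(self) -> range:
        start = self.next_block
        self.next_block += self.block_size
        return range(start, start + self.block_size)
```

Each application server burns through its local range with no coordination, requesting a new block only when exhausted, so counter contention is amortized over thousands of creations.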
Hints to consider: base62 encoding of a counter avoids base64's + and / characters, which are not URL-safe and require percent-encoding.

Redirect traffic is overwhelmingly read-heavy. Hammering the primary datastore on every redirect will miss latency targets and inflate costs.
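The read path can be sketched as a tiered lookup that touches the database only on a cache miss. This toy version uses one dict in place of Redis and another in place of DynamoDB; the point is the lookup order and the cache-fill on miss:

```python
class TieredLookup:
    """Resolve short codes cache-first; fall through to the store on a miss."""

    def __init__(self, db: dict):
        self.db = db        # authoritative store (stand-in for DynamoDB)
        self.cache = {}     # hot-path cache (stand-in for Redis / CDN edge)
        self.hits = 0
        self.misses = 0

    def resolve(self, code: str):
        if code in self.cache:
            self.hits += 1
            return self.cache[code]
        self.misses += 1
        url = self.db.get(code)
        if url is not None:
            self.cache[code] = url   # populate cache for future requests
        return url
```

With a realistic 100:1 read-to-write ratio and a high hit rate, the database sees a tiny fraction of redirect traffic; the same pattern extends naturally to CDN-before-Redis-before-origin.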
Coupling analytics writes to the redirect path raises tail latency and creates failure correlation. An analytics outage should never break the redirect experience.
URL shorteners are frequent targets for spam, phishing, and denial-of-service attacks. Interviewers want to see you think about operational safety beyond the happy path.
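Per-client rate limiting is one concrete defense against bulk spam creation and redirect floods. A token-bucket sketch (rate and capacity values are illustrative, not prescribed by the problem):

```python
import time

class TokenBucket:
    """Classic token bucket: refill at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice this runs in the API Gateway keyed by API key or client IP; pair it with destination-URL scanning against phishing blocklists before a link ever goes live.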
The core mapping from short code to URL is a classic key-value problem, but the full data model includes user ownership, metadata, expiration, and analytics.
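One possible record shape, assuming a single KV table keyed by short code (field names are illustrative, not fixed by the problem):

```python
from dataclasses import dataclass, field
from typing import Optional
import time

@dataclass
class LinkRecord:
    """One item in the code-to-URL table; `code` is the partition key."""
    code: str
    long_url: str
    owner_id: str                        # supports the authenticated dashboard
    created_at: float = field(default_factory=time.time)
    expires_at: Optional[float] = None   # maps to a TTL attribute in DynamoDB
    disabled: bool = False               # soft-disable without deleting history

    def is_servable(self, now: Optional[float] = None) -> bool:
        """A redirect may be served only for enabled, unexpired links."""
        now = now if now is not None else time.time()
        return not self.disabled and (self.expires_at is None or now < self.expires_at)
```

Analytics counters live in a separate store keyed by code plus time bucket, keeping the hot redirect item small and cache-friendly.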
Confirm the expected scale: how many link creations per day, how many redirects per day, and what read-to-write ratio to design for. Ask whether custom aliases are supported (users choosing their own short code). Clarify the analytics depth: just click counts, or full breakdowns by time, geography, and referrer. Confirm whether links can expire and whether the system needs to support bulk creation via API. Ask about the geographic distribution of users to inform CDN and multi-region decisions.
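A quick back-of-envelope pass makes those questions concrete. The numbers below are assumptions for illustration, to be replaced by whatever the interviewer confirms:

```python
# Assumed scale (illustrative): 1M creations/day, 100M redirects/day
# gives a 100:1 read-to-write ratio.
creations_per_day = 1_000_000
redirects_per_day = 100_000_000
seconds_per_day = 86_400

write_qps = creations_per_day / seconds_per_day        # ~12 QPS
read_qps = redirects_per_day / seconds_per_day         # ~1,157 QPS
peak_read_qps = read_qps * 3                           # assume 3x peak factor

# Storage: ~500 bytes per record (code, URL, owner, timestamps).
bytes_per_record = 500
storage_5y_tb = creations_per_day * 365 * 5 * bytes_per_record / 1e12  # ~0.91 TB
```

The takeaway to state out loud: writes and storage are trivial at this scale; the entire design pressure sits on the redirect read path and the analytics event firehose.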
Sketch the core components: an API Gateway for authentication and rate limiting, a Link Service handling creation and management, a Redirect Service optimized for speed, a Cache Layer (Redis plus CDN), a primary datastore (DynamoDB) for the code-to-URL mapping and metadata, a message queue (Kafka) for analytics event ingestion, and an Analytics Service backed by a time-series store. Show two distinct paths: the write path (create link, generate code, persist, return short URL) and the read path (CDN check, Redis check, database lookup, serve redirect, emit analytics event). Emphasize that the redirect path is the hot path and must stay as thin as possible.
Walk through the full lifecycle of a redirect request. A client hits the CDN with a short URL. If the edge has a cached redirect, it responds immediately in under 10ms. On a cache miss, the request reaches the Redirect Service, which checks Redis and then DynamoDB if needed. The mapping is served as a 302 redirect and cached at both the Redis and CDN layers for future requests. Simultaneously, a click event containing the short code, timestamp, referrer, and client geography is published to Kafka. For code generation, explain how a distributed counter with pre-allocated ranges per node (or a hash-based scheme with collision detection) avoids hotspots and single points of failure. Show how conditional writes in DynamoDB guarantee uniqueness without requiring distributed locking.
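The hash-based variant with conditional writes can be sketched as follows. `put_if_absent` mimics DynamoDB's `attribute_not_exists` condition on `PutItem`, and a salt retry handles the rare prefix collision; the helper names are ours, not a real client API:

```python
import hashlib

class KVStore:
    """In-memory stand-in for DynamoDB conditional PutItem."""

    def __init__(self):
        self.items = {}

    def put_if_absent(self, key: str, value: str) -> bool:
        """Atomic insert-if-new; False means the condition failed."""
        if key in self.items:
            return False
        self.items[key] = value
        return True

def shorten(store: KVStore, long_url: str, length: int = 7) -> str:
    """Derive a code from a hash; on collision, extend the salt and retry."""
    salt = 0
    while True:
        digest = hashlib.sha256(f"{long_url}:{salt}".encode()).hexdigest()
        code = digest[:length]
        if store.put_if_absent(code, long_url):
            return code                       # fresh code claimed atomically
        if store.items[code] == long_url:
            return code                       # same URL shortened before: reuse
        salt += 1                             # true collision: retry with new salt
```

Because the conditional write is atomic at the storage layer, two concurrent creations of colliding codes cannot both succeed, and no distributed lock is needed.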
Cover the analytics pipeline: Kafka consumers aggregate click events into per-link counters and time-series buckets, stored in a columnar database for efficient dashboard queries. Discuss cache invalidation when a user disables or deletes a link: invalidate both CDN and Redis entries and serve a 404 or 410 Gone response. Address link expiration using DynamoDB TTL to trigger automatic cleanup of expired entries, followed by cache eviction. Cover monitoring: track redirect latency percentiles, cache hit rates, link creation throughput, and Kafka consumer lag. Discuss cost optimization: the vast majority of redirects should be served from CDN edge caches, minimizing origin hits and database reads to keep infrastructure costs proportional to actual creation volume rather than redirect volume.
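The consumer-side aggregation reduces to a fold over click events into per-link totals and hourly time buckets. A sketch; a real consumer would additionally checkpoint Kafka offsets and flush batches to the columnar store:

```python
from collections import defaultdict

def aggregate(events, bucket_seconds: int = 3600):
    """Fold raw click events into per-link totals and time-series buckets.

    Each event is a dict with at least "code" and "ts" (epoch seconds).
    Returns (totals, series) where series maps (code, bucket_start) -> clicks.
    """
    totals = defaultdict(int)
    series = defaultdict(int)
    for ev in events:
        totals[ev["code"]] += 1
        bucket = int(ev["ts"] // bucket_seconds) * bucket_seconds
        series[(ev["code"], bucket)] += 1
    return totals, series
```

The same fold extends to referrer and geography dimensions by widening the series key, which is exactly the shape a columnar store queries efficiently for dashboards.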
Deepen your understanding of the patterns used in this problem: distributed ID generation, tiered caching, asynchronous event pipelines, and key-value data modeling recur across many system design questions.