Design a URL shortening service similar to TinyURL or Bitly that allows users to convert long URLs into short, shareable links and manage their shortened URLs. A user pastes a long URL, gets a compact code like https://sho.rt/Ab3Cd, and anyone hitting that short link is redirected to the original address.
The system must support creating short links, redirecting users with low latency, managing links through an authenticated dashboard (view, disable, delete), and providing basic analytics (total clicks, time series, referrers, country/region). This looks deceptively simple but exercises core distributed systems skills: globally unique ID generation without collisions, extreme read scaling for redirects, low-latency edge serving, write-heavy analytics capture, abuse prevention, and solid data modeling.
At ServiceTitan scale, interviewers want to see crisp requirement definition, realistic scale estimation, the right storage and caching strategy, and pragmatic trade-offs around availability, consistency, and cost.
Based on real interview experiences, these are the areas interviewers probe most deeply:
The core ID generation strategy determines much of the system's scalability and correctness. Naive approaches like a single auto-increment counter create contention and single points of failure.
Hints to consider:
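As a concrete starting point, one common scheme is base62 encoding over a partitioned counter: each server owns a machine ID and increments its own counter, so codes are globally unique without coordination on the hot path. The sketch below is illustrative (the `machine_bits` split is an assumption), not a prescribed implementation:

```python
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def encode_base62(n: int) -> str:
    # Convert a numeric ID into a short code. Base62 avoids the '+' and '/'
    # characters of base64, keeping codes URL-safe without percent-encoding.
    if n == 0:
        return BASE62[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(BASE62[r])
    return "".join(reversed(out))

def code_from_range(counter: int, machine_id: int, machine_bits: int = 10) -> str:
    # Hypothetical sharded-counter scheme: interleaving a per-server machine_id
    # into the low bits makes (counter, machine_id) pairs globally unique.
    return encode_base62((counter << machine_bits) | machine_id)
```

A 6-character base62 code covers 62^6 ≈ 56 billion links, which is why most designs settle on 6–8 characters.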
Redirect traffic is overwhelmingly read-heavy. Missing latency targets or hammering the primary datastore on every redirect is a critical red flag.
Hints to consider:
Coupling analytics writes to the redirect path raises tail latency and creates failure correlation. A backlog or partial outage in analytics should never break redirects.
Hints to consider:
URL shorteners are frequent targets for spam, phishing, and denial-of-service. Interviewers want to see you think about operational safety.
Hints to consider:
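A token bucket per API key or IP is the usual first line of defense against bulk spam creation. A minimal in-process sketch follows; the rate and capacity are placeholders, and production systems typically keep the buckets in Redis so limits hold across app servers:

```python
import time

class TokenBucket:
    # Per-client rate limiter for the link-creation endpoint.
    def __init__(self, rate: float, capacity: int):
        self.rate = rate                 # tokens refilled per second
        self.capacity = capacity         # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Rate limiting handles volume abuse; phishing and malware need separate controls, such as checking destination URLs against a blocklist at creation time and re-scanning popular links periodically.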
The core mapping from short code to URL is a classic key-value problem, but the full data model includes user ownership, metadata, and analytics.
Hints to consider:
Confirm the expected scale: how many link creations per day, how many redirects per day, and what read-to-write ratio to design for. Ask whether custom aliases are supported (users choosing their own short code). Clarify the analytics depth: just click counts, or full breakdowns by time, geography, and referrer. Confirm whether links can expire and whether the system needs to support bulk creation (API-driven). Ask about geographic distribution of users to inform CDN and multi-region decisions.
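Once the interviewer supplies numbers, a quick back-of-envelope pass anchors the rest of the design. The inputs below are hypothetical targets (10M creates and 1B redirects per day, a 100:1 read-to-write ratio); swap in whatever figures you are given:

```python
# Back-of-envelope capacity estimation with assumed inputs.
links_per_day = 10_000_000
redirects_per_day = 1_000_000_000
seconds_per_day = 86_400

write_qps = links_per_day / seconds_per_day        # ~116 creates/sec average
read_qps = redirects_per_day / seconds_per_day     # ~11,600 redirects/sec average
bytes_per_record = 500                             # code + URL + metadata, assumed

storage_per_year_gb = links_per_day * 365 * bytes_per_record / 1e9  # ~1.8 TB/year
```

Remember to multiply averages by a peak factor (often 2–5x) when sizing caches and provisioned throughput.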
Sketch the core components: an API Gateway for authentication and rate limiting, a Link Service handling creation and management, a Redirect Service optimized for speed, a Cache Layer (Redis plus CDN), a primary datastore (DynamoDB) for the code-to-URL mapping and metadata, a message queue (Kafka) for analytics event ingestion, and an Analytics Service backed by a time-series store. Show the write path (create link, generate code, persist, return URL) and the read path (CDN check, Redis check, database lookup, redirect, emit analytics event). Highlight that the redirect path is the hot path and must be as thin as possible.
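The read path in miniature can be reduced to a tiered lookup ending in an HTTP redirect. The dicts below stand in for the CDN, Redis, and database tiers; the response is a plain dict rather than a real framework object. One design choice worth voicing: a 301 is cached permanently by browsers (fastest, but repeat clicks never reach the origin), while a 302 keeps every click visible for analytics.

```python
# Tiered lookup for the redirect hot path; each tier is a dict stand-in.
def redirect_response(code, cdn, redis_cache, db):
    for tier in (cdn, redis_cache, db):
        url = tier.get(code)
        if url:
            # 301 = permanent (browser-cached); use 302 if per-click
            # analytics at the origin matter more than raw speed.
            return {"status": 301, "Location": url}
    return {"status": 404}
```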
Walk through the full lifecycle of a redirect. A client hits the CDN with a short URL. If the edge has a cached redirect, it responds immediately (sub-10ms). On a cache miss, the request reaches the Redirect Service, which checks Redis, then DynamoDB if needed. The redirect is served and the mapping is cached at both layers. Simultaneously, a click event is published to Kafka. Discuss code generation in detail: show how a distributed counter or hash-based scheme avoids hotspots and single points of failure. Explain how conditional writes in DynamoDB guarantee uniqueness without distributed locking.
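The conditional-write guarantee can be sketched without a real database. DynamoDB rejects a `put_item` carrying `ConditionExpression="attribute_not_exists(code)"` if the code already exists; the dict-backed stub below mimics that semantics so the retry logic is testable:

```python
# Uniqueness via conditional writes, with a stub mimicking DynamoDB's
# attribute_not_exists condition. With boto3 this would be
# table.put_item(Item=..., ConditionExpression="attribute_not_exists(code)").
class ConditionalStore:
    def __init__(self):
        self.items: dict[str, str] = {}

    def put_if_absent(self, code: str, url: str) -> bool:
        if code in self.items:     # mirrors a ConditionalCheckFailedException
            return False
        self.items[code] = url
        return True

def create_link(store: ConditionalStore, url: str, candidates) -> str:
    # Try candidate codes until a conditional write succeeds.
    # No distributed lock is needed: the datastore arbitrates races.
    for code in candidates:
        if store.put_if_absent(code, url):
            return code
    raise RuntimeError("exhausted candidate codes")
```

With counter-based generation collisions should never occur, so this loop matters mainly for hash-based schemes and user-chosen custom aliases.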
Cover the analytics pipeline: Kafka consumers aggregate click events into per-link counters and time-series buckets, stored in a columnar database or time-series store. Discuss cache invalidation when a user disables or deletes a link: invalidate CDN and Redis entries, and serve a 404 or gone page. Address link expiration with DynamoDB TTL triggering cleanup of cache entries. Cover monitoring: track redirect latency percentiles, cache hit rates, creation throughput, and Kafka consumer lag. Discuss cost optimization: most redirects should be served from CDN, minimizing origin hits and database reads.
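The consumer-side aggregation step can be sketched as rolling raw click events into per-link, per-minute buckets before they hit the time-series store. The event shape is illustrative:

```python
from collections import defaultdict

# Aggregate raw click events into (code, minute) buckets. A real Kafka
# consumer would do this over a time window, then flush counts downstream.
def aggregate(events):
    buckets: dict[tuple[str, str], int] = defaultdict(int)
    for e in events:
        minute = e["ts"].strftime("%Y-%m-%dT%H:%M")
        buckets[(e["code"], minute)] += 1
    return dict(buckets)
```

Pre-aggregating like this turns millions of raw events into a handful of counter updates, which is what makes the analytics store affordable at redirect scale.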
Deepen your understanding of the patterns used in this problem: