Design a Distributed Cache System
System Design · Must
Problem Statement
Design a content delivery network that serves static assets (images, videos, JavaScript bundles, CSS files) to users worldwide with minimal latency and high availability. Your CDN should handle hundreds of terabytes of content, billions of requests per day, and gracefully manage origin failures, cache invalidation, and traffic spikes during viral events.
A CDN sits between end users and origin servers, caching content at edge locations close to users. When a request arrives, the edge node serves cached content immediately or fetches it from the origin if missing. Your system must intelligently route requests, decide what to cache and for how long, handle invalidation when content changes, and maintain acceptable performance even when origins are slow or unavailable. Interviewers use this problem to assess your understanding of geographic distribution, HTTP caching semantics, content routing algorithms, cache hierarchy, and strategies for handling thundering herds and origin shielding.
Key Requirements
Functional
- Content upload and distribution -- Content providers upload assets to the CDN, which propagates them to edge locations worldwide
- Intelligent request routing -- User requests are directed to the nearest or best-performing edge node based on latency, load, and health
- Cache management -- Edge nodes cache content with configurable TTLs and eviction policies, honoring HTTP cache-control headers
- Purge and invalidation -- Content providers can rapidly invalidate specific URLs or patterns across all edge locations when content updates
- Origin shielding -- The CDN protects origin servers from traffic surges by coalescing requests and maintaining a mid-tier cache layer
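The cache-management requirement hinges on honoring HTTP cache-control semantics. As a minimal sketch (the function name and default TTL are assumptions, not part of any standard API), an edge node might derive its TTL from the Cache-Control header like this:

```python
def ttl_from_cache_control(header: str, default_ttl: int = 300) -> int:
    """Derive an edge-cache TTL (seconds) from a Cache-Control header.

    s-maxage (meant for shared caches like a CDN) takes precedence over
    max-age; no-store, no-cache, and private mean "do not cache at the edge".
    The 300s default for headers with no freshness directive is an assumption.
    """
    directives = {}
    for part in header.split(","):
        part = part.strip().lower()
        if "=" in part:
            name, _, value = part.partition("=")
            directives[name.strip()] = value.strip().strip('"')
        else:
            directives[part] = None

    # Directives that forbid shared caching win over everything else.
    if {"no-store", "no-cache", "private"} & directives.keys():
        return 0
    for name in ("s-maxage", "max-age"):
        if name in directives:
            try:
                return max(0, int(directives[name]))
            except ValueError:
                pass  # malformed value: fall through to the default
    return default_ttl
```

For example, `ttl_from_cache_control("public, s-maxage=600, max-age=60")` yields 600: the shared-cache directive wins even though browsers would use 60.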
Non-Functional
- Scalability -- Support 100+ edge locations globally, 10 million requests per second aggregate, and 500 TB of cached content
- Reliability -- Achieve 99.99% availability with automatic failover and graceful degradation when origins or edge nodes fail
- Latency -- Deliver cached content in under 50ms for 95% of requests, with TCP connection reuse and optimized routing
- Consistency -- Propagate invalidations to all edges within 5 seconds while allowing eventual consistency for routine cache refresh
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Request Routing and Geographic Distribution
Interviewers want to see how you route users to the optimal edge node in real time. This involves DNS resolution, anycast networking, latency-based selection, and health-aware load balancing.
Hints to consider:
- Use geo-DNS to return IP addresses of nearby edge PoPs based on the user's DNS resolver location
- Implement anycast routing so multiple edge locations share the same IP prefix and network topology routes users to the closest one
- Maintain real-time latency and health metrics, allowing clients to fall back to alternate PoPs if the primary is slow or down
- Consider using HTTP redirects or connection coalescing for fine-grained routing after initial DNS resolution
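The health-aware selection hinted at above can be sketched in a few lines. This is an illustrative assumption, not a production router: it prefers healthy PoPs, then spreads load randomly among those within a small latency tolerance of the best, rather than always pinning traffic to the single fastest location.

```python
import random


def pick_pop(pops: list[dict], tolerance_ms: float = 10.0) -> str:
    """Pick an edge PoP for a user.

    Each entry is assumed to look like:
        {"name": str, "latency_ms": float, "healthy": bool}

    Prefer healthy PoPs; among them, choose uniformly at random from
    those within `tolerance_ms` of the lowest observed latency, so
    near-equivalent locations share load instead of one absorbing it all.
    """
    healthy = [p for p in pops if p["healthy"]]
    candidates = healthy or pops  # degrade gracefully if everything is unhealthy
    best = min(p["latency_ms"] for p in candidates)
    near_best = [p for p in candidates if p["latency_ms"] <= best + tolerance_ms]
    return random.choice(near_best)["name"]
```

Note how the unhealthy `sin` PoP is skipped even when it reports the lowest latency; serving from a slower healthy PoP beats serving errors from the fastest one.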
2. Cache Hierarchy and Origin Shielding
A flat architecture overwhelms origins during cache misses. Interviewers expect you to design a multi-tier hierarchy that shields origins and reduces redundant fetches.
Hints to consider:
- Deploy regional aggregation nodes between edge caches and origins to deduplicate concurrent misses
- Use consistent hashing at the regional layer so the same object always routes to the same aggregation node, maximizing cache hit rate
- Implement request coalescing (single-flight pattern) so only one origin fetch occurs per cache miss, even under high concurrency
- Design backoff and circuit-breaker logic to protect origins during outages or slow response times
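The single-flight pattern from the hints above is worth being able to write out. A minimal thread-based sketch (class and method names are illustrative; Go's `golang.org/x/sync/singleflight` is the canonical reference for the pattern):

```python
import threading


class SingleFlight:
    """Coalesce concurrent cache-miss fetches for the same key so the
    origin sees at most one in-flight request per key at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done_event, result_box)

    def do(self, key, fetch):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                # First caller for this key becomes the leader.
                event, box = threading.Event(), {}
                self._inflight[key] = (event, box)
                leader = True
            else:
                event, box = entry
                leader = False

        if leader:
            try:
                box["value"] = fetch(key)
            finally:
                # Clean up before waking followers; a fetch exception
                # still propagates to the leader after this runs.
                with self._lock:
                    del self._inflight[key]
                event.set()
            return box["value"]

        event.wait()
        return box.get("value")  # None if the leader's fetch failed
```

Five concurrent misses for the same key then result in exactly one origin call; the remaining callers block briefly and reuse the leader's result.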
3. Cache Invalidation and Consistency
Stale content damages user experience and trust. Interviewers probe how you propagate invalidations globally while balancing freshness against origin load.
Hints to consider:
- Use a pub-sub message bus (Kafka, SNS) to broadcast purge events to all edge nodes with sub-second fanout
- Support purge by URL, prefix wildcard, and cache tags to give content providers flexible invalidation granularity
- Implement versioned URLs or cache-busting query parameters as an alternative to purging for immutable assets
- Allow configurable stale-while-revalidate and stale-if-error policies to serve slightly outdated content during origin issues
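The purge-fanout idea can be modeled in-process to show the shape of the design. The classes below are a toy stand-in for a real bus like Kafka or SNS: edges subscribe, and a single publish removes matching entries everywhere, covering both exact-URL and prefix-wildcard purges from the hints.

```python
class EdgeCache:
    """Toy edge node: maps URL -> cached body, reacts to purge events."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def on_purge(self, pattern):
        if pattern.endswith("*"):
            # Prefix-wildcard purge, e.g. "/img/*".
            prefix = pattern[:-1]
            self.store = {
                url: body for url, body in self.store.items()
                if not url.startswith(prefix)
            }
        else:
            # Exact-URL purge; missing keys are a no-op.
            self.store.pop(pattern, None)


class PurgeBus:
    """Stand-in for a pub-sub bus: fans each purge out to every edge."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, edge):
        self.subscribers.append(edge)

    def publish(self, pattern):
        for edge in self.subscribers:
            edge.on_purge(pattern)
```

In a real deployment the publish is asynchronous and each PoP consumes from a durable topic, which is how the 5-second global-propagation target in the requirements would be met and measured.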
4. Handling Hot Content and Traffic Spikes
Viral events or product launches create extreme load concentration on specific objects. Interviewers want to see proactive strategies to prevent origin meltdowns and maintain low latency.
Hints to consider:
- Pre-warm caches by pushing popular content to all edges before anticipated traffic spikes
- Use negative caching to avoid repeated origin requests for missing or error responses
- Apply rate limiting and request throttling at edges to smooth bursty traffic before it reaches origins
- Implement dynamic TTL extension for hot objects to reduce cache churn and origin fetch frequency during viral periods
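Two of the hints above, negative caching and dynamic TTL extension, compose naturally in one cache wrapper. The sketch below is an assumption-laden illustration (thresholds and the `now` parameter for testability are invented, and a real edge would also cap memory and evict):

```python
import time


class HotAwareCache:
    """Cache sketch that (a) negatively caches origin failures for a short
    window and (b) extends the TTL of objects that cross a hit threshold,
    reducing origin fetch frequency for viral content."""

    def __init__(self, base_ttl=60, neg_ttl=5, hot_hits=100, hot_ttl=600):
        self.base_ttl, self.neg_ttl = base_ttl, neg_ttl
        self.hot_hits, self.hot_ttl = hot_hits, hot_ttl
        self.entries = {}  # key -> (value_or_None, expires_at, hit_count)

    def get(self, key, fetch, now=None):
        now = time.monotonic() if now is None else now
        entry = self.entries.get(key)
        if entry and entry[1] > now:
            value, expires, hits = entry
            hits += 1
            if hits == self.hot_hits and value is not None:
                expires = now + self.hot_ttl  # object went hot: extend TTL
            self.entries[key] = (value, expires, hits)
            return value  # None here means a negatively cached failure
        try:
            value = fetch(key)
            ttl = self.base_ttl
        except Exception:
            # Negative caching: remember the failure briefly so a storm of
            # requests for a broken object does not hammer the origin.
            value, ttl = None, self.neg_ttl
        self.entries[key] = (value, now + ttl, 0)
        return value
```

The negative-cache window is deliberately short (seconds) so a recovered origin is retried quickly, while the hot-object TTL extension trades a little freshness for far fewer origin fetches during a spike.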