Design a Distributed Cache System
System Design · Must
Problem Statement
Design a content delivery network that serves static assets (images, videos, JavaScript bundles, CSS files) to users worldwide with minimal latency and high availability. Your CDN should handle hundreds of terabytes of content, billions of requests per day, and gracefully manage origin failures, cache invalidation, and traffic spikes during viral events.
A CDN sits between end users and origin servers, caching content at edge locations close to users. When a request arrives, the edge node serves cached content immediately or fetches it from the origin if missing. Your system must intelligently route requests, decide what to cache and for how long, handle invalidation when content changes, and maintain acceptable performance even when origins are slow or unavailable. Interviewers use this problem to assess your understanding of geographic distribution, HTTP caching semantics, content routing algorithms, cache hierarchy, and strategies for handling thundering herds and origin shielding.
Key Requirements
Functional
- Content upload and distribution -- Content providers upload assets to the CDN, which propagates them to edge locations worldwide
- Intelligent request routing -- User requests are directed to the nearest or best-performing edge node based on latency, load, and health
- Cache management -- Edge nodes cache content with configurable TTLs and eviction policies, honoring HTTP cache-control headers
- Purge and invalidation -- Content providers can rapidly invalidate specific URLs or patterns across all edge locations when content updates
- Origin shielding -- The CDN protects origin servers from traffic surges by coalescing requests and maintaining a mid-tier cache layer
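The cache-management requirement hinges on honoring HTTP cache-control semantics. As a minimal sketch (the function name and default TTL are assumptions, not part of any standard API), an edge node might derive its TTL from the Cache-Control header like this:

```python
def ttl_from_cache_control(header: str, default_ttl: int = 300) -> int:
    """Derive an edge-cache TTL (seconds) from a Cache-Control header.

    s-maxage (meant for shared caches like a CDN) takes precedence over
    max-age; no-store, no-cache, and private mean "do not cache at the edge".
    The 300s default for headers with no freshness directive is an assumption.
    """
    directives = {}
    for part in header.split(","):
        part = part.strip().lower()
        if "=" in part:
            name, _, value = part.partition("=")
            directives[name.strip()] = value.strip().strip('"')
        else:
            directives[part] = None

    # Directives that forbid shared caching win over everything else.
    if {"no-store", "no-cache", "private"} & directives.keys():
        return 0
    for name in ("s-maxage", "max-age"):
        if name in directives:
            try:
                return max(0, int(directives[name]))
            except ValueError:
                pass  # malformed value: fall through to the default
    return default_ttl
```

For example, `ttl_from_cache_control("public, s-maxage=600, max-age=60")` yields 600: the shared-cache directive wins even though browsers would use 60.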
Non-Functional
- Scalability -- Support 100+ edge locations globally, 10 million requests per second aggregate, and 500 TB of cached content
- Reliability -- Achieve 99.99% availability with automatic failover and graceful degradation when origins or edge nodes fail
- Latency -- Deliver cached content in under 50ms for 95% of requests, with TCP connection reuse and optimized routing
- Consistency -- Propagate invalidations to all edges within 5 seconds while allowing eventual consistency for routine cache refresh
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Request Routing and Geographic Distribution
Interviewers want to see how you route users to the optimal edge node in real time. This involves DNS resolution, anycast networking, latency-based selection, and health-aware load balancing.
Hints to consider:
- Use geo-DNS to return IP addresses of nearby edge PoPs based on the user's DNS resolver location
- Implement anycast routing so multiple edge locations share the same IP prefix and network topology routes users to the closest one
- Maintain real-time latency and health metrics, allowing clients to fall back to alternate PoPs if the primary is slow or down
- Consider using HTTP redirects or connection coalescing for fine-grained routing after initial DNS resolution
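The health-aware selection hinted at above can be sketched in a few lines. This is an illustrative assumption, not a production router: it prefers healthy PoPs, then spreads load randomly among those within a small latency tolerance of the best, rather than always pinning traffic to the single fastest location.

```python
import random


def pick_pop(pops: list[dict], tolerance_ms: float = 10.0) -> str:
    """Pick an edge PoP for a user.

    Each entry is assumed to look like:
        {"name": str, "latency_ms": float, "healthy": bool}

    Prefer healthy PoPs; among them, choose uniformly at random from
    those within `tolerance_ms` of the lowest observed latency, so
    near-equivalent locations share load instead of one absorbing it all.
    """
    healthy = [p for p in pops if p["healthy"]]
    candidates = healthy or pops  # degrade gracefully if everything is unhealthy
    best = min(p["latency_ms"] for p in candidates)
    near_best = [p for p in candidates if p["latency_ms"] <= best + tolerance_ms]
    return random.choice(near_best)["name"]
```

Note how the unhealthy `sin` PoP is skipped even when it reports the lowest latency; serving from a slower healthy PoP beats serving errors from the fastest one.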
2. Cache Hierarchy and Origin Shielding
A flat architecture overwhelms origins during cache misses. Interviewers expect you to design a multi-tier hierarchy that shields origins and reduces redundant fetches.
Hints to consider:
- Deploy regional aggregation nodes between edge caches and origins to deduplicate concurrent misses
- Use consistent hashing at the regional layer so the same object always routes to the same aggregation node, maximizing cache hit rate
- Implement request coalescing (single-flight pattern) so only one origin fetch occurs per cache miss, even under high concurrency
- Design backoff and circuit-breaker logic to protect origins during outages or slow response times
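The single-flight pattern from the hints above is worth being able to write out. A minimal thread-based sketch (class and method names are illustrative; Go's `golang.org/x/sync/singleflight` is the canonical reference for the pattern):

```python
import threading


class SingleFlight:
    """Coalesce concurrent cache-miss fetches for the same key so the
    origin sees at most one in-flight request per key at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done_event, result_box)

    def do(self, key, fetch):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                # First caller for this key becomes the leader.
                event, box = threading.Event(), {}
                self._inflight[key] = (event, box)
                leader = True
            else:
                event, box = entry
                leader = False

        if leader:
            try:
                box["value"] = fetch(key)
            finally:
                # Clean up before waking followers; a fetch exception
                # still propagates to the leader after this runs.
                with self._lock:
                    del self._inflight[key]
                event.set()
            return box["value"]

        event.wait()
        return box.get("value")  # None if the leader's fetch failed
```

Five concurrent misses for the same key then result in exactly one origin call; the remaining callers block briefly and reuse the leader's result.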
3. Cache Invalidation and Consistency
Stale content damages user experience and trust. Interviewers probe how you propagate invalidations globally while balancing freshness against origin load.
Hints to consider:
- Use a pub-sub message bus (Kafka, SNS) to broadcast purge events to all edge nodes with sub-second fanout
- Support purge by URL, prefix wildcard, and cache tags to give content providers flexible invalidation granularity
- Implement versioned URLs or cache-busting query parameters as an alternative to purging for immutable assets
- Allow configurable stale-while-revalidate and stale-if-error policies to serve slightly outdated content during origin issues
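The purge-fanout idea can be modeled in-process to show the shape of the design. The classes below are a toy stand-in for a real bus like Kafka or SNS: edges subscribe, and a single publish removes matching entries everywhere, covering both exact-URL and prefix-wildcard purges from the hints.

```python
class EdgeCache:
    """Toy edge node: maps URL -> cached body, reacts to purge events."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def on_purge(self, pattern):
        if pattern.endswith("*"):
            # Prefix-wildcard purge, e.g. "/img/*".
            prefix = pattern[:-1]
            self.store = {
                url: body for url, body in self.store.items()
                if not url.startswith(prefix)
            }
        else:
            # Exact-URL purge; missing keys are a no-op.
            self.store.pop(pattern, None)


class PurgeBus:
    """Stand-in for a pub-sub bus: fans each purge out to every edge."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, edge):
        self.subscribers.append(edge)

    def publish(self, pattern):
        for edge in self.subscribers:
            edge.on_purge(pattern)
```

In a real deployment the publish is asynchronous and each PoP consumes from a durable topic, which is how the 5-second global-propagation target in the requirements would be met and measured.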
4. Handling Hot Content and Traffic Spikes
Viral events or product launches create extreme load concentration on specific objects. Interviewers want to see proactive strategies to prevent origin meltdowns and maintain low latency.
Hints to consider:
- Pre-warm caches by pushing popular content to all edges before anticipated traffic spikes
- Use negative caching to avoid repeated origin requests for missing or error responses
- Apply rate limiting and request throttling at edges to smooth bursty traffic before it reaches origins
- Implement dynamic TTL extension for hot objects to reduce cache churn and origin fetch frequency during viral periods
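Two of the hints above, negative caching and dynamic TTL extension, compose naturally in one cache wrapper. The sketch below is an assumption-laden illustration (thresholds and the `now` parameter for testability are invented, and a real edge would also cap memory and evict):

```python
import time


class HotAwareCache:
    """Cache sketch that (a) negatively caches origin failures for a short
    window and (b) extends the TTL of objects that cross a hit threshold,
    reducing origin fetch frequency for viral content."""

    def __init__(self, base_ttl=60, neg_ttl=5, hot_hits=100, hot_ttl=600):
        self.base_ttl, self.neg_ttl = base_ttl, neg_ttl
        self.hot_hits, self.hot_ttl = hot_hits, hot_ttl
        self.entries = {}  # key -> (value_or_None, expires_at, hit_count)

    def get(self, key, fetch, now=None):
        now = time.monotonic() if now is None else now
        entry = self.entries.get(key)
        if entry and entry[1] > now:
            value, expires, hits = entry
            hits += 1
            if hits == self.hot_hits and value is not None:
                expires = now + self.hot_ttl  # object went hot: extend TTL
            self.entries[key] = (value, expires, hits)
            return value  # None here means a negatively cached failure
        try:
            value = fetch(key)
            ttl = self.base_ttl
        except Exception:
            # Negative caching: remember the failure briefly so a storm of
            # requests for a broken object does not hammer the origin.
            value, ttl = None, self.neg_ttl
        self.entries[key] = (value, now + ttl, 0)
        return value
```

The negative-cache window is deliberately short (seconds) so a recovered origin is retried quickly, while the hot-object TTL extension trades a little freshness for far fewer origin fetches during a spike.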