Design a Global Content Delivery Platform
System Design · Must
Problem Statement
Design a scalable content delivery platform that serves static assets (images, videos, documents) to millions of users worldwide. The system must support uploads from content creators, automatic format optimization and transformation, intelligent CDN routing, and real-time analytics on delivery performance. Assets range from small thumbnails (10KB) to large video files (several GB), and the platform must handle traffic spikes during major events while maintaining sub-second latency for popular content.
Your design should account for geographic distribution of users, varying network conditions, cost optimization across multiple CDN providers, and the ability to purge or update content globally within minutes. The platform needs to support both public assets (available to anyone) and private assets (requiring authentication and time-limited access).
Key Requirements
Functional
- Asset Upload and Storage -- creators upload files of varying sizes; system stores originals durably and generates optimized variants (thumbnails, different resolutions, format conversions)
- Global Delivery -- users worldwide fetch assets with minimal latency; system routes requests to optimal edge locations and handles failover between CDN providers
- Access Control -- support both public assets (anyone can access) and private assets (require authentication, signed URLs with expiration, usage tracking)
- Cache Management -- automatically cache popular content at edge locations; provide APIs to invalidate or update specific assets globally within minutes
- Analytics and Monitoring -- track delivery metrics (bandwidth, cache hit rates, error rates, latency percentiles) per region and per asset
Non-Functional
- Scalability -- handle 100K requests per second globally with ability to burst to 500K during peak events; store petabytes of content
- Reliability -- 99.95% availability for reads; automatic failover between CDN providers; no single point of failure in upload path
- Latency -- p95 latency under 200ms for cached assets; p99 under 500ms; upload processing (creating variants) completes within 30 seconds for images, 5 minutes for videos
- Consistency -- eventual consistency acceptable for edge caches (up to 5 minutes); strong consistency for access control decisions; cache invalidation propagates within 2 minutes
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Multi-Tier Caching Strategy
The caching hierarchy is critical for balancing latency, cost, and consistency. Interviewers want to see how you layer CDN edge caches, regional mid-tier caches, and origin storage while managing cache invalidation and stampede protection.
Hints to consider:
- Design a tiered approach with CDN edge nodes (minutes TTL), regional PoPs (hours TTL), and durable origin storage
- Implement cache key design that includes version identifiers or content hashes to avoid stale data
- Use request coalescing to prevent thundering herd when cache entries expire -- only one request should fetch from origin
- Consider tradeoffs between push-based invalidation (expensive but fast) versus pull-based with short TTLs
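The versioned cache keys and request coalescing above can be sketched in a few lines of Python. This is an illustrative in-process model under simplified assumptions (a real edge cache lives in the CDN or a shared store, and production coalescing needs error propagation); `CoalescingCache` and its origin-fetch callback are hypothetical names, not a real library API.

```python
import threading

def cache_key(asset_id: str, variant: str, content_hash: str) -> str:
    """Versioned cache key: a new content hash yields a new key, so
    edges never serve stale bytes and no purge call is needed."""
    return f"{asset_id}/{variant}/{content_hash[:16]}"

class CoalescingCache:
    """Toy in-process cache with request coalescing: on a miss, only one
    caller fetches from origin; concurrent callers for the same key wait
    for that result instead of stampeding the origin.
    (Error handling on a failed fetch is omitted for brevity.)"""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._cache = {}
        self._inflight = {}          # key -> Event set when fetch completes
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            event = self._inflight.get(key)
            if event is None:        # we are the designated fetcher
                event = threading.Event()
                self._inflight[key] = event
                fetcher = True
            else:
                fetcher = False
        if fetcher:
            value = self._fetch(key)           # single origin round-trip
            with self._lock:
                self._cache[key] = value
                del self._inflight[key]
            event.set()                        # release the waiters
            return value
        event.wait()                           # coalesced: reuse the result
        with self._lock:
            return self._cache[key]
```

Because the key embeds the content hash, "invalidation" becomes publishing a new URL rather than purging the old one, which is the versioned-URL tradeoff mentioned above.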
2. Upload Pipeline and Asset Processing
Transforming uploaded assets into multiple formats and resolutions is computationally expensive. Interviewers look for asynchronous processing, work queue design, and handling of failures during transformation.
Hints to consider:
- Use resumable uploads with chunking for large files to handle network interruptions
- Queue transformation jobs asynchronously (message queue or task system) so uploads complete quickly
- Implement priority queuing where thumbnail generation happens before full video transcoding
- Design idempotent transformation workers that can retry safely; store processing state to resume on failure
- Consider distributed processing for video transcoding using worker pools scaled by queue depth
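The priority queuing and idempotent-worker hints can be illustrated with a small sketch. The in-memory heap stands in for a real message queue, and `job_store` stands in for a durable state table keyed by (asset, variant); the names and the three job kinds are assumptions for illustration.

```python
import heapq

# Lower number = higher priority: thumbnails before full transcodes.
PRIORITY = {"thumbnail": 0, "resize": 1, "transcode": 2}

class TransformQueue:
    """Illustrative priority queue: cheap variants (thumbnails) are
    dequeued before expensive ones (video transcodes)."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker keeps FIFO order within a priority

    def push(self, asset_id, kind):
        heapq.heappush(self._heap, (PRIORITY[kind], self._seq, asset_id, kind))
        self._seq += 1

    def pop(self):
        _, _, asset_id, kind = heapq.heappop(self._heap)
        return asset_id, kind

def process(job_store, asset_id, kind, transform):
    """Idempotent worker step: work already marked done is skipped, so a
    retry after a crash or a duplicate queue delivery is safe."""
    key = (asset_id, kind)
    if job_store.get(key) == "done":
        return "skipped"
    transform(asset_id, kind)        # may raise; state stays unset on failure
    job_store[key] = "done"          # record completion only after success
    return "done"
```

The completion marker is written only after the transform succeeds, so a worker that dies mid-job leaves the job retryable, and a duplicate delivery becomes a no-op.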
3. Cost Optimization Across CDN Providers
Operating at scale means significant CDN costs. Interviewers expect you to discuss intelligent routing, multi-CDN strategies, and cost/performance tradeoffs.
Hints to consider:
- Route requests based on real-time metrics (latency, cost per GB) rather than static configuration
- Use cheaper CDN providers for cold/archive content; premium providers for hot popular assets
- Implement tiered storage (SSD for hot data, HDD for warm, archival object storage for cold) based on access patterns
- Batch cache invalidation requests to reduce API costs; use versioned URLs to avoid purging when possible
- Monitor bandwidth costs per region and shift traffic dynamically during price changes
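Metric-driven routing can be as simple as scoring each healthy provider on normalized latency and cost. The linear score, the weights, and the normalization constants below are illustrative assumptions; a production router would also weigh error rates, committed-spend contracts, and hysteresis to avoid flapping.

```python
def choose_cdn(providers, latency_weight=0.7, cost_weight=0.3):
    """Pick a CDN provider from live metrics rather than static config.
    Each provider is a dict with p95 latency (ms), cost ($/GB), and a
    health flag; lower score wins."""
    def score(p):
        # Normalize so latency and cost land on comparable scales
        # (100 ms and $0.05/GB as illustrative baselines).
        return (latency_weight * (p["p95_ms"] / 100.0)
                + cost_weight * (p["usd_per_gb"] / 0.05))
    healthy = [p for p in providers if p["healthy"]]
    return min(healthy, key=score)["name"]
```

Shifting the weights implements the hot/cold split from the hints: weight latency heavily for hot assets on premium providers, weight cost heavily for cold content.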
4. Access Control and Security for Private Assets
Serving private content requires authentication without adding latency or creating bottlenecks. Interviewers probe how you validate access without hitting a central auth service on every request.
Hints to consider:
- Generate signed URLs with HMAC tokens that CDN edge nodes can validate without calling origin
- Include expiration timestamps and scope (specific asset or prefix) in signed tokens
- Use short-lived tokens (minutes to hours) and rotate signing keys periodically
- Implement rate limiting per user/IP at edge to prevent abuse of signed URLs
- Consider token revocation challenges -- accept eventual consistency or maintain hot revocation lists at edges
5. Monitoring and Incident Response
At scale, partial failures are normal. Interviewers want to see how you detect issues, route around failures, and maintain visibility into system health.
Hints to consider:
- Emit metrics at multiple layers (CDN edge, regional aggregators, origin) with different retention policies
- Use synthetic monitoring to proactively test delivery from various regions before users are impacted
- Implement circuit breakers per CDN provider -- automatically route traffic away from providers or origins that are failing
- Design dashboards showing cache hit rates, bandwidth by region, error rates, and p99 latency with automatic anomaly detection
- Plan for debugging: include request IDs that trace through all tiers and correlation with logs
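The per-provider circuit breaker from the hints can be sketched as a small state machine: consecutive failures open the circuit, traffic routes elsewhere, and after a cooldown one probe request is allowed through (half-open). The thresholds and the injectable clock are illustrative choices for this sketch.

```python
import time

class CircuitBreaker:
    """Minimal per-provider circuit breaker. After `threshold` consecutive
    failures the circuit opens and allow() returns False; once `cooldown`
    seconds pass, a probe is permitted, and a recorded success closes it."""

    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock           # injectable for testing
        self.failures = 0
        self.opened_at = None        # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            return True              # half-open: let one probe through
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None        # close the circuit

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()
```

A router would keep one breaker per provider and consult `allow()` before dispatch, giving the automatic route-around behavior described above while still probing for recovery.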