Design a Global Content Delivery Platform
System Design · Must
Problem Statement
Design a scalable content delivery platform that serves static assets (images, videos, documents) to millions of users worldwide. The system must support uploads from content creators, automatic format optimization and transformation, intelligent CDN routing, and real-time analytics on delivery performance. Assets range from small thumbnails (10KB) to large video files (several GB), and the platform must handle traffic spikes during major events while maintaining sub-second latency for popular content.
Your design should account for geographic distribution of users, varying network conditions, cost optimization across multiple CDN providers, and the ability to purge or update content globally within minutes. The platform needs to support both public assets (available to anyone) and private assets (requiring authentication and time-limited access).
Key Requirements
Functional
- Asset Upload and Storage -- creators upload files of varying sizes; system stores originals durably and generates optimized variants (thumbnails, different resolutions, format conversions)
- Global Delivery -- users worldwide fetch assets with minimal latency; system routes requests to optimal edge locations and handles failover between CDN providers
- Access Control -- support both public assets (anyone can access) and private assets (require authentication, signed URLs with expiration, usage tracking)
- Cache Management -- automatically cache popular content at edge locations; provide APIs to invalidate or update specific assets globally within minutes
- Analytics and Monitoring -- track delivery metrics (bandwidth, cache hit rates, error rates, latency percentiles) per region and per asset
Non-Functional
- Scalability -- handle 100K requests per second globally with ability to burst to 500K during peak events; store petabytes of content
- Reliability -- 99.95% availability for reads; automatic failover between CDN providers; no single point of failure in upload path
- Latency -- p95 latency under 200ms for cached assets; p99 under 500ms; upload processing (creating variants) completes within 30 seconds for images, 5 minutes for videos
- Consistency -- eventual consistency acceptable for edge caches (up to 5 minutes); strong consistency for access control decisions; cache invalidation propagates within 2 minutes
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Multi-Tier Caching Strategy
The caching hierarchy is critical for balancing latency, cost, and consistency. Interviewers want to see how you layer CDN edge caches, regional mid-tier caches, and origin storage while managing cache invalidation and stampede protection.
Hints to consider:
- Design a tiered approach with CDN edge nodes (minutes TTL), regional PoPs (hours TTL), and durable origin storage
- Implement cache key design that includes version identifiers or content hashes to avoid stale data
- Use request coalescing to prevent thundering herd when cache entries expire -- only one request should fetch from origin
- Consider tradeoffs between push-based invalidation (expensive but fast) versus pull-based with short TTLs
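The versioned cache keys and request coalescing above can be sketched in a few lines of Python. This is an illustrative in-process model under simplified assumptions (a real edge cache lives in the CDN or a shared store, and production coalescing needs error propagation); `CoalescingCache` and its origin-fetch callback are hypothetical names, not a real library API.

```python
import threading

def cache_key(asset_id: str, variant: str, content_hash: str) -> str:
    """Versioned cache key: a new content hash yields a new key, so
    edges never serve stale bytes and no purge call is needed."""
    return f"{asset_id}/{variant}/{content_hash[:16]}"

class CoalescingCache:
    """Toy in-process cache with request coalescing: on a miss, only one
    caller fetches from origin; concurrent callers for the same key wait
    for that result instead of stampeding the origin.
    (Error handling on a failed fetch is omitted for brevity.)"""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._cache = {}
        self._inflight = {}          # key -> Event set when fetch completes
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            event = self._inflight.get(key)
            if event is None:        # we are the designated fetcher
                event = threading.Event()
                self._inflight[key] = event
                fetcher = True
            else:
                fetcher = False
        if fetcher:
            value = self._fetch(key)           # single origin round-trip
            with self._lock:
                self._cache[key] = value
                del self._inflight[key]
            event.set()                        # release the waiters
            return value
        event.wait()                           # coalesced: reuse the result
        with self._lock:
            return self._cache[key]
```

Because the key embeds the content hash, "invalidation" becomes publishing a new URL rather than purging the old one, which is the versioned-URL tradeoff mentioned above.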
2. Upload Pipeline and Asset Processing
Transforming uploaded assets into multiple formats and resolutions is computationally expensive. Interviewers look for asynchronous processing, work queue design, and handling of failures during transformation.
Hints to consider:
- Use resumable uploads with chunking for large files to handle network interruptions
- Queue transformation jobs asynchronously (message queue or task system) so uploads complete quickly
- Implement priority queuing where thumbnail generation happens before full video transcoding
- Design idempotent transformation workers that can retry safely; store processing state to resume on failure
- Consider distributed processing for video transcoding using worker pools scaled by queue depth
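The priority queuing and idempotent-worker hints can be illustrated with a small sketch. The in-memory heap stands in for a real message queue, and `job_store` stands in for a durable state table keyed by (asset, variant); the names and the three job kinds are assumptions for illustration.

```python
import heapq

# Lower number = higher priority: thumbnails before full transcodes.
PRIORITY = {"thumbnail": 0, "resize": 1, "transcode": 2}

class TransformQueue:
    """Illustrative priority queue: cheap variants (thumbnails) are
    dequeued before expensive ones (video transcodes)."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker keeps FIFO order within a priority

    def push(self, asset_id, kind):
        heapq.heappush(self._heap, (PRIORITY[kind], self._seq, asset_id, kind))
        self._seq += 1

    def pop(self):
        _, _, asset_id, kind = heapq.heappop(self._heap)
        return asset_id, kind

def process(job_store, asset_id, kind, transform):
    """Idempotent worker step: work already marked done is skipped, so a
    retry after a crash or a duplicate queue delivery is safe."""
    key = (asset_id, kind)
    if job_store.get(key) == "done":
        return "skipped"
    transform(asset_id, kind)        # may raise; state stays unset on failure
    job_store[key] = "done"          # record completion only after success
    return "done"
```

The completion marker is written only after the transform succeeds, so a worker that dies mid-job leaves the job retryable, and a duplicate delivery becomes a no-op.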
3. Cost Optimization Across CDN Providers
Operating at scale means significant CDN costs. Interviewers expect you to discuss intelligent routing, multi-CDN strategies, and cost/performance tradeoffs.
Hints to consider:
- Route requests based on real-time metrics (latency, cost per GB) rather than static configuration
- Use cheaper CDN providers for cold/archive content; premium providers for hot popular assets
- Implement tiered storage (SSD for hot data, HDD for warm, archival object storage for cold) based on access patterns
- Batch cache invalidation requests to reduce API costs; use versioned URLs to avoid purging when possible
- Monitor bandwidth costs per region and shift traffic dynamically during price changes
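Metric-driven routing can be as simple as scoring each healthy provider on normalized latency and cost. The linear score, the weights, and the normalization constants below are illustrative assumptions; a production router would also weigh error rates, committed-spend contracts, and hysteresis to avoid flapping.

```python
def choose_cdn(providers, latency_weight=0.7, cost_weight=0.3):
    """Pick a CDN provider from live metrics rather than static config.
    Each provider is a dict with p95 latency (ms), cost ($/GB), and a
    health flag; lower score wins."""
    def score(p):
        # Normalize so latency and cost land on comparable scales
        # (100 ms and $0.05/GB as illustrative baselines).
        return (latency_weight * (p["p95_ms"] / 100.0)
                + cost_weight * (p["usd_per_gb"] / 0.05))
    healthy = [p for p in providers if p["healthy"]]
    return min(healthy, key=score)["name"]
```

Shifting the weights implements the hot/cold split from the hints: weight latency heavily for hot assets on premium providers, weight cost heavily for cold content.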
4. Access Control and Security for Private Assets
Serving private content requires authentication without adding latency or creating bottlenecks. Interviewers probe how you validate access without hitting a central auth service on every request.
Hints to consider:
- Generate signed URLs with HMAC tokens that CDN edge nodes can validate without calling origin
- Include expiration timestamps and scope (specific asset or prefix) in signed tokens
- Use short-lived tokens (minutes to hours) and rotate signing keys periodically
- Implement rate limiting per user/IP at edge to prevent abuse of signed URLs
- Consider token revocation challenges -- accept eventual consistency or maintain hot revocation lists at edges
5. Monitoring and Incident Response
At scale, partial failures are normal. Interviewers want to see how you detect issues, route around failures, and maintain visibility into system health.
Hints to consider:
- Emit metrics at multiple layers (CDN edge, regional aggregators, origin) with different retention policies
- Use synthetic monitoring to proactively test delivery from various regions before users are impacted
- Implement circuit breakers per CDN provider -- automatically route traffic away from providers or origins that are failing
- Design dashboards showing cache hit rates, bandwidth by region, error rates, and p99 latency with automatic anomaly detection
- Plan for debugging: include request IDs that trace through all tiers and correlation with logs
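The per-provider circuit breaker from the hints can be sketched as a small state machine: consecutive failures open the circuit, traffic routes elsewhere, and after a cooldown one probe request is allowed through (half-open). The thresholds and the injectable clock are illustrative choices for this sketch.

```python
import time

class CircuitBreaker:
    """Minimal per-provider circuit breaker. After `threshold` consecutive
    failures the circuit opens and allow() returns False; once `cooldown`
    seconds pass, a probe is permitted, and a recorded success closes it."""

    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock           # injectable for testing
        self.failures = 0
        self.opened_at = None        # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            return True              # half-open: let one probe through
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None        # close the circuit

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()
```

A router would keep one breaker per provider and consult `allow()` before dispatch, giving the automatic route-around behavior described above while still probing for recovery.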