Design a File Downloader Library
System Design · Optional
Problem Statement
Design a distributed video processing pipeline that ingests raw video files, applies transformations (transcoding, thumbnail generation, watermarking), and produces multiple output formats optimized for different devices and network conditions. The system should handle videos ranging from short social media clips (30 seconds) to full-length movies (2+ hours), processing thousands of uploads per hour during peak times.
Your pipeline must ensure reliable processing even when individual workers fail, provide visibility into processing status, and optimize for cost-efficiency by intelligently managing compute resources. The system should support adding new transformation steps without disrupting existing workflows and handle priority processing for premium users.
Key Requirements
Functional
- Video ingestion and validation -- accept uploads via API, validate format/codec/size constraints, reject invalid files early
- Multi-format transcoding -- convert videos to multiple resolutions (4K, 1080p, 720p, 480p) and codecs (H.264, H.265, VP9) based on configurable profiles
- Metadata extraction and thumbnail generation -- extract video metadata (duration, bitrate, codec info) and generate thumbnails at multiple timestamps
- Progress tracking and notifications -- provide real-time status updates on processing stages and notify users/systems when jobs complete or fail
- Retry and error handling -- automatically retry transient failures with exponential backoff, handle partial failures gracefully, and provide detailed error diagnostics
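The retry requirement above can be sketched as a small helper. This is a minimal illustration, not a specific library's API; the `TransientError` class, function name, and delay parameters are all assumptions for the sketch. It retries with capped exponential backoff plus full jitter so a burst of failing jobs does not retry in lockstep:

```python
import random
import time


class TransientError(Exception):
    """Marker for failures worth retrying (e.g. timeouts, throttling)."""


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0, max_delay=60.0):
    """Run a zero-argument callable, retrying transient failures with
    exponential backoff and full jitter. Names and defaults are illustrative."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted; surface the error for diagnostics
            # Cap the exponential delay, then sleep a random fraction of it
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))
```

Permanent failures (e.g. an unsupported codec) should raise a different exception type so they skip retries and go straight to a dead letter queue with diagnostics attached.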
Non-Functional
- Scalability -- process 5,000 video uploads per hour during peak, with individual videos ranging from 50MB to 10GB; scale worker capacity dynamically based on queue depth
- Reliability -- ensure 99.9% successful processing rate with no data loss; tolerate worker failures mid-processing without restarting entire jobs
- Latency -- complete processing within 10 minutes for standard videos under 1GB; support priority queue for premium content requiring sub-5-minute turnaround
- Cost efficiency -- minimize compute costs by batching work, using spot instances where appropriate, and avoiding redundant processing steps
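One way to make "scale worker capacity dynamically based on queue depth" concrete is a target-drain-time policy: size the fleet so the current backlog clears within a chosen window. This is a hedged sketch; the function name, bounds, and the assumption of a known average job duration are all illustrative, and a real autoscaler would also smooth the signal to avoid flapping:

```python
import math


def desired_workers(queue_depth, avg_job_seconds, target_drain_seconds,
                    min_workers=2, max_workers=200):
    """Pick a worker count that drains the current backlog within the target
    window, clamped to fleet bounds. All parameters are illustrative."""
    needed = math.ceil(queue_depth * avg_job_seconds / target_drain_seconds)
    return max(min_workers, min(max_workers, needed))
```

With 1,000 queued jobs averaging 5 minutes each and a 10-minute drain target, this asks for 500 workers and clamps to the fleet maximum, making the cost/latency trade-off an explicit tunable.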
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Task Orchestration and Fault Tolerance
How you coordinate multi-stage processing across distributed workers while ensuring exactly-once semantics and graceful failure recovery. Interviewers want to see if you understand idempotency, checkpointing, and dead letter queues.
Hints to consider:
- Use a workflow orchestration pattern with persistent state for each processing stage
- Design each transformation step to be idempotent by writing outputs to temporary locations with atomic rename
- Implement checkpointing after expensive operations like transcoding to avoid reprocessing on retries
- Consider circuit breakers to isolate failures in specific workers or transformation types
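The idempotency and checkpointing hints above can be combined in one pattern: treat the existence of the final artifact as the checkpoint, and commit it via temp-file-plus-atomic-rename so readers never observe partial output. A minimal sketch, assuming local paths stand in for object-store keys and the bytes payload stands in for a transcoded file:

```python
import os
import tempfile


def write_stage_output_atomically(final_path, data):
    """Idempotent stage commit. If the artifact already exists, an earlier
    attempt finished this stage and it can be skipped (checkpoint hit);
    otherwise write to a temp file in the same directory and atomically
    rename it into place. Paths and payload are illustrative."""
    if os.path.exists(final_path):
        return False  # checkpoint hit: skip reprocessing on retry
    out_dir = os.path.dirname(final_path) or "."
    # Same filesystem as the destination, so the rename below is atomic
    fd, tmp_path = tempfile.mkstemp(dir=out_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, final_path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)  # leave no partial artifact behind
        raise
    return True
```

Because a retried worker either finds the committed artifact (and skips the stage) or races another attempt to an identical atomic rename, duplicate deliveries of the same job are harmless.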
2. Work Distribution and Resource Management
How you partition work across heterogeneous workers, balance load, and scale capacity to meet SLAs while controlling costs. Strong candidates discuss queue-based decoupling, work stealing, and auto-scaling strategies.
Hints to consider:
- Separate queues by priority (premium vs standard) and job characteristics (short vs long videos)
- Use message visibility timeouts and heartbeats to detect stuck workers
- Implement worker specialization for CPU-intensive (transcoding) vs I/O-bound (upload/download) tasks
- Discuss spot instance strategies for batch workloads with graceful job migration on preemption
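The visibility-timeout and heartbeat hint above can be sketched as a tiny in-memory lease queue. Real systems would use SQS visibility timeouts or a database lease column; the class and method names here are illustrative. A dequeued job becomes invisible until its lease expires, healthy workers extend the lease by heartbeating, and jobs held by crashed or stuck workers are automatically redelivered:

```python
import time


class LeaseQueue:
    """Minimal in-memory sketch of visibility timeouts. Illustrative only."""

    def __init__(self, visibility_seconds=30.0, clock=time.monotonic):
        self.visibility = visibility_seconds
        self.clock = clock       # injectable for testing
        self.ready = []          # jobs visible to workers
        self.in_flight = {}      # job -> lease expiry time

    def put(self, job):
        self.ready.append(job)

    def get(self):
        self.reap_expired()
        if not self.ready:
            return None
        job = self.ready.pop(0)
        self.in_flight[job] = self.clock() + self.visibility
        return job

    def heartbeat(self, job):
        # A healthy worker periodically extends its lease mid-processing
        self.in_flight[job] = self.clock() + self.visibility

    def ack(self, job):
        # Successful completion removes the job permanently
        self.in_flight.pop(job, None)

    def reap_expired(self):
        # Lapsed leases (crashed/stuck workers) become visible again
        now = self.clock()
        for job, expiry in list(self.in_flight.items()):
            if expiry <= now:
                del self.in_flight[job]
                self.ready.append(job)
```

For long transcodes, the visibility window should be short relative to job duration and extended via heartbeats, so a dead worker is detected in seconds rather than hours. Redelivery is safe only because stages are idempotent, which is why this hint and the atomic-commit hint go together.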
3. Data Flow and Storage Strategy
How you move large video files through the pipeline efficiently while managing storage costs and ensuring data durability. This reveals your understanding of blob storage, network bandwidth optimization, and lifecycle policies.
Hints to consider:
- Stage raw and processed videos in object storage with different retention policies
- Use pre-signed URLs or direct S3 transfer acceleration to avoid proxying large files through application servers
- Consider multi-part upload/download for large files with parallel streams
- Implement automatic cleanup of intermediate artifacts after successful processing
- Use content-addressed storage or checksums to deduplicate identical uploads
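The content-addressing hint above reduces to deriving the storage key from a hash of the bytes, so identical uploads map to the same object and are transcoded only once. A minimal sketch; the key layout and function name are assumptions, and the file is read in chunks so a 10GB upload never needs to fit in memory:

```python
import hashlib


def content_address(path, chunk_size=1 << 20):
    """Return a storage key derived from the file's SHA-256.
    The 'raw/<shard>/<digest>' layout is illustrative."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in 1 MiB chunks to bound memory for multi-GB videos
        while chunk := f.read(chunk_size):
            h.update(chunk)
    digest = h.hexdigest()
    # Two-character shard prefix spreads keys across partitions
    return f"raw/{digest[:2]}/{digest}"
```

Before starting a pipeline run, the ingestion service can check whether outputs for this address already exist and short-circuit the whole job, which directly serves the cost-efficiency requirement.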
4. Observability and Progress Tracking
How you provide visibility into pipeline health, individual job status, and debugging information when processing fails. Interviewers look for structured logging, metrics, and event-driven status updates.
Hints to consider:
- Emit structured events at each stage transition (queued → processing → transcoding → complete)
- Track fine-grained metrics like per-stage latency, worker utilization, and error rates by failure type
- Store detailed processing metadata (input specs, transformation parameters, output locations) for debugging
- Use distributed tracing to correlate logs across multiple workers handling the same job
- Implement webhooks or WebSocket connections for real-time client progress updates
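The structured-event hint above can be sketched as a single event-builder: one JSON document per stage transition, with a stable core schema plus free-form context. A real pipeline would publish this to a log aggregator or event bus; the field names here are illustrative assumptions, not a standard schema:

```python
import json
import time


def stage_event(job_id, stage, status, **fields):
    """Build one structured event per stage transition. A trace/correlation id
    passed via fields lets logs from different workers be joined per job."""
    event = {
        "job_id": job_id,
        "stage": stage,    # e.g. "queued", "transcoding", "complete"
        "status": status,  # e.g. "started", "succeeded", "failed"
        "ts": time.time(),
        **fields,          # free-form context: worker_id, error, durations
    }
    return json.dumps(event, sort_keys=True)
```

Emitting these events at every transition gives you per-stage latency metrics, error-rate breakdowns, and the raw feed that webhook or WebSocket progress updates can be driven from, all from one instrumentation point.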