Design AWS Lambda
Problem Statement
Design a serverless compute service like AWS Lambda that executes code in response to events without developers managing servers. Users upload functions, set triggers (HTTP requests, message queues, storage events), and the platform spins up isolated workers to execute code on demand, scaling automatically. Focus on worker node architecture and handling both synchronous and asynchronous execution requests.
Interviewers ask this to test your ability to design a multi-tenant, event-driven execution platform under strict latency, security, and reliability constraints. They want to see how you reason about worker node architecture (sandboxed code execution, cold starts, pooling), control-plane vs data-plane separation, concurrency scaling, and differences between synchronous and asynchronous invocation paths.
Key Requirements
Functional
- Function deployment -- users deploy versioned functions with configuration (runtime, memory/CPU, timeouts, environment variables)
- Synchronous invocation -- users invoke functions synchronously and wait for results with strict latency requirements
- Asynchronous invocation -- users fire-and-forget with durable processing, automatic retries, and dead-letter queues
- Auto-scaling -- executions scale up and down automatically within per-function and per-account concurrency limits with fair isolation
Non-Functional
- Scalability -- support 100K+ active functions per region with individual functions handling 10K+ concurrent executions
- Reliability -- achieve 99.99% availability for the execution platform with durable queuing for async invocations
- Latency -- P99 under 100ms for warm synchronous invocations, async request acceptance within 10ms
- Consistency -- at-least-once execution semantics for async events with optional idempotency; strong isolation between tenant workloads
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Worker Node Architecture and Isolation
Executing untrusted user code safely in a multi-tenant environment while minimizing cold start latency is the core challenge.
Hints to consider:
- Use lightweight virtualization (microVMs or gVisor-sandboxed containers) for strong isolation with fast startup times
- Maintain warm worker pools for frequently invoked functions, with predictive pre-warming based on invocation patterns
- Enforce resource limits using cgroups or hypervisor controls for CPU, memory, and network, with hard timeout enforcement
- Implement worker lifecycle management: idle workers are recycled after a configurable period, freeing resources while reducing cold starts
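The warm-pool and recycling hints above can be sketched as a small in-process model. All names here (`WarmPool`, `Worker`, the TTL default) are illustrative assumptions, not a real Lambda API; a production pool would track microVM handles rather than plain objects.

```python
import time
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Worker:
    """Stand-in for a sandboxed microVM pinned to one function."""
    function_id: str
    last_used: float = field(default_factory=time.monotonic)

class WarmPool:
    """Per-function pool of warm workers; idle workers past the TTL are reaped lazily."""

    def __init__(self, idle_ttl_s: float = 600.0):
        self.idle_ttl_s = idle_ttl_s
        self._pools: dict[str, deque[Worker]] = {}

    def acquire(self, function_id: str) -> tuple[Worker, bool]:
        """Return (worker, was_warm). A miss models a cold start."""
        pool = self._pools.get(function_id)
        while pool:
            w = pool.popleft()
            if time.monotonic() - w.last_used < self.idle_ttl_s:
                return w, True               # warm hit: reuse the sandbox
            # else: worker idled past the TTL, drop it and keep looking
        return Worker(function_id), False    # cold start: boot a new microVM

    def release(self, worker: Worker) -> None:
        """Return a worker to its function's pool after an invocation."""
        worker.last_used = time.monotonic()
        self._pools.setdefault(worker.function_id, deque()).append(worker)
```

Keeping pools keyed by function ID preserves isolation (a worker is never reused across tenants), while lazy reaping on `acquire` avoids a background sweeper in this sketch.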
2. Synchronous vs Asynchronous Execution Paths
These two modes have fundamentally different requirements and should use separate infrastructure to avoid cross-contamination of SLAs.
Hints to consider:
- Route synchronous invocations directly to warm workers through a load balancer, with circuit breakers to reject overflow
- Buffer asynchronous invocations in durable queues (Kafka or SQS), with worker pools pulling from queues at controlled rates
- Implement retry policies with exponential backoff for async failures, with idempotency key tracking to prevent duplicate processing
- Use dead-letter queues for invocations that exhaust retry budgets, with alerting and visibility for debugging
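The async retry, idempotency, and dead-letter hints can be combined in one dispatcher loop. This is a minimal sketch with assumed names (`AsyncDispatcher`, `backoff_delay`); a real worker would sleep or reschedule the event instead of retrying inline, and would persist idempotency keys durably rather than in memory.

```python
import random

def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 300.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt))."""
    return random.uniform(0, min(cap_s, base_s * (2 ** attempt)))

class AsyncDispatcher:
    def __init__(self, invoke, dlq, max_attempts: int = 3):
        self.invoke = invoke          # callable that executes the function
        self.dlq = dlq                # list standing in for a durable dead-letter queue
        self.max_attempts = max_attempts
        self._seen: set[str] = set()  # idempotency keys of completed events

    def handle(self, event_id: str, payload) -> str:
        if event_id in self._seen:
            return "duplicate"        # at-least-once delivery: skip replays
        for attempt in range(self.max_attempts):
            try:
                self.invoke(payload)
                self._seen.add(event_id)
                return "ok"
            except Exception:
                _ = backoff_delay(attempt)  # real system: requeue with this delay
        self.dlq.append((event_id, payload))  # retry budget exhausted
        return "dead-lettered"
```

Tracking the idempotency key only after a successful invocation is what makes the semantics at-least-once: a crash between `invoke` and the `_seen` update causes a replay, which the function itself must tolerate.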
3. Control Plane vs Data Plane Separation
The control plane manages function metadata and scaling decisions without being in the critical path for every invocation.
Hints to consider:
- Store function metadata, versions, and configuration in a low-latency database (DynamoDB), cached at worker fleet nodes
- Propagate configuration changes asynchronously to worker fleets without requiring restarts or redeployment
- Implement concurrency limit enforcement using distributed counters with per-function and per-account granularity
- Design the control plane to handle function deployments, version management, and rollback independently from execution
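The distributed-counter hint for concurrency limits can be illustrated with a single-process analogue. `ConcurrencyLimiter` and its key scheme are assumptions for this sketch; at scale the atomic check-and-increment would be a Redis `INCR` or a DynamoDB conditional write rather than a local lock.

```python
import threading

class ConcurrencyLimiter:
    """Reserve in-flight slots at per-function and per-account granularity."""

    def __init__(self, limits: dict[str, int]):
        self._limits = limits
        self._in_flight: dict[str, int] = {}
        self._lock = threading.Lock()

    def try_acquire(self, *keys: str) -> bool:
        """Atomically reserve a slot under every key, e.g. ('acct:a', 'fn:f').
        Returns False (throttle) if any limit is already exhausted."""
        with self._lock:
            for k in keys:
                if self._in_flight.get(k, 0) >= self._limits.get(k, 0):
                    return False
            for k in keys:
                self._in_flight[k] = self._in_flight.get(k, 0) + 1
            return True

    def release(self, *keys: str) -> None:
        """Free the slots when the invocation completes or times out."""
        with self._lock:
            for k in keys:
                self._in_flight[k] -= 1
```

Checking all keys before incrementing any keeps the reservation all-or-nothing, so a request throttled by its account limit never leaks a function-level slot.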
4. Contention, Scaling, and Fair Scheduling
Hot functions and bursty workloads create contention for worker capacity. The system must scale rapidly while preventing noisy neighbor problems.
Hints to consider:
- Implement admission control that throttles or rejects requests when capacity is exhausted, with clear backpressure signals
- Use fair scheduling that prevents a single hot function from consuming all workers, potentially using quota-based systems
- Design scale-up triggers based on queue depth, request arrival rate, and observed latency, with hysteresis for scale-down
- Provide provisioned concurrency as an option for latency-sensitive functions that cannot tolerate cold starts
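The scale-up triggers and hysteresis hint can be made concrete with a small target-sizing function. The thresholds and per-worker capacity below are invented for illustration; a real autoscaler would also weigh arrival rate and cold-start cost.

```python
def desired_workers(current: int, queue_depth: int, p99_ms: float,
                    per_worker_capacity: int = 10,
                    latency_slo_ms: float = 100.0,
                    scale_down_margin: float = 0.5) -> int:
    """Pick a target worker count from backlog and latency, with hysteresis."""
    # Scale up when backlog or observed latency breaches the SLO.
    needed = max(1, -(-queue_depth // per_worker_capacity))  # ceil division
    if p99_ms > latency_slo_ms:
        needed = max(needed, current + 1)
    if needed > current:
        return needed
    # Hysteresis: scale down only when comfortably over-provisioned,
    # and step down gradually to avoid flapping and cold-start churn.
    if needed < current * scale_down_margin:
        return max(needed, current - 1)
    return current
```

The asymmetry is deliberate: scale-up jumps straight to the computed need, while scale-down requires crossing a margin and proceeds one worker at a time, which is the hysteresis the hint describes.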
Suggested Approach
Step 1: Clarify Requirements
Begin by establishing scope with the interviewer. Confirm whether to focus on the full lifecycle or primarily on the execution runtime. Ask about multi-tenancy requirements: is strong isolation (separate VMs) or weaker isolation (shared kernel) acceptable? Clarify scale expectations and whether long-running tasks (minutes) or only short executions (seconds) are supported. Determine non-functional priorities: is cold start latency or cost efficiency more important?