Design AWS Lambda
Problem Statement
Design a serverless compute service like AWS Lambda that executes code in response to events without developers managing servers. Users upload functions, set triggers (HTTP requests, message queues, storage events), and the platform spins up isolated workers to execute code on demand, scaling automatically. Focus on worker node architecture and handling both synchronous and asynchronous execution requests.
Interviewers ask this to test your ability to design a multi-tenant, event-driven execution platform under strict latency, security, and reliability constraints. They want to see how you reason about worker node architecture (sandboxed code execution, cold starts, pooling), control-plane vs data-plane separation, concurrency scaling, and differences between synchronous and asynchronous invocation paths.
Key Requirements
Functional
- Function deployment -- users deploy versioned functions with configuration (runtime, memory/CPU, timeouts, environment variables)
- Synchronous invocation -- users invoke functions synchronously and wait for results with strict latency requirements
- Asynchronous invocation -- users fire-and-forget with durable processing, automatic retries, and dead-letter queues
- Auto-scaling -- executions scale up and down automatically within per-function and per-account concurrency limits with fair isolation
Non-Functional
- Scalability -- support 100K+ active functions per region with individual functions handling 10K+ concurrent executions
- Reliability -- achieve 99.99% availability for the execution platform with durable queuing for async invocations
- Latency -- P99 under 100ms for warm synchronous invocations, async request acceptance within 10ms
- Consistency -- at-least-once execution semantics for async events with optional idempotency; strong isolation between tenant workloads
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Worker Node Architecture and Isolation
Executing untrusted user code safely in a multi-tenant environment while minimizing cold start latency is the core challenge.
Hints to consider:
- Use lightweight virtualization (microVMs or gVisor-sandboxed containers) for strong isolation with fast startup times
- Maintain warm worker pools for frequently invoked functions, with predictive pre-warming based on invocation patterns
- Enforce resource limits using cgroups or hypervisor controls for CPU, memory, and network, with hard timeout enforcement
- Implement worker lifecycle management: idle workers are recycled after a configurable period, freeing resources while reducing cold starts
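The warm-pool and recycling hints above can be sketched as a small in-process model. All names here (`WarmPool`, `Worker`, the TTL default) are illustrative assumptions, not a real Lambda API; a production pool would track microVM handles rather than plain objects.

```python
import time
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Worker:
    """Stand-in for a sandboxed microVM pinned to one function."""
    function_id: str
    last_used: float = field(default_factory=time.monotonic)

class WarmPool:
    """Per-function pool of warm workers; idle workers past the TTL are reaped lazily."""

    def __init__(self, idle_ttl_s: float = 600.0):
        self.idle_ttl_s = idle_ttl_s
        self._pools: dict[str, deque[Worker]] = {}

    def acquire(self, function_id: str) -> tuple[Worker, bool]:
        """Return (worker, was_warm). A miss models a cold start."""
        pool = self._pools.get(function_id)
        while pool:
            w = pool.popleft()
            if time.monotonic() - w.last_used < self.idle_ttl_s:
                return w, True               # warm hit: reuse the sandbox
            # else: worker idled past the TTL, drop it and keep looking
        return Worker(function_id), False    # cold start: boot a new microVM

    def release(self, worker: Worker) -> None:
        """Return a worker to its function's pool after an invocation."""
        worker.last_used = time.monotonic()
        self._pools.setdefault(worker.function_id, deque()).append(worker)
```

Keeping pools keyed by function ID preserves isolation (a worker is never reused across tenants), while lazy reaping on `acquire` avoids a background sweeper in this sketch.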
2. Synchronous vs Asynchronous Execution Paths
These two modes have fundamentally different requirements and should use separate infrastructure to avoid cross-contamination of SLAs.
Hints to consider:
- Route synchronous invocations directly to warm workers through a load balancer, with circuit breakers to reject overflow
- Buffer asynchronous invocations in durable queues (Kafka or SQS), with worker pools pulling from queues at controlled rates
- Implement retry policies with exponential backoff for async failures, with idempotency key tracking to prevent duplicate processing
- Use dead-letter queues for invocations that exhaust retry budgets, with alerting and visibility for debugging
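The async retry, idempotency, and dead-letter hints can be combined in one dispatcher loop. This is a minimal sketch with assumed names (`AsyncDispatcher`, `backoff_delay`); a real worker would sleep or reschedule the event instead of retrying inline, and would persist idempotency keys durably rather than in memory.

```python
import random

def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 300.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt))."""
    return random.uniform(0, min(cap_s, base_s * (2 ** attempt)))

class AsyncDispatcher:
    def __init__(self, invoke, dlq, max_attempts: int = 3):
        self.invoke = invoke          # callable that executes the function
        self.dlq = dlq                # list standing in for a durable dead-letter queue
        self.max_attempts = max_attempts
        self._seen: set[str] = set()  # idempotency keys of completed events

    def handle(self, event_id: str, payload) -> str:
        if event_id in self._seen:
            return "duplicate"        # at-least-once delivery: skip replays
        for attempt in range(self.max_attempts):
            try:
                self.invoke(payload)
                self._seen.add(event_id)
                return "ok"
            except Exception:
                _ = backoff_delay(attempt)  # real system: requeue with this delay
        self.dlq.append((event_id, payload))  # retry budget exhausted
        return "dead-lettered"
```

Tracking the idempotency key only after a successful invocation is what makes the semantics at-least-once: a crash between `invoke` and the `_seen` update causes a replay, which the function itself must tolerate.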
3. Control Plane vs Data Plane Separation
The control plane manages function metadata and scaling decisions without being in the critical path for every invocation.
Hints to consider:
- Store function metadata, versions, and configuration in a low-latency database (DynamoDB), cached at worker fleet nodes
- Propagate configuration changes asynchronously to worker fleets without requiring restarts or redeployment
- Implement concurrency limit enforcement using distributed counters with per-function and per-account granularity
- Design the control plane to handle function deployments, version management, and rollback independently from execution
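The distributed-counter hint for concurrency limits can be illustrated with a single-process analogue. `ConcurrencyLimiter` and its key scheme are assumptions for this sketch; at scale the atomic check-and-increment would be a Redis `INCR` or a DynamoDB conditional write rather than a local lock.

```python
import threading

class ConcurrencyLimiter:
    """Reserve in-flight slots at per-function and per-account granularity."""

    def __init__(self, limits: dict[str, int]):
        self._limits = limits
        self._in_flight: dict[str, int] = {}
        self._lock = threading.Lock()

    def try_acquire(self, *keys: str) -> bool:
        """Atomically reserve a slot under every key, e.g. ('acct:a', 'fn:f').
        Returns False (throttle) if any limit is already exhausted."""
        with self._lock:
            for k in keys:
                if self._in_flight.get(k, 0) >= self._limits.get(k, 0):
                    return False
            for k in keys:
                self._in_flight[k] = self._in_flight.get(k, 0) + 1
            return True

    def release(self, *keys: str) -> None:
        """Free the slots when the invocation completes or times out."""
        with self._lock:
            for k in keys:
                self._in_flight[k] -= 1
```

Checking all keys before incrementing any keeps the reservation all-or-nothing, so a request throttled by its account limit never leaks a function-level slot.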
4. Contention, Scaling, and Fair Scheduling
Hot functions and bursty workloads create contention for worker capacity. The system must scale rapidly while preventing noisy neighbor problems.
Hints to consider:
- Implement admission control that throttles or rejects requests when capacity is exhausted, with clear backpressure signals
- Use fair scheduling that prevents a single hot function from consuming all workers, potentially using quota-based systems
- Design scale-up triggers based on queue depth, request arrival rate, and observed latency, with hysteresis for scale-down
- Provide provisioned concurrency as an option for latency-sensitive functions that cannot tolerate cold starts
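The scale-up triggers and hysteresis hint can be made concrete with a small target-sizing function. The thresholds and per-worker capacity below are invented for illustration; a real autoscaler would also weigh arrival rate and cold-start cost.

```python
def desired_workers(current: int, queue_depth: int, p99_ms: float,
                    per_worker_capacity: int = 10,
                    latency_slo_ms: float = 100.0,
                    scale_down_margin: float = 0.5) -> int:
    """Pick a target worker count from backlog and latency, with hysteresis."""
    # Scale up when backlog or observed latency breaches the SLO.
    needed = max(1, -(-queue_depth // per_worker_capacity))  # ceil division
    if p99_ms > latency_slo_ms:
        needed = max(needed, current + 1)
    if needed > current:
        return needed
    # Hysteresis: scale down only when comfortably over-provisioned,
    # and step down gradually to avoid flapping and cold-start churn.
    if needed < current * scale_down_margin:
        return max(needed, current - 1)
    return current
```

The asymmetry is deliberate: scale-up jumps straight to the computed need, while scale-down requires crossing a margin and proceeds one worker at a time, which is the hysteresis the hint describes.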
Suggested Approach
Step 1: Clarify Requirements
Begin by establishing scope with the interviewer. Confirm whether to focus on the full lifecycle or primarily on the execution runtime. Ask about multi-tenancy requirements: is strong isolation (separate VMs) or weaker isolation (shared kernel) acceptable? Clarify scale expectations and whether long-running tasks (minutes) or only short executions (seconds) are supported. Determine non-functional priorities: is cold start latency or cost efficiency more important?