Design an API Gateway
Problem Statement
Design an API gateway that helps organizations modernize a monolithic architecture by providing request routing, security, scalability, and performance optimization. The system should support incremental migration away from legacy infrastructure while managing multiple APIs efficiently, acting as the single entry point for all external and internal traffic.
An API gateway sits in front of your backend services and handles cross-cutting concerns: routing requests to the correct service based on host, path, or headers; authenticating and authorizing callers; enforcing rate limits and quotas; transforming requests and responses; and collecting metrics and traces. Think of products like Kong, AWS API Gateway, Apigee, or Envoy-based service meshes. The gateway decouples clients from the internal service topology, letting teams evolve, split, and deploy services independently behind a stable external contract.
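The cross-cutting concerns listed above are commonly implemented as an ordered middleware chain, where any step can short-circuit with an error response before the request ever reaches a backend. A minimal sketch of that pattern (the middleware and response shapes here are illustrative assumptions, not any particular product's API):

```python
def pipeline(request, middlewares, forward):
    """Run each middleware in order; a middleware returns either None
    (continue to the next step) or a response dict (short-circuit,
    e.g. 401 for a failed auth check or 429 for a rate limit)."""
    for mw in middlewares:
        resp = mw(request)
        if resp is not None:
            return resp
    return forward(request)  # proxy to the matched upstream


def require_api_key(request):
    """Example auth middleware: reject requests missing an API key."""
    if "x-api-key" not in request["headers"]:
        return {"status": 401}
    return None
```

In real gateways (Kong plugins, Envoy HTTP filters) each concern is one such step, so routing, auth, rate limiting, and transformation can be enabled per route without touching backend code.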
At MongoDB, a staff-level candidate was asked to propose features and a system design for an API gateway focused on migrating away from a monolith. The interviewer emphasized request routing, security, scalability, and performance. Expect deep questions about separating the control plane from the data plane, designing for zero-trust networking, and managing safe traffic migration from legacy services to new microservices.
Key Requirements
Functional
- Request routing -- route incoming requests to backend services based on configurable rules matching host, path, headers, or query parameters, with support for versioned API endpoints
- Authentication and authorization -- validate API keys, OAuth2 tokens, and JWTs at the edge; enforce per-endpoint authorization policies before requests reach backend services
- Rate limiting and quotas -- enforce per-tenant, per-API-key, or per-endpoint rate limits using token bucket or sliding window algorithms with configurable thresholds
- Traffic management -- support canary releases, blue-green deployments, weighted routing, and traffic shadowing for safe rollout of new service versions
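The routing requirement above can be sketched as a rule matcher over host, path prefix, and headers. This is a simplified illustration (the `Route` fields and longest-prefix tie-breaking are assumptions; real gateways also match methods, query parameters, and regexes):

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """One routing rule: match on path prefix (and optionally host and
    headers), then forward to the named upstream service."""
    path_prefix: str
    upstream: str
    host: str = ""                                  # "" = any host
    headers: dict = field(default_factory=dict)      # required header values


def match_route(routes, host, path, headers):
    """Return the upstream for the first matching rule, or None.
    Rules are checked longest-prefix-first so /v2/users beats /v2."""
    for r in sorted(routes, key=lambda r: -len(r.path_prefix)):
        if r.host and r.host != host:
            continue
        if not path.startswith(r.path_prefix):
            continue
        if any(headers.get(k) != v for k, v in r.headers.items()):
            continue
        return r.upstream
    return None
```

A catch-all `/` rule pointing at the monolith plus narrower rules for extracted services is exactly the mechanism used later for incremental migration.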
Non-Functional
- Latency -- add no more than 5-10ms of overhead per request at p99 for the gateway processing pipeline (TLS termination, auth, routing, transformation)
- Scalability -- handle hundreds of thousands of requests per second with horizontal scaling of stateless data plane nodes
- Availability -- maintain 99.99 percent uptime for the data plane; tolerate control plane outages without disrupting live traffic
- Observability -- emit structured access logs, distributed traces, and per-route metrics for every request passing through the gateway
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Control Plane vs. Data Plane Separation
This is the foundational architectural decision. Interviewers expect you to clearly separate the configuration management layer from the request processing layer and explain why.
Hints to consider:
- The data plane is the hot path: it processes every request with sub-millisecond overhead per middleware step, running entirely from local in-memory configuration
- The control plane manages route definitions, security policies, and rate limit configurations through an admin API or UI, pushing updates to data plane nodes asynchronously
- Data plane nodes must continue serving traffic with their last-known configuration if the control plane goes down, ensuring zero downtime during control plane maintenance
- Use a push-based configuration distribution model (control plane pushes to data plane nodes via gRPC streaming or a pub/sub channel) for near-instant propagation of routing changes
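The data plane side of this split can be sketched as a versioned snapshot store: the control plane pushes complete config snapshots, the node swaps them in atomically, and the hot path only ever reads the last-known-good copy. A minimal sketch (class and field names are assumptions; systems like Envoy's xDS follow this shape with more machinery):

```python
import threading


class ConfigStore:
    """Data plane side of push-based configuration distribution.

    The control plane pushes complete snapshots (e.g. over a gRPC stream
    or pub/sub channel); the node keeps serving from the last-known
    snapshot if pushes stop arriving, so a control plane outage never
    interrupts live traffic."""

    def __init__(self, initial):
        self._snapshot = initial
        self._version = 0
        self._lock = threading.Lock()

    def push(self, snapshot, version):
        # Ignore stale or duplicate pushes (out-of-order delivery).
        with self._lock:
            if version > self._version:
                self._snapshot, self._version = snapshot, version

    def current(self):
        # Hot path: a plain in-memory read, no network I/O, no locks needed
        # for a single reference swap.
        return self._snapshot
```

Pushing whole snapshots rather than deltas keeps nodes convergent even if individual updates are dropped.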
2. Security Architecture at the Edge
API gateways are the first line of defense. Interviewers probe whether you understand the full security stack from TLS termination to backend mTLS.
Hints to consider:
- Terminate TLS at the gateway and establish mutual TLS (mTLS) connections to backend services, ensuring encrypted traffic even within the internal network
- Validate JWTs locally using cached public keys (JWKS) to avoid a round-trip to the identity provider on every request
- Implement a Web Application Firewall (WAF) layer that inspects request payloads for injection attacks, oversized bodies, and malformed inputs before routing
- Separate authentication (who is calling) from authorization (what they can do) so policy changes do not require redeploying the gateway
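Local token validation, as described above, means checking the signature and expiry without calling the identity provider. A stdlib-only sketch using HS256 (production gateways typically verify RS256 signatures against cached JWKS public keys instead, which needs a crypto library; the structure is the same):

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(s):
    # JWTs use unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


def verify_jwt_hs256(token, secret):
    """Validate signature and expiry locally, with no round-trip to the
    identity provider. Returns the claims dict, or None if invalid.
    (A real gateway would also check the header's alg/kid fields.)"""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        return None
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        return None
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", float("inf")) < time.time():
        return None
    return claims
```

With JWKS, the gateway periodically refreshes the key set in the background so key rotation at the IdP never blocks the request path.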
3. Rate Limiting at Scale
Distributed rate limiting across multiple gateway nodes is a classic interview topic. Interviewers want to see how you enforce limits accurately without introducing a synchronous dependency on every request.
Hints to consider:
- Use Redis with atomic INCR and EXPIRE commands for a centralized token bucket that all gateway nodes check against, accepting the small network latency cost
- For lower-latency enforcement, implement a local token bucket per node with periodic synchronization to a central counter, accepting approximate enforcement during sync gaps
- Design quota hierarchies: global rate limits, per-tenant limits, and per-API-key limits, evaluated in order with the most restrictive applying
- Return standard HTTP 429 responses with Retry-After headers so well-behaved clients back off automatically
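The per-node token bucket from the second hint can be sketched in a few lines. This is the local, approximate variant (the injectable clock is an assumption for testability; the periodic sync to a central counter is omitted):

```python
import time


class TokenBucket:
    """Local token bucket: refills at `rate` tokens/second, bursts up
    to `capacity`. Each gateway node holds one bucket per key (tenant,
    API key, or route), trading exact global enforcement for zero
    network calls on the hot path."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.now = now
        self.last = now()

    def allow(self, cost=1.0):
        # Refill based on elapsed time, then spend if enough tokens remain.
        t = self.now()
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller responds 429 with a Retry-After header
```

The centralized Redis variant replaces `allow` with an atomic INCR-plus-EXPIRE (or a Lua script) against a shared counter, exact but with one network round-trip per check.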
4. Safe Migration from Monolith to Microservices
This was explicitly called out in the MongoDB interview. Interviewers want to see a concrete plan for incrementally routing traffic from legacy endpoints to new services without downtime.
Hints to consider:
- Use path-based routing rules to direct specific API paths to new microservices while leaving the rest pointing to the monolith
- Implement traffic shadowing first: duplicate a percentage of live requests to the new service (discarding responses) to validate behavior without affecting users
- Graduate from shadowing to canary routing: send 1-5 percent of real traffic to the new service, compare error rates and latencies, and gradually increase the percentage
- Maintain a rollback path at all times by keeping the monolith endpoint active and ready to receive 100 percent of traffic if the new service degrades
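The canary step above needs a traffic split that is both weighted and sticky, so a given caller does not flap between the monolith and the new service as the percentage ramps up. One common technique is deterministic hash-based bucketing (function and parameter names here are illustrative):

```python
import hashlib


def pick_backend(route_key, caller_id, canary_upstream, stable_upstream,
                 canary_pct):
    """Send roughly canary_pct percent of callers to the new service.
    Hashing (route, caller) into a 0-99 bucket pins each caller to one
    side, and raising canary_pct only ever moves callers toward the
    canary, never back and forth."""
    h = hashlib.sha256(f"{route_key}:{caller_id}".encode()).digest()
    bucket = int.from_bytes(h[:2], "big") % 100
    return canary_upstream if bucket < canary_pct else stable_upstream
```

Rollback is then a single config change: set `canary_pct` back to 0 and every request returns to the monolith on the next routed call.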
Suggested Approach
Step 1: Clarify Requirements
Confirm the scope with the interviewer:
- How many distinct APIs and routes must the gateway manage (dozens or thousands)?
- What is the expected request volume (thousands or hundreds of thousands of requests per second)?
- Does the gateway serve external public APIs, internal service-to-service traffic, or both?
- What are the multi-tenancy requirements: does each tenant get isolated configuration, or is this a single-organization gateway?
- Which authentication mechanisms are required (API keys, OAuth2, mTLS)?
- Is request/response transformation (header injection, body rewriting) in scope?