Design a Rate Limiter
Problem Statement
Design a distributed rate limiting system that can track API requests across multiple servers and enforce limits per user or client to prevent abuse and maintain system performance. Think of the guardrails behind API gateways at companies like Meta or OpenAI that prevent abusive bursts while keeping legitimate traffic fast.
Interviewers ask this to see if you can design low-latency, highly concurrent systems that remain correct under contention and failures. They look for your ability to reason about algorithms (token bucket, leaky bucket, sliding window), atomic updates across a fleet, multi-region tradeoffs, hot-key mitigation, and clear client signaling (429, Retry-After).
Key Requirements
Functional
- Cross-server enforcement -- request limits are enforced consistently across multiple servers and regions for identities like user ID, API key, or IP
- Configurable policies -- define different policies (per-endpoint limits, burst vs sustained rates, per-tenant quotas) and update them without downtime
- Clear feedback -- throttled clients receive an appropriate status code (HTTP 429) and a retry hint (Retry-After header)
- Observability -- usage and throttling metrics are available in near real time to understand policy effects and detect abuse
Non-Functional
- Scalability -- handle millions of rate-check decisions per second across a distributed fleet
- Reliability -- fail open or closed based on configuration; avoid becoming a single point of failure for the entire API
- Latency -- add less than 5ms of overhead to the request path for rate-check decisions
- Consistency -- slight over-admission during transient partitions is acceptable; prefer availability over strict accuracy
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Rate Limiting Algorithm Selection
Interviewers expect you to know the tradeoffs between common algorithms and pick one with justification.
Hints to consider:
- Token bucket: smooth rate limiting with configurable burst capacity; simple to implement with atomic counters
- Sliding window log: precise but memory-intensive since it stores every request timestamp
- Sliding window counter: approximate but memory-efficient; hybrid of fixed and sliding approaches
- Discuss boundary burst problems with fixed-window counters (clients can send 2x the limit at window edges)
2. Distributed State Management
The core challenge is maintaining accurate counters across multiple servers without creating bottlenecks.
Hints to consider:
- Use Redis with Lua scripting for atomic check-and-increment operations in a single round trip
- Consider local counters with periodic sync to reduce round trips for high-volume endpoints
- Discuss the tradeoff between accuracy (centralized counter) and latency (distributed with eventual sync)
- Handle Redis unavailability: fail open (allow requests) for non-critical endpoints, fail closed for security-critical ones
3. Hot Key Handling
Popular API keys or heavily used endpoints create contention on a single Redis key. Interviewers probe your mitigation strategies.
Hints to consider:
- Shard counters for hot keys across multiple Redis keys and sum them for enforcement
- Use hierarchical rate limiting: per-server local limits plus global coordination for accuracy
- Implement rate limit caching on the API gateway with short TTLs to reduce Redis round trips
- Pre-allocate token budgets to each server to minimize coordination during burst periods
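The first hint, shard-and-sum, can be sketched as follows (names and the `key:shard` layout are illustrative, with a dict standing in for Redis). Writes pick a random shard so no single key absorbs all the contention; enforcement reads and sums every shard.

```python
import random


class ShardedCounter:
    """Hot-key sketch: split one logical counter into N sub-keys.
    Increments hit one random shard; enforcement sums all shards.
    A dict stands in for Redis (where the sum could be MGET + add).
    """

    def __init__(self, shards: int = 8):
        self.shards = shards
        self.store: dict[str, int] = {}

    def increment(self, key: str) -> None:
        # Spread writes across shards to relieve single-key contention.
        shard_key = f"{key}:{random.randrange(self.shards)}"
        self.store[shard_key] = self.store.get(shard_key, 0) + 1

    def total(self, key: str) -> int:
        # Reads are N-way, so this trades read cost for write throughput.
        return sum(self.store.get(f"{key}:{i}", 0)
                   for i in range(self.shards))
```

The design choice to call out: sharding makes writes cheap but reads N times more expensive, which is why it pairs well with short-TTL caching of the summed value on the gateway.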
4. Multi-Region and Edge Deployment
For global APIs, rate limiting must work across regions without cross-region latency on every request.
Hints to consider:
- Deploy Redis clusters in each region with regional enforcement plus asynchronous cross-region sync
- Accept that global limits will be approximate (regions may briefly admit slightly more than the global limit between syncs) to avoid cross-region latency on every request
- Use the API gateway as the natural enforcement point since all requests pass through it
- Consider quota allocation per region based on traffic patterns, with periodic rebalancing
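The per-region quota idea can be sketched as a simple proportional split (function name and the traffic-share input are assumptions for illustration). Each region then enforces its slice locally, and a periodic rebalancer recomputes the split as traffic shifts.

```python
def allocate_region_quotas(global_limit: int,
                           traffic_share: dict[str, float]) -> dict[str, int]:
    """Split a global limit across regions in proportion to observed
    traffic. Shares are normalized, so they need not sum to exactly 1.0.
    """
    total = sum(traffic_share.values())
    quotas = {region: int(global_limit * share / total)
              for region, share in traffic_share.items()}
    # Hand the integer-rounding remainder to the busiest region so the
    # regional quotas always sum to exactly the global limit.
    remainder = global_limit - sum(quotas.values())
    busiest = max(traffic_share, key=lambda r: traffic_share[r])
    quotas[busiest] += remainder
    return quotas
```

Example: `allocate_region_quotas(1000, {"us": 0.5, "eu": 0.3, "ap": 0.2})` yields 500/300/200. The key property is that the regional quotas sum to the global limit, so regional enforcement never over-admits globally even with no cross-region calls on the request path.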