Design a Rate Limiter
System Design · Must
Problem Statement
Design a distributed rate limiting system that can track API requests across multiple servers and enforce limits per user/client to prevent abuse and maintain system performance. The system must support multiple rate limiting algorithms, handle high concurrency, and provide clear feedback to throttled clients.
A distributed rate limiter is a control plane that tracks and limits how many API requests a client (user, IP, API key, or service) can make across a fleet of servers in a short time. Think of the guardrails behind API gateways at companies like Uber, Meta, or OpenAI that prevent abusive bursts while keeping legitimate traffic fast.
Interviewers at Uber ask this to see if you can design low-latency, highly concurrent systems that remain correct under contention and failures. They look for your ability to reason about algorithms (token bucket, leaky bucket, sliding window), atomic updates across a fleet, hot-key mitigation, and clear client signaling (429, Retry-After).
Key Requirements
Functional
- Cross-server enforcement -- request limits enforced consistently across multiple servers and regions for identities like user ID, API key, or IP
- Configurable policies -- support per-endpoint limits, burst vs sustained rates, and per-tenant quotas that can be updated without downtime
- Client feedback -- throttled users receive appropriate status codes (429) and retry hints (Retry-After header)
- Observability -- near real-time usage and throttling metrics to understand policy effects and detect abuse
Non-Functional
- Scalability -- handle millions of rate-limit checks per second across thousands of server instances
- Reliability -- degrade gracefully if the central counter store is unavailable rather than blocking all traffic
- Latency -- the rate-limit check adds less than 5 ms of per-request overhead at p99
- Consistency -- tolerate small over-limit windows (brief burst above threshold) in exchange for low latency; strict consistency is not required
What Interviewers Focus On
Based on real interview experiences at Uber, OpenAI, and Oracle, these are the areas interviewers probe most deeply:
1. Rate Limiting Algorithm Selection
Different algorithms have different tradeoffs for burst handling, memory usage, and accuracy. Interviewers expect you to compare approaches and justify your choice.
Hints to consider:
- Token bucket allows controlled bursts while enforcing a sustained rate; good for API clients that send in batches
- Sliding window log provides exact counts but requires storing every request timestamp, which is expensive at scale
- Sliding window counter approximates the sliding window using fixed-window counts with proportional weighting, balancing accuracy and memory
- Fixed window is simplest but allows double bursts at window boundaries; discuss this tradeoff explicitly
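To make the first tradeoff concrete, here is a minimal single-process token bucket sketch in Python. The class name, parameters, and injectable clock are illustrative choices, not a prescribed implementation; a production limiter would keep this state in a shared store rather than in-process.

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec up to `capacity`.
    Allows short bursts up to `capacity` while enforcing the sustained rate."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate            # sustained tokens per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.now = now              # injectable clock for testing
        self.last = now()

    def allow(self):
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `rate=1, capacity=5`, a client can burst five requests immediately, then is held to one request per second -- exactly the "controlled bursts with a sustained rate" behavior the first hint describes, and without the double-burst artifact of a fixed window.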
2. Distributed Counter Management and Contention
The counter store must handle millions of concurrent increment operations with minimal contention. Interviewers probe how you avoid hot keys and maintain accuracy.
Hints to consider:
- Use Redis with Lua scripts for atomic check-and-increment operations to avoid race conditions between check and update
- For extremely hot keys (rate-limited globally per endpoint), shard counters across multiple Redis keys and sum them
- Consider local per-instance counters with periodic synchronization for less strict rate limits to reduce Redis traffic
- Implement a fallback policy (fail open or fail closed) when the Redis cluster is unreachable
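The hot-key sharding idea can be sketched as follows. `FakeRedis` is a hypothetical in-memory stand-in used only so the example is self-contained; in a real deployment the check-and-increment would run inside a Redis Lua script so the read and write are atomic on the server, and the shard keys would be real Redis keys with a TTL.

```python
import random

class FakeRedis:
    """In-memory stand-in for Redis (illustration only)."""

    def __init__(self):
        self.store = {}  # key -> (count, window_start)

    def incr_with_window(self, key, window_start):
        count, start = self.store.get(key, (0, window_start))
        if start != window_start:      # window rolled over: reset the counter
            count, start = 0, window_start
        self.store[key] = (count + 1, start)
        return count + 1

    def window_total(self, key, window_start):
        count, start = self.store.get(key, (0, window_start))
        return count if start == window_start else 0

def allow_sharded(r, key, limit, shards, window_start, pick=random.randrange):
    """Spread a globally hot counter across `shards` keys to reduce contention.
    Each increment touches one randomly chosen shard; the limit check sums
    all shards, so the enforced total stays accurate."""
    shard_key = f"{key}:{pick(shards)}"
    r.incr_with_window(shard_key, window_start)
    total = sum(r.window_total(f"{key}:{i}", window_start) for i in range(shards))
    return total <= limit
```

The contention win comes from the writes: N shards mean roughly 1/N of the increment traffic per key. The read side still sums all shards, which a Lua script or `MGET` can do in one round trip.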
3. Multi-Tier Rate Limiting
Real systems need rate limits at multiple granularities: per user, per endpoint, per tenant, and global. Interviewers look for a composable design.
Hints to consider:
- Evaluate limits from most specific to least specific: per-user-per-endpoint, per-user, per-tenant, global
- Short-circuit on the first limit hit to minimize Redis round trips
- Batch multiple limit checks into a single Redis pipeline or Lua script for efficiency
- Store rate limit configurations in a policy service that supports dynamic updates without redeployment
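The most-specific-first evaluation with short-circuiting might look like the sketch below. The policy table, scope names, and limits are hypothetical; in production the per-scope increments would be batched into one Redis pipeline or Lua script rather than updated one at a time.

```python
# Hypothetical policy table: (scope name, key builder, limit per window),
# ordered from most specific to least specific.
POLICIES = [
    ("user-endpoint", lambda req: f"{req['user']}:{req['endpoint']}", 5),
    ("user",          lambda req: req["user"],                        20),
    ("tenant",        lambda req: req["tenant"],                      100),
    ("global",        lambda req: "global",                           10000),
]

def check_limits(counters, req):
    """Evaluate tiers in order; short-circuit on the first exceeded limit.
    Returns (allowed, throttling_scope). Note: tiers already incremented
    before a later tier throttles keep their counts -- acceptable for
    approximate limits, and avoidable with an all-or-nothing Lua script."""
    for scope, key_of, limit in POLICIES:
        key = (scope, key_of(req))
        count = counters.get(key, 0) + 1
        if count > limit:
            return False, scope
        counters[key] = count
    return True, None
```

Returning the throttling scope also feeds the client-feedback requirement: the response can say which limit was hit and set `Retry-After` accordingly.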
4. Graceful Degradation and Failure Modes
The rate limiter must not become a single point of failure. Interviewers assess whether your system handles Redis outages and network partitions.
Hints to consider:
- Default to allowing requests (fail open) when the counter store is unavailable, with local in-memory estimation as a safety net
- Use Redis Cluster with replicas for high availability and automatic failover
- Implement circuit breakers in the rate limiter client to avoid cascading latency when Redis is slow
- Log throttled and allowed-by-fallback requests for post-incident analysis
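The fail-open-with-fallback pattern can be sketched as a thin client wrapper. The class and parameter names are illustrative, and the local fallback here is a deliberately coarse per-process counter standing in for "local in-memory estimation"; a real client would also reset its local counts per window and emit the fallback metrics mentioned above.

```python
import time

class RateLimitClient:
    """Wraps the remote rate-limit check with a simple circuit breaker.
    After `threshold` consecutive remote failures, skip the remote store for
    `cooldown` seconds and fall back to a coarse local per-process counter:
    fail open, but still bounded by the local estimate."""

    def __init__(self, remote_check, local_limit, threshold=3, cooldown=30.0,
                 now=time.monotonic):
        self.remote_check = remote_check  # callable(key) -> bool; may raise
        self.local_limit = local_limit
        self.threshold = threshold
        self.cooldown = cooldown
        self.now = now
        self.failures = 0
        self.open_until = 0.0
        self.local_counts = {}

    def allow(self, key):
        if self.now() < self.open_until:
            return self._local_allow(key)      # breaker open: skip remote
        try:
            allowed = self.remote_check(key)
            self.failures = 0                  # healthy call closes breaker
            return allowed
        except ConnectionError:
            self.failures += 1
            if self.failures >= self.threshold:
                self.open_until = self.now() + self.cooldown
            return self._local_allow(key)      # fail open, locally capped

    def _local_allow(self, key):
        n = self.local_counts.get(key, 0) + 1
        self.local_counts[key] = n
        return n <= self.local_limit
```

The key property is that a slow or dead counter store costs at most `threshold` failed attempts before the breaker opens, so request latency stops depending on the store's timeout.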