Design a Rate Limiter
System Design · Must
Problem Statement
Design a distributed rate limiting system that can track API requests across multiple servers and enforce limits per user/client to prevent abuse and maintain system performance. The system must support multiple rate limiting algorithms, handle high concurrency, and provide clear feedback to throttled clients.
A distributed rate limiter is a control plane that tracks and limits how many API requests a client (user, IP, API key, or service) can make across a fleet of servers in a short time. Think of the guardrails behind API gateways at companies like Uber, Meta, or OpenAI that prevent abusive bursts while keeping legitimate traffic fast.
Interviewers at Uber ask this to see if you can design low-latency, highly concurrent systems that remain correct under contention and failures. They look for your ability to reason about algorithms (token bucket, leaky bucket, sliding window), atomic updates across a fleet, hot-key mitigation, and clear client signaling (429, Retry-After).
Key Requirements
Functional
- Cross-server enforcement -- request limits enforced consistently across multiple servers and regions for identities like user ID, API key, or IP
- Configurable policies -- support per-endpoint limits, burst vs sustained rates, and per-tenant quotas that can be updated without downtime
- Client feedback -- throttled users receive appropriate status codes (429) and retry hints (Retry-After header)
- Observability -- near real-time usage and throttling metrics to understand policy effects and detect abuse
Non-Functional
- Scalability -- handle millions of rate-limit checks per second across thousands of server instances
- Reliability -- degrade gracefully if the central counter store is unavailable rather than blocking all traffic
- Latency -- the rate-limit check adds less than 5 ms of per-request overhead at p99
- Consistency -- tolerate small over-limit windows (brief burst above threshold) in exchange for low latency; strict consistency is not required
What Interviewers Focus On
Based on real interview experiences at Uber, OpenAI, and Oracle, these are the areas interviewers probe most deeply:
1. Rate Limiting Algorithm Selection
Different algorithms have different tradeoffs for burst handling, memory usage, and accuracy. Interviewers expect you to compare approaches and justify your choice.
Hints to consider:
- Token bucket allows controlled bursts while enforcing a sustained rate; good for API clients that send in batches
- Sliding window log provides exact counts but requires storing every request timestamp, which is expensive at scale
- Sliding window counter approximates the sliding window using fixed-window counts with proportional weighting, balancing accuracy and memory
- Fixed window is simplest but allows double bursts at window boundaries; discuss this tradeoff explicitly
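To make the first tradeoff concrete, here is a minimal single-process token bucket sketch in Python. The class name, parameters, and injectable clock are illustrative choices, not a prescribed implementation; a production limiter would keep this state in a shared store rather than in-process.

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec up to `capacity`.
    Allows short bursts up to `capacity` while enforcing the sustained rate."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate            # sustained tokens per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.now = now              # injectable clock for testing
        self.last = now()

    def allow(self):
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `rate=1, capacity=5`, a client can burst five requests immediately, then is held to one request per second -- exactly the "controlled bursts with a sustained rate" behavior the first hint describes, and without the double-burst artifact of a fixed window.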
2. Distributed Counter Management and Contention
The counter store must handle millions of concurrent increment operations with minimal contention. Interviewers probe how you avoid hot keys and maintain accuracy.
Hints to consider:
- Use Redis with Lua scripts for atomic check-and-increment operations to avoid race conditions between check and update
- For extremely hot keys (rate-limited globally per endpoint), shard counters across multiple Redis keys and sum them
- Consider local per-instance counters with periodic synchronization for less strict rate limits to reduce Redis traffic
- Implement a fallback policy (fail open or fail closed) when the Redis cluster is unreachable
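The hot-key sharding idea can be sketched as follows. `FakeRedis` is a hypothetical in-memory stand-in used only so the example is self-contained; in a real deployment the check-and-increment would run inside a Redis Lua script so the read and write are atomic on the server, and the shard keys would be real Redis keys with a TTL.

```python
import random

class FakeRedis:
    """In-memory stand-in for Redis (illustration only)."""

    def __init__(self):
        self.store = {}  # key -> (count, window_start)

    def incr_with_window(self, key, window_start):
        count, start = self.store.get(key, (0, window_start))
        if start != window_start:      # window rolled over: reset the counter
            count, start = 0, window_start
        self.store[key] = (count + 1, start)
        return count + 1

    def window_total(self, key, window_start):
        count, start = self.store.get(key, (0, window_start))
        return count if start == window_start else 0

def allow_sharded(r, key, limit, shards, window_start, pick=random.randrange):
    """Spread a globally hot counter across `shards` keys to reduce contention.
    Each increment touches one randomly chosen shard; the limit check sums
    all shards, so the enforced total stays accurate."""
    shard_key = f"{key}:{pick(shards)}"
    r.incr_with_window(shard_key, window_start)
    total = sum(r.window_total(f"{key}:{i}", window_start) for i in range(shards))
    return total <= limit
```

The contention win comes from the writes: N shards mean roughly 1/N of the increment traffic per key. The read side still sums all shards, which a Lua script or `MGET` can do in one round trip.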
3. Multi-Tier Rate Limiting
Real systems need rate limits at multiple granularities: per user, per endpoint, per tenant, and global. Interviewers look for a composable design.
Hints to consider:
- Evaluate limits from most specific to least specific: per-user-per-endpoint, per-user, per-tenant, global
- Short-circuit on the first limit hit to minimize Redis round trips
- Batch multiple limit checks into a single Redis pipeline or Lua script for efficiency
- Store rate limit configurations in a policy service that supports dynamic updates without redeployment
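The most-specific-first evaluation with short-circuiting might look like the sketch below. The policy table, scope names, and limits are hypothetical; in production the per-scope increments would be batched into one Redis pipeline or Lua script rather than updated one at a time.

```python
# Hypothetical policy table: (scope name, key builder, limit per window),
# ordered from most specific to least specific.
POLICIES = [
    ("user-endpoint", lambda req: f"{req['user']}:{req['endpoint']}", 5),
    ("user",          lambda req: req["user"],                        20),
    ("tenant",        lambda req: req["tenant"],                      100),
    ("global",        lambda req: "global",                           10000),
]

def check_limits(counters, req):
    """Evaluate tiers in order; short-circuit on the first exceeded limit.
    Returns (allowed, throttling_scope). Note: tiers already incremented
    before a later tier throttles keep their counts -- acceptable for
    approximate limits, and avoidable with an all-or-nothing Lua script."""
    for scope, key_of, limit in POLICIES:
        key = (scope, key_of(req))
        count = counters.get(key, 0) + 1
        if count > limit:
            return False, scope
        counters[key] = count
    return True, None
```

Returning the throttling scope also feeds the client-feedback requirement: the response can say which limit was hit and set `Retry-After` accordingly.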
4. Graceful Degradation and Failure Modes
The rate limiter must not become a single point of failure. Interviewers assess whether your system handles Redis outages and network partitions.
Hints to consider:
- Default to allowing requests (fail open) when the counter store is unavailable, with local in-memory estimation as a safety net
- Use Redis Cluster with replicas for high availability and automatic failover
- Implement circuit breakers in the rate limiter client to avoid cascading latency when Redis is slow
- Log throttled and allowed-by-fallback requests for post-incident analysis
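The fail-open-with-fallback pattern can be sketched as a thin client wrapper. The class and parameter names are illustrative, and the local fallback here is a deliberately coarse per-process counter standing in for "local in-memory estimation"; a real client would also reset its local counts per window and emit the fallback metrics mentioned above.

```python
import time

class RateLimitClient:
    """Wraps the remote rate-limit check with a simple circuit breaker.
    After `threshold` consecutive remote failures, skip the remote store for
    `cooldown` seconds and fall back to a coarse local per-process counter:
    fail open, but still bounded by the local estimate."""

    def __init__(self, remote_check, local_limit, threshold=3, cooldown=30.0,
                 now=time.monotonic):
        self.remote_check = remote_check  # callable(key) -> bool; may raise
        self.local_limit = local_limit
        self.threshold = threshold
        self.cooldown = cooldown
        self.now = now
        self.failures = 0
        self.open_until = 0.0
        self.local_counts = {}

    def allow(self, key):
        if self.now() < self.open_until:
            return self._local_allow(key)      # breaker open: skip remote
        try:
            allowed = self.remote_check(key)
            self.failures = 0                  # healthy call closes breaker
            return allowed
        except ConnectionError:
            self.failures += 1
            if self.failures >= self.threshold:
                self.open_until = self.now() + self.cooldown
            return self._local_allow(key)      # fail open, locally capped

    def _local_allow(self, key):
        n = self.local_counts.get(key, 0) + 1
        self.local_counts[key] = n
        return n <= self.local_limit
```

The key property is that a slow or dead counter store costs at most `threshold` failed attempts before the breaker opens, so request latency stops depending on the store's timeout.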