Design a Rate Limiter
Problem Statement
Design a distributed rate limiting system that can track API requests across multiple servers and enforce limits per user or client to prevent abuse and maintain system performance. Think of the guardrails behind API gateways at companies like Meta or OpenAI that prevent abusive bursts while keeping legitimate traffic fast.
Interviewers ask this to see if you can design low-latency, highly concurrent systems that remain correct under contention and failures. They look for your ability to reason about algorithms (token bucket, leaky bucket, sliding window), atomic updates across a fleet, multi-region tradeoffs, hot-key mitigation, and clear client signaling (429, Retry-After).
Key Requirements
Functional
- Cross-server enforcement -- request limits are enforced consistently across multiple servers and regions for identities like user ID, API key, or IP
- Configurable policies -- define different policies (per-endpoint limits, burst vs sustained rates, per-tenant quotas) and update them without downtime
- Clear feedback -- throttled clients receive an appropriate status code (HTTP 429) and a retry hint (Retry-After header)
- Observability -- usage and throttling metrics are available in near real time to understand policy effects and detect abuse
Non-Functional
- Scalability -- handle millions of rate-check decisions per second across a distributed fleet
- Reliability -- fail open or closed based on configuration; avoid becoming a single point of failure for the entire API
- Latency -- add less than 5ms of overhead to the request path for rate-check decisions
- Consistency -- slight over-admission during transient partitions is acceptable; prefer availability over strict accuracy
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Rate Limiting Algorithm Selection
Interviewers expect you to know the tradeoffs between common algorithms and pick one with justification.
Hints to consider:
- Token bucket: smooth rate limiting with configurable burst capacity; simple to implement with atomic counters
- Sliding window log: precise but memory-intensive since it stores every request timestamp
- Sliding window counter: approximate but memory-efficient; hybrid of fixed and sliding approaches
- Discuss boundary burst problems with fixed-window counters (clients can send 2x the limit at window edges)
2. Distributed State Management
The core challenge is maintaining accurate counters across multiple servers without creating bottlenecks.
Hints to consider:
- Use Redis with Lua scripting for atomic check-and-increment operations in a single round trip
- Consider local counters with periodic sync to reduce round trips for high-volume endpoints
- Discuss the tradeoff between accuracy (centralized counter) and latency (distributed with eventual sync)
- Handle Redis unavailability: fail open (allow requests) for non-critical endpoints, fail closed for security-critical ones
3. Hot Key Handling
Popular API keys or heavily used endpoints create contention on a single Redis key. Interviewers probe your mitigation strategies.
Hints to consider:
- Shard counters for hot keys across multiple Redis keys and sum them for enforcement
- Use hierarchical rate limiting: per-server local limits plus global coordination for accuracy
- Implement rate limit caching on the API gateway with short TTLs to reduce Redis round trips
- Pre-allocate token budgets to each server to minimize coordination during burst periods
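The first hint, shard-and-sum, can be sketched as follows (names and the `key:shard` layout are illustrative, with a dict standing in for Redis). Writes pick a random shard so no single key absorbs all the contention; enforcement reads and sums every shard.

```python
import random


class ShardedCounter:
    """Hot-key sketch: split one logical counter into N sub-keys.
    Increments hit one random shard; enforcement sums all shards.
    A dict stands in for Redis (where the sum could be MGET + add).
    """

    def __init__(self, shards: int = 8):
        self.shards = shards
        self.store: dict[str, int] = {}

    def increment(self, key: str) -> None:
        # Spread writes across shards to relieve single-key contention.
        shard_key = f"{key}:{random.randrange(self.shards)}"
        self.store[shard_key] = self.store.get(shard_key, 0) + 1

    def total(self, key: str) -> int:
        # Reads are N-way, so this trades read cost for write throughput.
        return sum(self.store.get(f"{key}:{i}", 0)
                   for i in range(self.shards))
```

The design choice to call out: sharding makes writes cheap but reads N times more expensive, which is why it pairs well with short-TTL caching of the summed value on the gateway.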
4. Multi-Region and Edge Deployment
For global APIs, rate limiting must work across regions without cross-region latency on every request.
Hints to consider:
- Deploy Redis clusters in each region with regional enforcement plus asynchronous cross-region sync
- Accept that global limits will be approximate (regions may briefly admit slightly more than the global limit between syncs) to avoid cross-region latency on every request
- Use the API gateway as the natural enforcement point since all requests pass through it
- Consider quota allocation per region based on traffic patterns, with periodic rebalancing
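The per-region quota idea can be sketched as a simple proportional split (function name and the traffic-share input are assumptions for illustration). Each region then enforces its slice locally, and a periodic rebalancer recomputes the split as traffic shifts.

```python
def allocate_region_quotas(global_limit: int,
                           traffic_share: dict[str, float]) -> dict[str, int]:
    """Split a global limit across regions in proportion to observed
    traffic. Shares are normalized, so they need not sum to exactly 1.0.
    """
    total = sum(traffic_share.values())
    quotas = {region: int(global_limit * share / total)
              for region, share in traffic_share.items()}
    # Hand the integer-rounding remainder to the busiest region so the
    # regional quotas always sum to exactly the global limit.
    remainder = global_limit - sum(quotas.values())
    busiest = max(traffic_share, key=lambda r: traffic_share[r])
    quotas[busiest] += remainder
    return quotas
```

Example: `allocate_region_quotas(1000, {"us": 0.5, "eu": 0.3, "ap": 0.2})` yields 500/300/200. The key property is that the regional quotas sum to the global limit, so regional enforcement never over-admits globally even with no cross-region calls on the request path.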