Amazon's "Design Rate Limiter" interview question focuses on creating a scalable system to control API request rates, preventing abuse in distributed environments.[1][3]
Design a distributed rate limiter that enforces configurable limits on HTTP requests from clients identified by user ID, IP address, or API key. It must support rules like "100 requests per minute per user" or endpoint-specific caps, rejecting excess requests with HTTP 429 status, remaining quota headers (X-RateLimit-Remaining), reset timestamps (X-RateLimit-Reset), and Retry-After.[3][1]
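The required headers can be mapped from a limiter decision in a few lines. A minimal sketch, assuming a hypothetical `RateLimitResult` shape mirroring the interface below (these names are illustrative, not from any specific framework):

```python
import time
from dataclasses import dataclass

@dataclass
class RateLimitResult:
    passes: bool
    remaining: int
    reset_time: int  # epoch seconds when the current window resets

def build_headers(result: RateLimitResult) -> dict:
    """Translate a limiter decision into the rate-limit response headers."""
    headers = {
        "X-RateLimit-Remaining": str(result.remaining),
        "X-RateLimit-Reset": str(result.reset_time),
    }
    if not result.passes:
        # Retry-After: seconds until the window resets (sent on the 429 path)
        headers["Retry-After"] = str(max(0, result.reset_time - int(time.time())))
    return headers
```

On a rejection, the middleware would pair these headers with HTTP 429 and a JSON error body.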
A typical check interface: `isRequestAllowed(clientId, ruleId) -> { passes: boolean, remaining: number, resetTime: timestamp }`. No formal input/output test cases exist, as this is a system design question, but key scenarios include:
| Scenario | Input | Expected Output | Notes |
|----------|--------|-----------------|-------|
| Allowed request | clientId="user123", ruleId="100/min", 50th request in window | {passes: true, remaining: 50, resetTime: 1640995200} HTTP 200 | Within window.[1] |
| Exceeded limit | clientId="user123", ruleId="100/min", 101st request in window | HTTP 429 with X-RateLimit-Remaining: 0, X-RateLimit-Reset: 1640995200, Retry-After: 60, body: {"error": "Rate limit exceeded"} | Fixed window or token bucket algo.[1] |
| Scale test | 1M req/sec across shards | Sub-10ms p99 latency | Redis clustering, sharding by clientId.[1] |
Common algorithms: Token Bucket (a bucket of tokens refilled at a steady rate; each request spends one), Leaky Bucket (requests drain from a queue at a constant rate), Fixed Window (Redis INCR counters with EXPIRE), and Sliding Window (Redis sorted sets pruned by timestamp, or a weighted combination of adjacent fixed-window counters).[7][1]