Design a Distributed Counter
System Design
Build a centralized rate limiting service that protects backend APIs from abuse and overload by enforcing per-user, per-IP, or per-API-key request quotas across a fleet of distributed web servers. The service must handle millions of requests per second from thousands of microservices, returning allow/deny decisions in single-digit milliseconds while accurately tracking usage windows (e.g., 1000 requests per minute) and gracefully handling partial failures.
Unlike a simple in-memory throttle on one machine, this system must coordinate state across many nodes to prevent a malicious client from bypassing limits by hitting different servers. It also needs to support multiple rate limit policies simultaneously—some users get higher quotas, some APIs have burst allowances, and rules can change dynamically without restarts. Interviewers use this question to assess your understanding of distributed counting, cache coherence, consistency tradeoffs, and how to scale a write-heavy, latency-critical service without creating bottlenecks.
Based on real interview experiences, these are the areas interviewers probe most deeply:
Every rate limit check is a read-modify-write on a counter. If all requests for a popular user funnel through one machine or one database row, you create a serialization bottleneck that caps throughput and latency.
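One standard way to relieve a hot counter is to shard it: split the single logical counter into N sub-counters so concurrent increments contend on different keys, at the cost of summing shards on read. A minimal in-memory sketch (the class and names are illustrative, not part of the original design):

```python
import random

class ShardedCounter:
    """One logical counter split into N shards. Writers pick a random
    shard, spreading increment contention; readers sum all shards."""

    def __init__(self, num_shards: int = 8):
        self.shards = [0] * num_shards

    def increment(self) -> None:
        # Each writer lands on a random shard instead of one hot key.
        idx = random.randrange(len(self.shards))
        self.shards[idx] += 1

    def value(self) -> int:
        # Reads pay the cost of aggregating every shard.
        return sum(self.shards)
```

The same idea applies to shared storage: key each shard as `{user_id}:{shard_idx}` and sum on read, trading slower reads for parallel writes.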
Strict global consistency would require distributed locks or consensus on every check, killing latency. But pure eventual consistency might let a user burst past limits by hitting multiple servers before counters converge.
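A common middle ground is bounded staleness: each limiter node borrows a small batch of tokens from the shared store and serves checks locally, touching shared state only when the batch runs out. A client can then overshoot by at most (nodes × batch size) requests, which is usually acceptable. A sketch under assumed names (`SharedPool` stands in for Redis or DynamoDB):

```python
class SharedPool:
    """Stand-in for a shared store; grants tokens atomically."""
    def __init__(self, limit: int):
        self.remaining = limit

    def borrow(self, batch: int) -> int:
        grant = min(batch, self.remaining)
        self.remaining -= grant
        return grant

class NodeLimiter:
    """Serves checks from a local batch; refills from the pool only
    when empty, so the overshoot is bounded by nodes * batch."""
    def __init__(self, pool: SharedPool, batch: int = 10):
        self.pool, self.batch, self.local = pool, batch, 0

    def allow(self) -> bool:
        if self.local == 0:
            self.local = self.pool.borrow(self.batch)
        if self.local == 0:
            return False  # shared quota exhausted
        self.local -= 1
        return True
```

Tuning the batch size moves you along the spectrum: batch = 1 is strict (every check hits shared storage), large batches are cheap but loose.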
Rate limits are time-bound (requests per minute), so accurate timekeeping across distributed nodes is critical. Clock drift can cause some servers to reset windows early or late, creating gaps or false denials.
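One mitigation for hard window edges is the sliding-window counter: instead of resetting abruptly at a boundary, weight the previous fixed window's count by how much of it still overlaps the sliding window. A sketch of the estimate (assumes roughly uniform arrivals within a window):

```python
def sliding_window_count(prev_count: int, curr_count: int,
                         window_secs: float, elapsed_in_curr: float) -> float:
    """Estimate requests in the trailing `window_secs` by blending the
    previous fixed window's count (weighted by its remaining overlap
    with the sliding window) with the current window's count."""
    overlap = max(0.0, (window_secs - elapsed_in_curr) / window_secs)
    return prev_count * overlap + curr_count
```

For example, 15 s into a 60 s window with 100 requests last window and 20 so far, the estimate is 100 × 0.75 + 20 = 95. This smooths boundary bursts and also makes the limiter less sensitive to small clock disagreements about exactly when a window rolls over.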
If the central rate limiter becomes unavailable, should all traffic be blocked (fail closed) or allowed (fail open)? Both have risks—fail closed causes outages, fail open invites abuse.
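A pragmatic answer is a degraded mode between the two extremes: when the shared limiter is unreachable, fall back to a conservative per-node cap rather than blanket allow or deny. A sketch with hypothetical names (`check_remote` stands in for the call to the limiter cluster):

```python
class FallbackLimiter:
    """Wraps a remote rate-limit check. On connection failure, allows
    up to a small per-node budget instead of failing fully open or
    fully closed (illustrative sketch; the cap is an assumption)."""

    def __init__(self, check_remote, local_cap: int):
        self.check_remote = check_remote
        self.local_cap = local_cap
        self.local_used = 0

    def allow(self, key: str) -> bool:
        try:
            return self.check_remote(key)
        except ConnectionError:
            # Degraded mode: bounded fail-open.
            if self.local_used < self.local_cap:
                self.local_used += 1
                return True
            return False
```

In an interview, stating the blast radius explicitly (worst case: nodes × local cap extra requests during an outage) shows you have reasoned about the tradeoff rather than picked a side.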
Start by confirming scale, accuracy expectations, and failure behavior. Clarify whether limits are per-user, per-IP, per-API-key, or a combination, and whether rules are static or dynamically configurable by operators.
Sketch a layered design with these core components:
Client-facing edge: Web servers or API gateways that intercept requests and call the rate limiter before forwarding to backend services. These nodes maintain local in-memory caches of recent counters to serve hot paths without network hops.
Rate limiter cluster: A fleet of stateless service nodes that receive check requests, look up current usage from shared storage, apply the algorithm (token bucket, sliding window), and return allow/deny. These nodes are horizontally scaled and sit behind a load balancer.
Shared storage: Redis or DynamoDB holding current counter values keyed by {user_id, window_start_time}. Redis supports atomic increments and Lua scripts for complex logic; DynamoDB offers global replication and conditional writes for multi-region setups.
Configuration service: A control plane (backed by a database or ZooKeeper) that stores rate limit policies. Changes propagate to rate limiter nodes via polling or push notifications, allowing dynamic updates without redeploy.
Monitoring and logging: Emit metrics on allowed/denied requests, latency percentiles, and cache hit rates. Log violations for security analysis and capacity planning.