Design a Quota Manager for LinkedIn APIs
Problem Statement
Design a quota management system for LinkedIn's API that enforces consistent rate limits across clients and lets users purchase additional quota when needed. The system combines distributed rate limiting, multi-tenant entitlement management, and monetization flows, and must keep entitlements consistent at high scale while tolerating bounded drift in usage counters.
You must enforce limits across many stateless gateways, prevent abuse, handle hot keys, propagate purchased quota in near real time, and make pragmatic trade-offs on latency, consistency, and operability.
Key Requirements
Functional
- Rate enforcement -- every API call is checked against per-tenant, per-application, and per-API rate limits and time-bound quotas (per second and per day)
- Quota purchasing -- users purchase additional quota or higher rate limits, and the new entitlements take effect quickly without downtime
- Usage visibility -- users view current usage and remaining quota via response headers and a usage dashboard in near real time
- Predictable limiting -- users receive predictable responses when limited (429 status, Retry-After headers) and notifications as they approach limits
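
As an illustration of the response contract described above, here is a minimal sketch that builds the status code and headers for one request. `Retry-After` is a standard HTTP header; the `X-RateLimit-*` names are a common convention assumed here, not something the problem statement specifies. The function name and the fixed-window model are also assumptions.

```python
import math


def limit_headers(limit: int, used: int, reset_epoch: float, now: float):
    """Return (status, headers) for a request in a fixed window.

    `used` is the window's request count including the current request.
    `reset_epoch` is the time (seconds) at which the window resets.
    """
    remaining = max(limit - used, 0)
    headers = {
        # Header names below are an assumed convention, not a standard.
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if used > limit:
        # Tell the client how many whole seconds to wait (at least 1).
        headers["Retry-After"] = str(max(math.ceil(reset_epoch - now), 1))
        return 429, headers
    return 200, headers
```

A client that receives 429 can sleep for `Retry-After` seconds before retrying, which keeps the limiting behavior predictable.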
Non-Functional
- Scalability -- enforce rate limits across thousands of API gateway instances handling millions of requests per second
- Reliability -- continue enforcing limits during partial failures; never allow unlimited access due to a component outage
- Latency -- rate limit checks add less than 5ms overhead to API request latency
- Consistency -- purchased quota takes effect within seconds across all gateway instances; counter accuracy within 5% for distributed enforcement
Interview Reports from Hello Interview
2 reports from candidates. Most recently asked at LinkedIn in mid-September 2025.
This question is primarily asked at LinkedIn.
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Distributed Rate Limiting at Scale
Enforcing consistent limits across thousands of stateless gateway instances is the core challenge. Interviewers want to see how you avoid both over-limiting and under-limiting.
Hints to consider:
- Use Redis atomic counters with Lua scripts for token bucket or sliding window implementations at a centralized coordination layer
- Implement local token leases: each gateway instance requests a batch of tokens (e.g., 100) from the central store and decrements locally, reducing round-trips
- Accept bounded drift in exchange for lower latency; periodically reconcile local counts with the central store
- Shard rate limit keys by tenant so that load spreads across Redis nodes; a single very popular tenant still needs its own counter split further to avoid hot-key contention
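
The token lease hint can be sketched roughly as follows. This is an in-memory stand-in, not a Redis implementation: `CentralBucket` plays the role of the central store (the lock stands in for the atomicity a Lua script would provide), and `Gateway` decrements a locally held batch. All class names, parameters, and the batch size are assumptions for illustration.

```python
import threading


class CentralBucket:
    """Stand-in for the central store: one token bucket for one key."""

    def __init__(self, capacity: int, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = now
        self._lock = threading.Lock()  # stands in for Redis/Lua atomicity

    def lease(self, want: int, now: float) -> int:
        """Refill based on elapsed time, then grant up to `want` tokens."""
        with self._lock:
            elapsed = max(now - self.last, 0.0)
            self.tokens = min(self.capacity,
                              self.tokens + elapsed * self.refill_per_sec)
            self.last = now
            granted = min(want, int(self.tokens))
            self.tokens -= granted
            return granted


class Gateway:
    """One stateless gateway instance holding a local batch of tokens."""

    def __init__(self, central: CentralBucket, batch: int = 100):
        self.central = central
        self.batch = batch
        self.local = 0

    def allow(self, now: float) -> bool:
        if self.local == 0:
            # One round-trip to the central store per batch, not per request.
            self.local = self.central.lease(self.batch, now)
        if self.local == 0:
            return False
        self.local -= 1
        return True
```

In this sketch the central store never grants more tokens than the bucket holds, so gateways cannot collectively over-admit. The drift shows up as tokens stranded on idle gateways, bounded by the batch size times the number of gateways, which is one reason to keep batches small or to expire leases.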
2. Entitlement Management and Quota Purchasing
Purchased quota must take effect quickly without requiring gateway restarts. Interviewers probe the end-to-end workflow.
Hints to consider:
- Store entitlements (tier, rate limits, quota allocations) in a durable database (DynamoDB) as the source of truth
- Use DynamoDB Streams or a change data capture mechanism to propagate entitlement changes to a Redis cache layer
- Gateway instances read entitlements from cache and refresh on change notifications or short TTLs
- Design the purchase workflow as a saga: authorize payment, update entitlement, notify gateways, confirm to user
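
The saga hint might look like the sketch below: each step carries a compensating action, and a failure undoes the completed steps in reverse order. The step names, the in-memory `entitlements` and `payments` structures, and the `purchase` helper are hypothetical; a real workflow would call a payment provider and the entitlement database.

```python
from typing import Callable

# A step is (name, action, compensation).
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: list[str] = []
    completed: list[tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"undo:{done_name}")
            return False, log
        log.append(f"ok:{name}")
        completed.append((name, compensate))
    return True, log


def purchase(entitlements: dict, payments: list, tenant: str, extra: int,
             notify: Callable[[], None]) -> tuple[bool, list[str]]:
    """Hypothetical purchase flow: authorize, update entitlement, notify."""

    def authorize():
        payments.append((tenant, extra))

    def void_payment():
        payments.pop()

    def grant():
        entitlements[tenant] += extra

    def revoke():
        entitlements[tenant] -= extra

    return run_saga([
        ("authorize_payment", authorize, void_payment),
        ("update_entitlement", grant, revoke),
        ("notify_gateways", notify, lambda: None),
    ])
```

Whether a failed gateway notification should roll back the purchase, as it does here, or simply be retried while short cache TTLs pick up the change is a design choice worth raising in the interview.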
3. Hot Key Contention
Popular tenants generate concentrated traffic that can overwhelm a single rate limit counter.
Hints to consider:
- Shard counters for hot tenants across N Redis keys and periodically sum the shards to get the tenant's total
- Use the local token lease approach where each gateway holds a portion of the rate budget
- Implement adaptive sharding that detects hot keys and increases the shard factor dynamically
- Monitor per-key access rates and alert when a tenant's traffic pattern changes significantly
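
A sharded counter along the lines of the first hint can be sketched as below. A plain dict stands in for Redis, and the `key:shard` naming scheme and the random shard choice are assumptions.

```python
import random


class ShardedCounter:
    """Spread increments for one hot tenant across several sub-keys."""

    def __init__(self, store: dict, key: str, shards: int):
        self.store = store    # stand-in for a Redis keyspace
        self.key = key
        self.shards = shards

    def _shard_key(self, index: int) -> str:
        return f"{self.key}:{index}"  # hypothetical naming scheme

    def incr(self, amount: int = 1) -> None:
        # Writes land on a random shard, so no single key takes all traffic.
        k = self._shard_key(random.randrange(self.shards))
        self.store[k] = self.store.get(k, 0) + amount

    def total(self) -> int:
        # Reads sum every shard; this is the periodic "merge" step.
        return sum(self.store.get(self._shard_key(i), 0)
                   for i in range(self.shards))
```

Writes become cheaper per key while reads cost one lookup per shard, so the shard factor is a trade-off. An adaptive scheme would raise `shards` only for keys whose observed access rate crosses a threshold.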
4. Graceful Degradation During Failures
When the rate limiting infrastructure is unavailable, the system must neither fail open (allowing unlimited access) nor fail closed (blocking all traffic).
Hints to consider:
- Cache the last known rate limit state locally at each gateway instance with a bounded staleness window
- Implement circuit breakers around Redis calls; on failure, fall back to local counters with conservative limits
- Design a "safe mode" that applies reduced limits based on cached entitlements rather than completely disabling enforcement
- Log and alert on enforcement gaps so that operations can respond to prolonged infrastructure issues
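
The circuit breaker and "safe mode" hints could be combined roughly as in this sketch. `central_check` is a hypothetical callable that performs the Redis-backed check and raises `ConnectionError` when the store is unreachable; the thresholds, cooldown, and 50% safe fraction are illustrative assumptions.

```python
from typing import Callable, Optional


class DegradingLimiter:
    """Use the central check when healthy; fall back to a reduced local limit."""

    def __init__(self, central_check: Callable[[], bool], cached_limit: int,
                 safe_fraction: float = 0.5, failure_threshold: int = 3,
                 cooldown: float = 5.0):
        self.central_check = central_check
        self.cached_limit = cached_limit      # last known per-second limit
        self.safe_fraction = safe_fraction
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.open_until = 0.0                 # breaker is open until this time
        self.window: Optional[int] = None
        self.count = 0

    def allow(self, now: float) -> bool:
        if now < self.open_until:
            return self._local_allow(now)     # breaker open: skip the store
        try:
            allowed = self.central_check()
            self.failures = 0
            return allowed
        except ConnectionError:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.open_until = now + self.cooldown
                self.failures = 0
            return self._local_allow(now)

    def _local_allow(self, now: float) -> bool:
        """Safe mode: one-second fixed window at a reduced limit."""
        window = int(now)
        if window != self.window:
            self.window, self.count = window, 0
        if self.count >= int(self.cached_limit * self.safe_fraction):
            return False
        self.count += 1
        return True
```

Because every gateway applies the reduced limit independently, the fleet-wide total in safe mode depends on the number of instances; dividing the cached limit by an estimated instance count is one possible refinement.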