Design a Quota Manager for LinkedIn APIs

Problem Statement

Design a quota management system for LinkedIn's API that enforces consistent rate limits across clients and lets users purchase additional quota when needed. The system combines distributed rate limiting, multi-tenant entitlement management, and monetization flows, and it must stay consistent at high scale.

You must enforce limits across many stateless gateways, prevent abuse, handle hot keys, propagate purchased quota in near real time, and make pragmatic trade-offs on latency, consistency, and operability.

Key Requirements

Functional

Rate enforcement -- API calls are subject to per-tenant, per-application, and per-API rate limits and time-bound quotas (per second and per day)
Quota purchasing -- users purchase additional quota or higher rate limits with new entitlements taking effect quickly without downtime
Usage visibility -- users view current usage and remaining quota via response headers and a usage dashboard in near real time
Predictable limiting -- users receive predictable responses when limited (429 status, Retry-After headers) and notifications as they approach limits
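As a rough illustration of the "predictable responses" requirement, the sketch below builds the status code and headers for one request in a fixed window. Retry-After is a standard HTTP header; the X-RateLimit-* names are a common convention assumed here, not something the problem statement specifies, and the function name and parameters are hypothetical.

```python
import math

def rate_limit_response(limit, used_before, window_reset_epoch, now):
    """Return (status, headers) for one request in a fixed window.

    used_before is the number of calls already counted in the window.
    Header names other than Retry-After are assumed conventions.
    """
    reset_in = max(0, math.ceil(window_reset_epoch - now))
    limited = used_before >= limit
    # An allowed request consumes one unit of the remaining quota.
    remaining = 0 if limited else limit - used_before - 1
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_in),
    }
    if limited:
        # Tell the client how many seconds to wait before retrying.
        headers["Retry-After"] = str(reset_in)
        return 429, headers
    return 200, headers
```

The same header values can feed the usage dashboard, so clients see one consistent view of remaining quota.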

Non-Functional

Scalability -- enforce rate limits across thousands of API gateway instances handling millions of requests per second
Reliability -- continue enforcing limits during partial failures; never allow unlimited access due to a component outage
Latency -- rate limit checks add less than 5 ms of overhead to API request latency
Consistency -- purchased quota takes effect within seconds across all gateway instances; counter accuracy within 5% for distributed enforcement

Interview Reports from Hello Interview

2 reports from candidates. Most recently asked at LinkedIn in mid-September 2025.

This question is primarily asked at LinkedIn.

What Interviewers Focus On

Based on real interview experiences, these are the areas interviewers probe most deeply:

1. Distributed Rate Limiting at Scale

Enforcing consistent limits across thousands of stateless gateway instances is the core challenge. Interviewers want to see how you avoid both over-limiting and under-limiting.

Hints to consider:

Use Redis atomic counters with Lua scripts for token bucket or sliding window implementations at a centralized coordination layer
Implement local token leases: each gateway instance requests a batch of tokens (e.g., 100) from the central store and decrements locally, reducing round-trips
Accept bounded drift in exchange for lower latency; periodically reconcile local counts with the central store
Shard rate limit keys by tenant to avoid hot-key contention on popular API clients
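The local token lease idea from the hints above can be sketched in memory. This is a minimal illustration, not a production design: CentralBudget is a hypothetical stand-in for the central store (in practice the lease would be an atomic operation such as a Redis Lua script), and window refill and lease expiry are left out.

```python
import threading

class CentralBudget:
    """Stand-in for the central store holding one tenant's budget for a window."""

    def __init__(self, limit):
        self.remaining = limit
        self._lock = threading.Lock()

    def lease(self, batch):
        """Atomically grant up to `batch` tokens and return the number granted."""
        with self._lock:
            granted = min(batch, self.remaining)
            self.remaining -= granted
            return granted

class GatewayLimiter:
    """One gateway instance: spends leased tokens locally, refills in batches."""

    def __init__(self, central, batch=100):
        self.central = central
        self.batch = batch
        self.local_tokens = 0

    def allow(self):
        if self.local_tokens == 0:
            # One round-trip to the central store per batch, not per request.
            self.local_tokens = self.central.lease(self.batch)
        if self.local_tokens == 0:
            return False  # central budget exhausted
        self.local_tokens -= 1
        return True
```

The batch size sets the trade-off: larger batches mean fewer round-trips, but more tokens can sit unused on an idle gateway, which is the bounded drift mentioned in the hints.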

2. Entitlement Management and Quota Purchasing

Purchased quota must take effect quickly without requiring gateway restarts. Interviewers probe the end-to-end workflow.

Hints to consider:

Store entitlements (tier, rate limits, quota allocations) in a durable database (DynamoDB) as the source of truth
Use DynamoDB Streams or a change data capture mechanism to propagate entitlement changes to a Redis cache layer
Gateway instances read entitlements from cache and refresh on change notifications or short TTLs
Design the purchase workflow as a saga: authorize payment, update entitlement, notify gateways, confirm to user
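The saga in the last hint can be sketched as a list of steps, each paired with a compensating action that runs if a later step fails. Everything here is illustrative: the step functions, the payment_ok and publish_ok flags that simulate failures, and the in-memory entitlements dict are assumptions, not a real payment or DynamoDB integration.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    On failure, run the compensations of completed steps in reverse order.
    """
    done = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(done):
                undo()
            return False
        done.append(compensate)
    return True

def purchase_quota(entitlements, tenant, extra, payment_ok=True, publish_ok=True):
    """Hypothetical purchase flow: authorize, update entitlement, notify gateways."""
    log = []
    old = entitlements.get(tenant, 0)

    def authorize():
        if not payment_ok:
            raise RuntimeError("payment declined")
        log.append("authorized")

    def void_authorization():
        log.append("authorization voided")

    def update():
        entitlements[tenant] = old + extra
        log.append("entitlement updated")

    def revert():
        entitlements[tenant] = old
        log.append("entitlement reverted")

    def notify():
        if not publish_ok:
            raise RuntimeError("publish failed")
        log.append("gateways notified")

    ok = run_saga([(authorize, void_authorization), (update, revert), (notify, lambda: None)])
    return ok, log  # the caller confirms success or failure to the user
```

If notification fails, the entitlement is reverted and the authorization voided, so the user is never charged for quota that gateways do not honor.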

3. Hot Key Contention

Popular tenants generate concentrated traffic that can overwhelm a single rate limit counter.

Hints to consider:

Shard counters for hot tenants across N Redis keys and merge periodically
Use the local token lease approach where each gateway holds a portion of the rate budget
Implement adaptive sharding that detects hot keys and increases the shard factor dynamically
Monitor per-key access rates and alert when a tenant's traffic pattern changes significantly
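A minimal sketch of the sharded counter from the first hint, assuming a plain dict as a stand-in for Redis and a hypothetical key layout of quota:{tenant}:{shard}. Writes spread across N keys; a read sums the shards, which is the "merge" step.

```python
import random

class ShardedCounter:
    """Spread one tenant's counter over several keys to avoid a single hot key."""

    def __init__(self, shards=8):
        self.shards = shards
        self.store = {}  # stand-in for Redis; key layout is an assumption

    def _key(self, tenant, shard):
        return f"quota:{tenant}:{shard}"

    def incr(self, tenant, amount=1):
        # Each write picks a random shard, so no single key takes all traffic.
        key = self._key(tenant, random.randrange(self.shards))
        self.store[key] = self.store.get(key, 0) + amount

    def total(self, tenant):
        # Merge step: sum every shard to get the tenant's usage.
        return sum(self.store.get(self._key(tenant, s), 0) for s in range(self.shards))
```

Reads cost N lookups instead of one, which is why the hints suggest merging periodically rather than on every request, and raising the shard factor only for tenants detected as hot.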

4. Graceful Degradation During Failures

When the rate limiting infrastructure is unavailable, the system must neither fail open (allowing unlimited access) nor fail closed (blocking all traffic).

Hints to consider:

Cache the last known rate limit state locally at each gateway instance with a bounded staleness window
Implement circuit breakers around Redis calls; on failure, fall back to local counters with conservative limits
Design a "safe mode" that applies reduced limits based on cached entitlements rather than completely disabling enforcement
Log and alert on enforcement gaps so that operations can respond to prolonged infrastructure issues
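The circuit breaker and "safe mode" hints can be combined in one sketch. The class below is an assumption-laden illustration: central_check is a hypothetical callable wrapping the Redis check, the 50% safe fraction, failure threshold, and cooldown are arbitrary example values, and resetting the local counter at window boundaries is omitted.

```python
import time

class BreakerLimiter:
    """Use the central check while healthy; fall back to a reduced local limit."""

    def __init__(self, central_check, cached_limit, safe_fraction=0.5,
                 failure_threshold=3, cooldown=10.0, clock=time.monotonic):
        self.central_check = central_check  # hypothetical callable, may raise
        self.cached_limit = cached_limit    # last known entitlement for this window
        self.safe_fraction = safe_fraction
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.open_until = 0.0
        self.local_used = 0

    def allow(self):
        now = self.clock()
        if now >= self.open_until:  # breaker closed, or cooldown elapsed
            try:
                allowed = self.central_check()
                self.failures = 0
                self.local_used = 0
                return allowed
            except ConnectionError:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    # Open the breaker: skip the central store during cooldown.
                    self.open_until = now + self.cooldown
        # Safe mode: enforce a conservative share of the cached limit locally.
        safe_limit = int(self.cached_limit * self.safe_fraction)
        if self.local_used >= safe_limit:
            return False
        self.local_used += 1
        return True
```

With the central store down, traffic keeps flowing at the reduced limit instead of being fully blocked or fully unmetered, and the breaker avoids paying a timeout on every request.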
