Design and implement a distributed rate limiter using a token bucket algorithm with lazy token refill. You are given two pre-implemented classes:
DistributedCache: A distributed key-value store (e.g., Redis) with get() and put() methods
TokenBucket: A data class representing a user's token bucket state
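Since only the behavior of these two classes is described, here is a minimal sketch of what they might look like (the field names follow the examples below; the in-memory dict backing `DistributedCache` is a stand-in assumption, not the real distributed store):

```python
from dataclasses import dataclass

@dataclass
class TokenBucket:
    tokens: float            # current token count
    last_refill_time: float  # timestamp of the last refill
    capacity: float          # maximum tokens the bucket can hold
    refill_rate: float       # tokens added per second

class DistributedCache:
    """In-memory stand-in for a distributed key-value store such as Redis."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)  # None if the key is absent

    def put(self, key, value):
        self._store[key] = value
```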
Your task is to implement two functions:
refill_token_bucket(bucket, current_time): Calculate and apply lazy token refill based on elapsed time
allow_request(cache, user_id, tokens_requested): Check if a user has enough tokens and consume them if so
This pattern is commonly used in:
API rate limiting at scale (millions of users)
Preventing abuse in distributed systems
Fair resource allocation across services
GPU inference quota management
```python
cache = DistributedCache()

result = allow_request(cache, "user_123", tokens_requested=1)
print(result)  # True (new bucket created with full capacity)

for _ in range(99):
    allow_request(cache, "user_123", tokens_requested=1)

result = allow_request(cache, "user_123", tokens_requested=1)
print(result)  # False (out of tokens)

time.sleep(1)  # Wait 1 second (assuming refill_rate=10/sec)

result = allow_request(cache, "user_123", tokens_requested=1)
print(result)  # True (10 new tokens available)
```
What is the expected QPS per user and total system QPS?
Should we support variable rate limits per user (e.g., premium tiers)?
What is the acceptable accuracy? (Some systems allow slight over-limit)
Is there a TTL for inactive user buckets?
Should we return remaining tokens or time until refill?
Implement the refill_token_bucket function that calculates how many tokens should be added based on elapsed time since the last refill.
Why lazy refill? Instead of running a background process to constantly refill all user buckets, we calculate the refill on-demand when a request arrives. This is more efficient for systems with millions of users where most are inactive at any given time.
```python
bucket = TokenBucket(tokens=50, last_refill_time=1000.0, capacity=100, refill_rate=10.0)
updated = refill_token_bucket(bucket, current_time=1001.0)

print(updated.tokens)            # 60.0 (50 + 1 second × 10 tokens/sec)
print(updated.last_refill_time)  # 1001.0
```
Calculate elapsed time since last_refill_time
Add elapsed_time × refill_rate tokens to the bucket
Cap tokens at capacity (do not exceed maximum)
Update last_refill_time to current_time
Handle edge case: if current_time < last_refill_time (clock skew), do not remove tokens
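The steps above can be sketched as follows (this is one possible solution, with a minimal `TokenBucket` inlined so the snippet runs on its own; returning a new bucket via `dataclasses.replace` is a design choice, not a requirement):

```python
from dataclasses import dataclass, replace

@dataclass
class TokenBucket:  # minimal version of the given data class
    tokens: float
    last_refill_time: float
    capacity: float
    refill_rate: float

def refill_token_bucket(bucket, current_time):
    elapsed = current_time - bucket.last_refill_time
    if elapsed <= 0:
        # Clock skew: current_time is behind last_refill_time.
        # Never subtract tokens; leave the bucket untouched.
        return bucket
    # Add elapsed * refill_rate tokens, capped at capacity.
    refilled = min(bucket.capacity, bucket.tokens + elapsed * bucket.refill_rate)
    return replace(bucket, tokens=refilled, last_refill_time=current_time)
```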
Implement the allow_request function that checks if a user has enough tokens and consumes them atomically.
```python
cache = DistributedCache()

assert allow_request(cache, "new_user") == True

for _ in range(100):
    assert allow_request(cache, "user_a") == True
assert allow_request(cache, "user_a") == False  # 101st fails

assert allow_request(cache, "user_b") == True
```
Build the cache key from user_id (e.g., f"rate_limit:{user_id}")
If user does not exist in cache, create a new bucket with full capacity
Refill tokens based on elapsed time (call refill_token_bucket)
Check if tokens >= tokens_requested
If yes: deduct tokens, update cache, return True
If no: update cache with refilled state (so next request benefits), return False
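Putting the steps together, a minimal end-to-end sketch might look like this. The `tokens_requested=1` default matches the asserts above; `capacity=100` and `refill_rate=10.0` are assumptions taken from the examples, and the injectable `now` parameter is added here only to make testing deterministic:

```python
import time
from dataclasses import dataclass, replace

@dataclass
class TokenBucket:
    tokens: float
    last_refill_time: float
    capacity: float
    refill_rate: float

class DistributedCache:  # in-memory stand-in for Redis
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)
    def put(self, key, value):
        self._store[key] = value

def refill_token_bucket(bucket, current_time):
    # Clock skew guard: never refill for negative elapsed time.
    elapsed = max(0.0, current_time - bucket.last_refill_time)
    refilled = min(bucket.capacity, bucket.tokens + elapsed * bucket.refill_rate)
    return replace(bucket, tokens=refilled,
                   last_refill_time=max(current_time, bucket.last_refill_time))

def allow_request(cache, user_id, tokens_requested=1,
                  capacity=100, refill_rate=10.0, now=None):
    current_time = time.time() if now is None else now
    key = f"rate_limit:{user_id}"
    bucket = cache.get(key)
    if bucket is None:
        # First request from this user: start with a full bucket.
        bucket = TokenBucket(capacity, current_time, capacity, refill_rate)
    bucket = refill_token_bucket(bucket, current_time)
    if bucket.tokens >= tokens_requested:
        bucket = replace(bucket, tokens=bucket.tokens - tokens_requested)
        cache.put(key, bucket)
        return True
    cache.put(key, bucket)  # persist the refill so the next request benefits
    return False
```

Note that this sketch is only atomic within a single process; against a real distributed cache, the read-modify-write would need a compare-and-set loop or a server-side script (e.g. a Redis Lua script) to stay race-free.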