Design and implement a distributed rate limiter using a token bucket algorithm with lazy token refill. You are given two pre-implemented classes:
DistributedCache: A distributed key-value store (e.g., Redis) with get() and put() methods
TokenBucket: A data class representing a user's token bucket state
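Since only the behavior of these two classes is described, here is a minimal sketch of what they might look like (the field names follow the examples below; the in-memory dict backing `DistributedCache` is a stand-in assumption, not the real distributed store):

```python
from dataclasses import dataclass

@dataclass
class TokenBucket:
    tokens: float            # current token count
    last_refill_time: float  # timestamp of the last refill
    capacity: float          # maximum tokens the bucket can hold
    refill_rate: float       # tokens added per second

class DistributedCache:
    """In-memory stand-in for a distributed key-value store such as Redis."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)  # None if the key is absent

    def put(self, key, value):
        self._store[key] = value
```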
Your task is to implement two functions:
refill_token_bucket(bucket, current_time): Calculate and apply lazy token refill based on elapsed time
allow_request(cache, user_id, tokens_requested): Check if a user has enough tokens and consume them if so
This pattern is commonly used in:
API rate limiting at scale (millions of users)
Preventing abuse in distributed systems
Fair resource allocation across services
GPU inference quota management
```python
cache = DistributedCache()

result = allow_request(cache, "user_123", tokens_requested=1)
print(result)  # True (new bucket created with full capacity)

for _ in range(99):
    allow_request(cache, "user_123", tokens_requested=1)

result = allow_request(cache, "user_123", tokens_requested=1)
print(result)  # False (out of tokens)

time.sleep(1)  # Wait 1 second (assuming refill_rate=10/sec)

result = allow_request(cache, "user_123", tokens_requested=1)
print(result)  # True (10 new tokens available)
```
What is the expected QPS per user and total system QPS?
Should we support variable rate limits per user (e.g., premium tiers)?
What is the acceptable accuracy? (Some systems allow slight over-limit)
Is there a TTL for inactive user buckets?
Should we return remaining tokens or time until refill?
Implement the refill_token_bucket function that calculates how many tokens should be added based on elapsed time since the last refill.
Why lazy refill? Instead of running a background process to constantly refill all user buckets, we calculate the refill on-demand when a request arrives. This is more efficient for systems with millions of users where most are inactive at any given time.
```python
bucket = TokenBucket(tokens=50, last_refill_time=1000.0, capacity=100, refill_rate=10.0)
updated = refill_token_bucket(bucket, current_time=1001.0)

print(updated.tokens)            # 60.0 (50 + 1 second × 10 tokens/sec)
print(updated.last_refill_time)  # 1001.0
```
Calculate elapsed time since last_refill_time
Add elapsed_time × refill_rate tokens to the bucket
Cap tokens at capacity (do not exceed maximum)
Update last_refill_time to current_time
Handle edge case: if current_time < last_refill_time (clock skew), do not remove tokens
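The steps above can be sketched as follows (this is one possible solution, with a minimal `TokenBucket` inlined so the snippet runs on its own; returning a new bucket via `dataclasses.replace` is a design choice, not a requirement):

```python
from dataclasses import dataclass, replace

@dataclass
class TokenBucket:  # minimal version of the given data class
    tokens: float
    last_refill_time: float
    capacity: float
    refill_rate: float

def refill_token_bucket(bucket, current_time):
    elapsed = current_time - bucket.last_refill_time
    if elapsed <= 0:
        # Clock skew: current_time is behind last_refill_time.
        # Never subtract tokens; leave the bucket untouched.
        return bucket
    # Add elapsed * refill_rate tokens, capped at capacity.
    refilled = min(bucket.capacity, bucket.tokens + elapsed * bucket.refill_rate)
    return replace(bucket, tokens=refilled, last_refill_time=current_time)
```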
Implement the allow_request function that checks if a user has enough tokens and consumes them atomically.
```python
cache = DistributedCache()

assert allow_request(cache, "new_user") == True

for _ in range(100):
    assert allow_request(cache, "user_a") == True
assert allow_request(cache, "user_a") == False  # 101st fails

assert allow_request(cache, "user_b") == True
```
Build the cache key from user_id (e.g., f"rate_limit:{user_id}")
If user does not exist in cache, create a new bucket with full capacity
Refill tokens based on elapsed time (call refill_token_bucket)
Check if tokens >= tokens_requested
If yes: deduct tokens, update cache, return True
If no: update cache with refilled state (so next request benefits), return False
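Putting the steps together, a minimal end-to-end sketch might look like this. The `tokens_requested=1` default matches the asserts above; `capacity=100` and `refill_rate=10.0` are assumptions taken from the examples, and the injectable `now` parameter is added here only to make testing deterministic:

```python
import time
from dataclasses import dataclass, replace

@dataclass
class TokenBucket:
    tokens: float
    last_refill_time: float
    capacity: float
    refill_rate: float

class DistributedCache:  # in-memory stand-in for Redis
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)
    def put(self, key, value):
        self._store[key] = value

def refill_token_bucket(bucket, current_time):
    # Clock skew guard: never refill for negative elapsed time.
    elapsed = max(0.0, current_time - bucket.last_refill_time)
    refilled = min(bucket.capacity, bucket.tokens + elapsed * bucket.refill_rate)
    return replace(bucket, tokens=refilled,
                   last_refill_time=max(current_time, bucket.last_refill_time))

def allow_request(cache, user_id, tokens_requested=1,
                  capacity=100, refill_rate=10.0, now=None):
    current_time = time.time() if now is None else now
    key = f"rate_limit:{user_id}"
    bucket = cache.get(key)
    if bucket is None:
        # First request from this user: start with a full bucket.
        bucket = TokenBucket(capacity, current_time, capacity, refill_rate)
    bucket = refill_token_bucket(bucket, current_time)
    if bucket.tokens >= tokens_requested:
        bucket = replace(bucket, tokens=bucket.tokens - tokens_requested)
        cache.put(key, bucket)
        return True
    cache.put(key, bucket)  # persist the refill so the next request benefits
    return False
```

Note that this sketch is only atomic within a single process; against a real distributed cache, the read-modify-write would need a compare-and-set loop or a server-side script (e.g. a Redis Lua script) to stay race-free.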