Design a Global IP Address Blocking System

System Design | Must

Problem Statement

Large-scale internet services are often required to block traffic from specific IP addresses or ranges because of government mandates, abuse prevention, or compliance obligations. The system must enforce these blocks at the network edge so that prohibited requests never reach backend services, while adding negligible latency to legitimate traffic.

The challenge grows with the volume of requests flowing through a global edge network. Blocking rules may specify individual IPv4 or IPv6 addresses, CIDR ranges of varying prefix lengths, or even entire autonomous systems (identified by ASN). Rules must propagate worldwide within seconds, yet a single misconfigured rule could knock millions of legitimate users offline.

You need to design a system that ingests blocking directives from authorized administrators, distributes them to every edge node across dozens of regions, and enforces them in the hot path of every incoming request with negligible performance impact. The system must also support staged rollouts, instant rollback, and a complete audit trail for regulatory reporting.

Key Requirements

Functional

Rule Management API -- Administrators can create, update, and delete blocking rules that specify individual IPs, CIDR ranges (IPv4 and IPv6), expiration times, and justification metadata.
Edge Enforcement -- Every incoming request is checked against the active rule set at the edge node before any further processing, with blocked requests receiving an appropriate HTTP response.
Global Propagation -- New or updated rules propagate to all edge nodes worldwide within a configurable time window, defaulting to under 30 seconds.
Audit Trail -- Every rule change and every block action is logged with timestamps, rule identifiers, and source metadata for compliance reporting.
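
As an illustration of the rule record these requirements imply, here is a minimal sketch in Python. The class name `BlockRule` and all of its field names are assumptions made for this example, not part of any specified API. It validates the CIDR with the standard-library `ipaddress` module and refuses rules without a justification, since the audit trail depends on it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from ipaddress import ip_network
from typing import Optional


@dataclass(frozen=True)
class BlockRule:
    """Hypothetical rule record; field names are illustrative only."""
    rule_id: str
    cidr: str                 # e.g. "203.0.113.0/24", "2001:db8::/32", or a single IP
    justification: str        # why the block exists (needed for audit reports)
    created_by: str
    expires_at: Optional[datetime] = None  # None means the rule never expires

    def __post_init__(self):
        # strict=True raises ValueError if host bits are set, e.g. "10.0.0.1/8".
        ip_network(self.cidr, strict=True)
        if not self.justification.strip():
            raise ValueError("justification is required for the audit trail")

    def is_active(self, now: datetime) -> bool:
        """A rule is active until its expiration time, if it has one."""
        return self.expires_at is None or now < self.expires_at


rule = BlockRule("r-1", "203.0.113.0/24", "abuse report", "admin@example.com")
print(rule.is_active(datetime.now(timezone.utc)))
```

Rejecting CIDRs with host bits set at creation time is one cheap way to catch a class of administrator typos before a rule ever reaches the edge.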

Non-Functional

Scalability -- Support millions of simultaneously active blocking rules, evaluated against hundreds of thousands of requests per second per edge node.
Latency -- Rule evaluation adds less than 1 millisecond to request processing by using in-memory data structures at the edge.
Availability -- Edge enforcement continues operating even if the central control plane is temporarily unreachable, using the last-known rule set.
Consistency -- Eventual consistency with bounded propagation delay; rules reach all nodes within the configured SLA window.

What Interviewers Focus On

Based on real interview experiences, these are the areas interviewers probe most deeply:

1. In-Memory Data Structures for CIDR Matching

Matching every request against a set of millions of rules demands purpose-built data structures, because scanning the rules one by one is far too slow. Interviewers want to see that you understand the trade-offs between different approaches to prefix matching. Hints to consider:

Think about how a trie (prefix tree) maps naturally to binary representations of IP addresses and CIDR prefixes.
Consider how longest-prefix-match works and why it matters when overlapping rules exist.
Evaluate memory consumption differences between a flat hash set of expanded IPs versus a compressed trie for IPv6 ranges.
Think about how you handle both IPv4 and IPv6 in a unified structure without doubling lookup cost.
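
The hints above can be sketched as a plain binary trie with longest-prefix match. This is a minimal, uncompressed illustration, not a production design: the names `PrefixTrie`, `insert`, and `longest_match` are made up for this example, and a real edge node would more likely use a path-compressed (radix) or multibit trie to cut memory and pointer chasing. IPv4 is folded into the IPv6 space using the standard IPv4-mapped range `::ffff:0:0/96`, so one structure serves both families.

```python
from ipaddress import ip_address, ip_network

V4_MAPPED = 0xFFFF << 32  # integer value of ::ffff:0:0, the IPv4-mapped base


def _to_v6_prefix(cidr):
    """Return (128-bit network int, prefix length), mapping IPv4 into IPv6."""
    net = ip_network(cidr, strict=True)
    if net.version == 4:
        return V4_MAPPED | int(net.network_address), net.prefixlen + 96
    return int(net.network_address), net.prefixlen


def _to_v6_int(addr):
    ip = ip_address(addr)
    return V4_MAPPED | int(ip) if ip.version == 4 else int(ip)


class _Node:
    __slots__ = ("children", "rule_id")

    def __init__(self):
        self.children = [None, None]  # child for bit 0 and bit 1
        self.rule_id = None           # set only where a rule's prefix ends


class PrefixTrie:
    def __init__(self):
        self.root = _Node()

    def insert(self, cidr, rule_id):
        value, plen = _to_v6_prefix(cidr)
        node = self.root
        for i in range(plen):  # walk the prefix bits, most significant first
            bit = (value >> (127 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = _Node()
            node = node.children[bit]
        node.rule_id = rule_id

    def longest_match(self, addr):
        """Return the rule_id of the most specific matching prefix, or None."""
        value = _to_v6_int(addr)
        node, best = self.root, None
        for i in range(128):
            if node.rule_id is not None:
                best = node.rule_id  # remember the deepest rule seen so far
            node = node.children[(value >> (127 - i)) & 1]
            if node is None:
                return best
        return node.rule_id if node.rule_id is not None else best


trie = PrefixTrie()
trie.insert("10.0.0.0/8", "wide")
trie.insert("10.1.2.0/24", "narrow")
print(trie.longest_match("10.1.2.3"))
```

A lookup touches at most 128 nodes regardless of how many rules are loaded. One cost of this simple mapping is that every IPv4 lookup first walks the fixed 96-bit prefix; an implementation that cares about that could keep a direct pointer to the node at depth 96 and start IPv4 lookups there.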

2. Global Configuration Propagation

Getting updated rules to thousands of edge nodes quickly and reliably is a distributed systems challenge in itself. Hints to consider:

Consider a push-based model using Kafka topics that edge nodes subscribe to versus a pull-based polling model against a central store.
Think about how you version rule sets so that an edge node can detect whether it has fallen behind and request a full snapshot.
Evaluate the trade-off between consistency (all nodes enforce the same rules simultaneously) and availability (nodes keep serving with stale rules).
Consider how you confirm that propagation has completed across all nodes before reporting success to the administrator.
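
One way to picture the versioning and confirmation hints is the sketch below. It assumes the control plane numbers rule-set versions sequentially and ships deltas that name the version they build on; the names `Delta`, `EdgeRuleStore`, and `propagation_complete` are hypothetical. An edge node applies a delta only if it is contiguous with its current version, and otherwise signals that it needs a full snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class Delta:
    base_version: int   # version this delta must be applied on top of
    new_version: int    # version the node reaches after applying it
    added: dict = field(default_factory=dict)    # rule_id -> cidr
    removed: list = field(default_factory=list)  # rule_ids to drop


class EdgeRuleStore:
    """Hypothetical per-node view of the rule set."""

    def __init__(self):
        self.version = 0
        self.rules = {}

    def apply_delta(self, delta):
        """Apply a delta; return False if the node must fetch a snapshot."""
        if delta.new_version <= self.version:
            return True   # duplicate or stale delivery, safe to ignore
        if delta.base_version != self.version:
            return False  # gap detected: this node has fallen behind
        new_rules = dict(self.rules)  # build aside, then swap in one step
        for rule_id in delta.removed:
            new_rules.pop(rule_id, None)
        new_rules.update(delta.added)
        self.rules, self.version = new_rules, delta.new_version
        return True

    def load_snapshot(self, version, rules):
        self.rules, self.version = dict(rules), version


def propagation_complete(acked_versions, target_version, min_fraction=1.0):
    """True once enough nodes report a version at or past the target.

    acked_versions maps node_id -> last version that node acknowledged.
    """
    if not acked_versions:
        return False
    caught_up = sum(1 for v in acked_versions.values() if v >= target_version)
    return caught_up / len(acked_versions) >= min_fraction


store = EdgeRuleStore()
print(store.apply_delta(Delta(0, 1, added={"r1": "10.0.0.0/8"})), store.version)
```

Because the node keeps serving from `self.rules` until the swap, it continues enforcing the last-known rule set while the control plane is unreachable. The `min_fraction` parameter is an assumption that lets the control plane report success at, say, 99% of nodes and chase stragglers separately.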

3. Staged Rollout and Rollback

A single bad rule can cause a global outage, so interviewers look for safety mechanisms. Hints to consider:

Think about deploying rules to a canary set of edge nodes first and monitoring error-rate changes before full rollout.
Consider a shadow mode where the rule logs what it would block without actually blocking, letting operators validate impact.
Evaluate how you implement instant rollback: is it a version revert, a rule deletion, or a priority override?
Think about rate-of-change limits that prevent an administrator from accidentally deploying thousands of rules at once.
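
The safety mechanisms above can be sketched in a few lines. This example assumes each rule carries a rollout mode and each edge node knows whether it belongs to the canary set; `Mode`, `decide`, and `ChangeBudget` are names invented for illustration. The first function shows shadow and canary behaviour, and the class is a simple sliding-window limit on how many rule changes may be submitted in a given period.

```python
from collections import deque
from enum import Enum


class Mode(Enum):
    SHADOW = "shadow"    # never block, only log what would be blocked
    CANARY = "canary"    # block on canary nodes, shadow everywhere else
    ENFORCE = "enforce"  # block on every node


def decide(mode, rule_matches, node_is_canary):
    """Return (block_request, log_would_block) for one request."""
    if not rule_matches:
        return (False, False)
    if mode is Mode.ENFORCE or (mode is Mode.CANARY and node_is_canary):
        return (True, False)
    return (False, True)


class ChangeBudget:
    """Allow at most max_changes rule changes per window_seconds."""

    def __init__(self, max_changes, window_seconds):
        self.max_changes = max_changes
        self.window_seconds = window_seconds
        self._timestamps = deque()  # one entry per accepted change

    def allow(self, now, count=1):
        # Forget changes that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) + count > self.max_changes:
            return False  # would exceed the budget; reject the whole batch
        self._timestamps.extend([now] * count)
        return True


print(decide(Mode.CANARY, rule_matches=True, node_is_canary=False))
budget = ChangeBudget(max_changes=3, window_seconds=60)
print(budget.allow(now=0.0, count=2), budget.allow(now=1.0, count=2))
```

Passing `now` explicitly instead of reading the clock inside `allow` keeps the limiter easy to test. Promoting a rule from `SHADOW` to `CANARY` to `ENFORCE` is then just another versioned rule change, so the same rollback path covers it.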