Design a Malicious IP Detection System
Problem Statement
Design a system that can detect and block malicious IP addresses for an enterprise, including geo-distributed blacklists, edge processing capabilities, and efficient filtering mechanisms like Bloom filters to prevent attacks at scale. The system must separate a control plane (detection, list management, distribution) from a data plane (fast, safe enforcement), while handling global updates, failure modes, and adversarial traffic.
The system must trade off accuracy against latency (tolerating some false positives and false negatives), reason about consistency and propagation across regions, and design safe rollouts and observability for security changes at scale.
Key Requirements
Functional
- Policy management -- users define, publish, and manage IP reputation policies and lists (block, allow, score) with TTLs and scopes (global, region, application)
- Edge enforcement -- real-time IP blocking at the edge with sub-millisecond checks, continuing operation during control-plane outages
- Near-real-time distribution -- edge blacklists receive updates across regions with versioning, rollback, and partial rollouts
- Audit and analysis -- searchable logs with reason codes to investigate and tune blocking policies
Non-Functional
- Scalability -- handle millions of requests per second at the edge with IP reputation checks on every inbound request
- Reliability -- edge enforcement continues during control-plane outages using locally cached blacklists
- Latency -- sub-millisecond IP checks at the edge; blacklist updates propagated globally within seconds
- Consistency -- eventual consistency for blacklist propagation; accept brief windows where new threats are not yet blocked at all edges
Interview Reports from Hello Interview
12 reports from candidates. Most recently asked at LinkedIn in early January 2026.
This question is primarily asked at LinkedIn (all 12 reports are from LinkedIn interviews).
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Edge-Local Detection and Blocking
Every inbound request must be checked against the blacklist with minimal latency. Interviewers want to see how you avoid synchronous remote lookups on the hot path.
Hints to consider:
- Use in-process Bloom filters at the edge for ultra-fast negative lookups (sub-microsecond for IPs not in the blacklist)
- Maintain an L2 exact set (Redis or local hash set) to confirm Bloom filter hits and eliminate false positives
- Download versioned blacklist snapshots periodically and apply delta updates between snapshots
- Design for graceful degradation: if the control plane is unreachable, continue blocking with the last known blacklist
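The two-tier lookup above can be sketched as follows. This is an illustrative, minimal Bloom filter plus an exact-set confirmation step, not a production implementation; the class and helper names are assumptions:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions derived from SHA-256 over a bit array."""
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive k independent positions by salting the hash input with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not present"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def is_blocked(ip: str, bloom: BloomFilter, exact_set: set) -> bool:
    """Two-tier check: fast Bloom negative lookup, then L2 exact-set confirmation."""
    if not bloom.might_contain(ip):
        return False          # fast path: definitely not blacklisted
    return ip in exact_set    # confirm the hit to eliminate false positives
```

The key property is that the common case (a benign IP not on any list) is answered by the Bloom filter alone, and the exact set is only consulted on (rare) Bloom hits.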
2. Bloom Filter Management and False Positive Control
Bloom filters are critical for performance but introduce false positives that can block legitimate traffic. Interviewers expect a concrete mitigation plan.
Hints to consider:
- Configure target false positive rates (e.g., 0.1%) based on blacklist size and available memory
- Use versioned Bloom filter artifacts with atomic swap on update to ensure consistency during refreshes
- Implement an allowlist mechanism that overrides Bloom filter matches for known-good IPs
- Monitor false positive rates in production and auto-tune filter parameters
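As a rough sizing check, the standard Bloom filter formulas relate blacklist size and target false positive rate to the bit-array size m and hash count k. A minimal calculator (the function name is illustrative):

```python
import math

def bloom_params(n_items: int, target_fp_rate: float) -> tuple[int, int]:
    """Compute bit-array size m and hash count k for a target false positive rate.

    Standard formulas: m = -n * ln(p) / (ln 2)^2,  k = (m / n) * ln 2
    """
    m = math.ceil(-n_items * math.log(target_fp_rate) / (math.log(2) ** 2))
    k = max(1, round((m / n_items) * math.log(2)))
    return m, k

# A blacklist of 10M IPs at a 0.1% target FP rate needs roughly 18 MB of bits
# and about 10 hash functions.
m, k = bloom_params(10_000_000, 0.001)
```

This is useful in the interview to justify that the whole filter comfortably fits in edge-node memory, which is what makes the in-process sub-microsecond lookup realistic.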
3. Control Plane: Detection Pipeline and List Management
Interviewers want to see how malicious IPs are identified, scored, and promoted to the blacklist.
Hints to consider:
- Use Flink to aggregate security signals (failed auth attempts, request patterns, geographic anomalies) and compute per-IP reputation scores
- Promote IPs exceeding a threshold to the blacklist with configurable TTLs (auto-generated blocks expire; manual blocks persist)
- Support canary deployments for new blocking rules: apply to a subset of edge nodes, monitor impact, then roll out globally
- Implement instant rollback capability with signed, versioned blacklist artifacts
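The promotion step can be illustrated outside Flink with plain Python. The signal weights, threshold, and TTL below are assumptions made for the sketch, not values from the source:

```python
import time

PROMOTION_THRESHOLD = 80        # assumed score cutoff on a 0-100 scale
AUTO_BLOCK_TTL_SECONDS = 3600   # assumed TTL for auto-generated blocks

def score_ip(signals: dict) -> int:
    """Toy reputation score from aggregated signals (weights are assumptions)."""
    score = 0
    score += min(signals.get("failed_auths", 0) * 5, 50)
    score += min(signals.get("req_rate_per_sec", 0) // 10, 30)
    score += 20 if signals.get("geo_anomaly") else 0
    return min(score, 100)

def maybe_promote(ip: str, signals: dict, blacklist: dict, now=None) -> bool:
    """Promote an IP to the blacklist with an expiry; auto blocks carry a TTL."""
    if now is None:
        now = time.time()
    if score_ip(signals) >= PROMOTION_THRESHOLD:
        blacklist[ip] = {"expires_at": now + AUTO_BLOCK_TTL_SECONDS,
                         "source": "auto"}
        return True
    return False
```

In the real pipeline the aggregation would run as windowed operators in Flink and the blacklist entry would be written into a versioned artifact; a manual block would simply omit `expires_at`.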
4. Safe Rollout and Observability
Pushing untested rules globally can lock out legitimate users. Interviewers expect production safety mechanisms.
Hints to consider:
- Log every block decision with reason codes (which rule, which blacklist version) for post-incident analysis
- Implement an emergency allowlist that can be activated instantly to unblock false positives
- Monitor block rates per edge node, per region, and per rule; alert on unexpected spikes that might indicate a bad rule
- Support A/B testing of new detection algorithms by applying them in shadow mode before enforcement
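Shadow-mode evaluation with reason-coded logging can be sketched as below. The log schema, rule representation, and version constant are assumptions for illustration:

```python
import json
import time

BLACKLIST_VERSION = "v42"  # assumed identifier of the active blacklist artifact

def evaluate_request(ip: str, rules: list, shadow_rules: list, log) -> bool:
    """Apply enforcing rules, then run shadow rules for comparison only.

    `rules` and `shadow_rules` are (rule_id, predicate) pairs; `log` is any
    callable that accepts a JSON string (e.g. list.append in tests).
    """
    blocked = False
    for rule_id, predicate in rules:
        if predicate(ip):
            log(json.dumps({"ip": ip, "rule": rule_id, "mode": "enforce",
                            "action": "block", "list_version": BLACKLIST_VERSION,
                            "ts": time.time()}))
            blocked = True
            break
    for rule_id, predicate in shadow_rules:
        if predicate(ip):
            # Shadow hit: logged for analysis, but the request is NOT blocked
            # by this rule, so a bad candidate rule cannot lock out users.
            log(json.dumps({"ip": ip, "rule": rule_id, "mode": "shadow",
                            "action": "would_block",
                            "list_version": BLACKLIST_VERSION,
                            "ts": time.time()}))
    return blocked
```

Comparing `would_block` rates against enforced block rates per rule gives exactly the signal needed before promoting a shadow rule to enforcement, and the reason codes make every block traceable to a rule and a blacklist version.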