Design an Ad Server
Problem Statement
An ad server sits at the intersection of real-time decisioning, auction economics, and massive-scale event processing. Every time a user loads a webpage or opens a mobile app, the ad server must select the most relevant advertisement from thousands of candidates, execute a pricing auction, and return a creative asset — all within a strict latency budget that typically cannot exceed 100 milliseconds end-to-end.
Beyond selection, the system must track advertiser budgets in near-real-time to avoid overspending, enforce frequency caps so users are not bombarded with the same ad, and log every impression and click for billing reconciliation and reporting. A single misattributed click or a budget overrun can translate into significant financial liability.
You need to design a system that handles millions of ad requests per second, maintains sub-100ms response times, supports sophisticated targeting (demographics, interests, geography, device), runs fair auctions, and produces accurate billing logs — all while remaining resilient to traffic spikes and component failures.
Key Requirements
Functional
- Ad Selection and Targeting -- Given a user context (demographics, interests, location, device), retrieve all eligible ad campaigns and filter by targeting criteria, frequency caps, and remaining budget.
- Auction Execution -- Run a second-price auction among eligible ads, applying bid adjustments and floor prices, and return the winning ad creative with its pricing metadata.
- Budget and Frequency Management -- Track each campaign's spend in near-real-time and pause delivery when budgets are exhausted; enforce per-user frequency caps across sessions and devices.
- Event Logging -- Record every impression, click, and conversion event with tamper-evident timestamps for billing, reporting, and fraud detection.
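The second-price auction described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the `Bid` shape, the `adjustment` multiplier, and the floor-price handling are simplified assumptions, and real systems layer on quality scores, pacing multipliers, and minimum price increments.

```python
from dataclasses import dataclass

@dataclass
class Bid:
    campaign_id: str
    amount: float            # advertiser's bid (e.g., CPM)
    adjustment: float = 1.0  # illustrative quality/pacing multiplier

def run_second_price_auction(bids, floor_price):
    """Return (winning campaign_id, clearing price), or None if no bid clears the floor."""
    # Apply bid adjustments, then keep only bids at or above the floor.
    adjusted = [(b.campaign_id, b.amount * b.adjustment) for b in bids]
    eligible = [b for b in adjusted if b[1] >= floor_price]
    if not eligible:
        return None
    eligible.sort(key=lambda b: b[1], reverse=True)
    winner_id, _ = eligible[0]
    # Winner pays the second-highest adjusted bid, or the floor if it
    # was the only eligible bid; many real systems also add a minimal
    # price increment, omitted here.
    price = eligible[1][1] if len(eligible) > 1 else floor_price
    return winner_id, max(price, floor_price)
```

Note the key property of a second-price auction: the winner's own bid determines *whether* it wins, but the runner-up's bid determines *what it pays*, which incentivizes truthful bidding.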
Non-Functional
- Scalability -- Serve millions of ad requests per second across global points of presence, scaling horizontally with traffic.
- Latency -- End-to-end ad selection and response in under 100 milliseconds at the 99th percentile.
- Accuracy -- Budget tracking must not overshoot by more than a small configurable threshold (e.g., 1% of daily budget).
- Durability -- Impression and click logs must never be lost, as they directly impact billing and revenue reconciliation.
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Low-Latency Ad Selection Pipeline
The entire decisioning pipeline — targeting, filtering, auction, creative lookup — must complete in tens of milliseconds. Interviewers want to see how you decompose and optimize each stage.
Hints to consider:
- Think about pre-computing inverted indexes of campaigns by targeting dimensions (geo, interest, device) so that candidate retrieval is a set intersection rather than a full scan.
- Consider how you store and retrieve campaign metadata and creative assets — Redis for hot metadata, a CDN for creative payloads.
- Evaluate which stages of the pipeline must run synchronously in the request path and which can execute in parallel.
- Think about how you handle the cold-start problem when a new campaign launches and has no historical performance data.
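The inverted-index idea from the first hint can be sketched as follows. The index layout, dimension names, and the rule that an unconstrained dimension matches everything are illustrative assumptions; a real system would build these indexes offline and ship them to serving nodes.

```python
from collections import defaultdict

class CampaignIndex:
    """Pre-computed inverted indexes: targeting dimension -> value -> campaign ids."""

    DIMENSIONS = ("geo", "device", "interest")

    def __init__(self):
        self.index = defaultdict(lambda: defaultdict(set))
        # Campaigns that leave a dimension unconstrained match any value.
        self.untargeted = defaultdict(set)

    def add_campaign(self, campaign_id, targeting):
        # targeting example: {"geo": {"US", "CA"}, "device": {"mobile"}}
        for dim in self.DIMENSIONS:
            values = targeting.get(dim)
            if values:
                for v in values:
                    self.index[dim][v].add(campaign_id)
            else:
                self.untargeted[dim].add(campaign_id)

    def candidates(self, context):
        # context example: {"geo": "US", "device": "mobile"}
        # Candidate retrieval is a set intersection across dimensions,
        # not a scan over all campaigns.
        result = None
        for dim, value in context.items():
            eligible = self.index[dim][value] | self.untargeted[dim]
            result = eligible if result is None else result & eligible
        return result or set()
```

Because each per-dimension lookup is O(1) and the intersection touches only matching campaigns, retrieval cost scales with the candidate set rather than with the total number of campaigns.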
2. Near-Real-Time Budget Tracking
Overspending an advertiser's budget is a direct financial loss. Interviewers probe how you maintain accurate counters at massive scale.
Hints to consider:
- Consider local counters at each ad server instance that periodically sync to a centralized budget service, accepting a bounded error margin.
- Think about what happens during a sync failure — do you optimistically continue spending or conservatively pause?
- Evaluate how you handle budget pacing (spreading a daily budget evenly) versus front-loading spend.
- Consider the race condition when multiple servers simultaneously check and decrement a shared budget counter.
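The local-counter approach from the first hint can be sketched as below. The `BudgetService` stands in for a real centralized store (a database or Redis, for example) and its interface is an assumption; the point is the trade-off it illustrates: worst-case overspend is bounded by roughly (number of servers) x (spend accumulated between syncs).

```python
import threading

class BudgetService:
    """Central source of truth for remaining budget per campaign."""
    def __init__(self, budgets):
        self._lock = threading.Lock()
        self._remaining = dict(budgets)

    def commit_spend(self, campaign_id, amount):
        # Atomically deduct locally accumulated spend; return what remains.
        with self._lock:
            self._remaining[campaign_id] -= amount
            return self._remaining[campaign_id]

class LocalBudgetTracker:
    """Per-server counter that syncs to the central service every `sync_every` events."""
    def __init__(self, service, sync_every=100):
        self.service = service
        self.sync_every = sync_every
        self.pending = {}   # campaign -> spend not yet committed centrally
        self.events = {}    # campaign -> events since last sync
        self.paused = set()

    def record_spend(self, campaign_id, amount):
        if campaign_id in self.paused:
            return False  # conservatively stop serving this campaign
        self.pending[campaign_id] = self.pending.get(campaign_id, 0.0) + amount
        self.events[campaign_id] = self.events.get(campaign_id, 0) + 1
        if self.events[campaign_id] >= self.sync_every:
            self.sync(campaign_id)
        return True

    def sync(self, campaign_id):
        remaining = self.service.commit_spend(
            campaign_id, self.pending.get(campaign_id, 0.0))
        self.pending[campaign_id] = 0.0
        self.events[campaign_id] = 0
        # Pause once the central budget is exhausted.
        if remaining <= 0:
            self.paused.add(campaign_id)
```

Tuning `sync_every` (or switching to a time-based sync interval) is exactly the knob behind the "small configurable threshold" in the accuracy requirement: more frequent syncs mean tighter budget accuracy at the cost of more load on the central service.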
3. Frequency Capping Across Devices
Users interact across phones, tablets, and desktops. Interviewers want to see how you enforce caps without a synchronous cross-device lookup on every request.
Hints to consider:
- Think about using a probabilistic data structure (like a counting Bloom filter) for fast local checks with periodic reconciliation.
- Consider how you identify the same user across devices — logged-in identity graph versus probabilistic device matching.
- Evaluate the trade-off between strict cap enforcement (requires synchronous lookup) and approximate enforcement (faster but may slightly overshoot).
- Think about where you store frequency counters — a distributed cache like Redis with TTL-based expiration aligns naturally with time-windowed caps.
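The counting Bloom filter mentioned in the first hint can be sketched as below. Sizes and hash construction are illustrative assumptions. The useful asymmetry for frequency capping: collisions can only *overestimate* a count, so a cap may occasionally be enforced slightly early, but never missed.

```python
import hashlib

class CountingBloomFilter:
    def __init__(self, size=4096, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _indexes(self, key):
        # Derive independent indexes by salting the key with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for idx in self._indexes(key):
            self.counters[idx] += 1

    def count(self, key):
        # Estimated count: the minimum over the key's counters.
        # Collisions may inflate it, but it never undercounts.
        return min(self.counters[idx] for idx in self._indexes(key))

def under_frequency_cap(bloom, user_id, campaign_id, cap):
    """Approximate check: has this user seen this campaign fewer than `cap` times?"""
    return bloom.count(f"{user_id}:{campaign_id}") < cap
```

A filter like this lives in memory on each ad server for fast local checks, with periodic reconciliation against the authoritative store (such as the Redis-with-TTL counters from the last hint) to absorb cross-server and cross-device activity.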