Design an Ad Server
Problem Statement
Design an ad server system that selects and delivers targeted advertisements to users in real time whenever a web page or application requests an ad slot. Think of platforms like Google Ads or the ad breaks between songs on Spotify's free tier -- advertisers define campaigns with budgets and targeting criteria, publishers provide ad placements, and the ad server decides which ad to show to which user in milliseconds.
The fundamental challenge is making a high-quality ad selection decision under extreme time pressure. Each ad request triggers a multi-stage pipeline: retrieve the user's profile and context, filter thousands of eligible campaigns by targeting rules, run an auction to determine the winning ad, enforce budget and frequency caps, and return a creative -- all within 50-100ms. At scale, the system must handle hundreds of thousands of requests per second while maintaining accurate budget accounting and preventing overspend.
Interviewers use this problem to evaluate your ability to design latency-critical read paths, manage distributed counters under contention, and orchestrate multi-step decision flows where each stage has different consistency and performance requirements.
Key Requirements
Functional
- Campaign management -- advertisers create campaigns with targeting rules (audience segments, geography, device type, time of day), bid amounts, daily and total budgets, and creative assets
- Ad selection and auction -- given an ad request with user context and placement metadata, the system filters eligible campaigns, ranks them by bid and relevance score, and returns the winning creative
- Budget and frequency enforcement -- enforce per-campaign daily and total budget caps and per-user frequency caps so that no campaign overspends and no user sees the same ad excessively
- Event tracking -- record impressions, clicks, and conversions with deduplication, then aggregate metrics for advertiser dashboards and billing
Non-Functional
- Scalability -- handle 500,000 ad requests per second at peak, with 10 million active campaigns and 500 million user profiles
- Latency -- return an ad decision within 100ms at p99, including targeting lookup, auction, and cap checks
- Reliability -- 99.95% uptime for the ad serving path; degrade gracefully to backfill or house ads if personalization components fail
- Consistency -- budget counters must be accurate within a small margin (less than 1% overspend); impression logs must be durable and deduplicated for correct billing
What Interviewers Focus On
Based on real interview experiences, these are the areas interviewers probe most deeply:
1. Low-Latency Ad Selection Pipeline
The core serving path must complete multiple steps (user lookup, targeting filter, auction, cap check, creative fetch) within a strict latency budget. Interviewers want to see how you parallelize work and keep each stage fast.
Hints to consider:
- Precompute and cache targeting indices (inverted indexes mapping segment to campaign IDs) in Redis or an in-memory store so that filtering eligible campaigns takes under 10ms
- Run user profile lookup and targeting index lookup in parallel since they are independent
- Use a two-phase approach: a fast pre-filter reduces thousands of campaigns to dozens of candidates, then a more expensive ranking model scores the short list
- Keep creative assets on a CDN and return only a creative URL in the ad response to minimize payload size and serving latency
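The hints above can be sketched as a single selection function. This is a minimal illustration, not a production design: the in-process dictionaries (`USER_PROFILES`, `SEGMENT_INDEX`, `BIDS`) are hypothetical stand-ins for a user profile service and a Redis-backed inverted index, and the "expensive" phase-2 ranking is reduced to picking the highest bid.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real stores: a user profile service and a
# precomputed inverted index mapping audience segment -> campaign IDs.
USER_PROFILES = {"u1": {"segments": ["sports", "us"]}}
SEGMENT_INDEX = {"sports": {101, 102}, "us": {101, 103}}
BIDS = {101: 2.50, 102: 1.75, 103: 3.00}

def fetch_user_profile(user_id):
    return USER_PROFILES.get(user_id, {"segments": []})

def fetch_segment_index():
    return SEGMENT_INDEX

def select_ad(user_id):
    # The profile lookup and index lookup are independent, so issue them
    # in parallel rather than sequentially.
    with ThreadPoolExecutor(max_workers=2) as pool:
        profile_future = pool.submit(fetch_user_profile, user_id)
        index_future = pool.submit(fetch_segment_index)
        profile, index = profile_future.result(), index_future.result()

    # Phase 1: cheap pre-filter -- union the index postings for the user's
    # segments, shrinking thousands of campaigns to a short candidate list.
    candidates = set()
    for segment in profile["segments"]:
        candidates |= index.get(segment, set())

    # Phase 2: expensive ranking on the short list only. Here it is just
    # the bid; in practice a relevance/CTR model would score each candidate.
    return max(candidates, key=BIDS.__getitem__, default=None)
```

The response would carry only the winning campaign's creative URL, with the asset itself served from the CDN.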
2. Distributed Budget and Frequency Counters
Budget caps and frequency caps create hot counters that every ad decision must read and write. Without careful design, these become bottlenecks that cause either overspend or excessive latency.
Hints to consider:
- Shard budget counters by campaign ID across multiple Redis nodes; each serving instance maintains a local counter and periodically syncs to the central store
- Use a "budget reservation" pattern: each serving node reserves a small chunk of remaining budget (e.g., 1% of daily budget) and decrements locally, reducing the frequency of distributed writes
- For frequency caps, store per-user impression counts in Redis with TTLs matching the cap window (e.g., 24 hours); use approximate checks on the fast path and reconcile asynchronously
- Accept that a small amount of overspend (under 1%) is tolerable in exchange for not adding a synchronous distributed lock to every ad decision
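The budget reservation pattern can be shown in a few lines. This sketch assumes a central counter with an atomic reserve operation (a stand-in for a sharded Redis `DECRBY`); each serving node pulls a small chunk into a local counter and spends against it, so most ad decisions touch no remote store. The class names are illustrative, not from any real library.

```python
class CentralBudget:
    """Stand-in for a sharded Redis counter with an atomic decrement."""

    def __init__(self, daily_budget):
        self.remaining = daily_budget

    def reserve(self, chunk):
        # Grant up to `chunk`, never more than what is left.
        granted = min(chunk, self.remaining)
        self.remaining -= granted
        return granted

class LocalBudget:
    """Per-serving-node view: reserve ~1% of the daily budget at a time
    and decrement locally, so distributed writes are rare."""

    def __init__(self, central, chunk):
        self.central = central
        self.chunk = chunk
        self.local = 0.0

    def try_spend(self, amount):
        # Refill from the central store only when the local chunk runs dry.
        if self.local < amount:
            self.local += self.central.reserve(self.chunk)
        if self.local >= amount:
            self.local -= amount
            return True
        return False  # campaign is (approximately) out of budget
```

Budget already reserved by a node that crashes is lost until reconciliation, which is one source of the small overspend/underspend margin the requirements tolerate.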
3. Auction Mechanics and Pacing
Selecting the highest bidder is not sufficient -- the system must also pace delivery evenly across the day and optimize for long-term revenue rather than immediately exhausting high-bid campaigns.
Hints to consider:
- Implement a second-price auction where the winner pays one cent above the second-highest bid, incentivizing truthful bidding
- Add a pacing multiplier to each campaign's effective bid based on how much of the daily budget has been spent relative to the time elapsed in the day
- Combine bid price with a predicted click-through rate to compute an expected revenue score (eCPM), ensuring the system optimizes for revenue rather than raw bid amount
- Reserve a percentage of inventory for backfill and house ads to maintain fill rate even when premium demand is low
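Putting the auction hints together: rank candidates by expected revenue (bid x predicted CTR x pacing multiplier), then charge the winner the minimum bid that would still have beaten the runner-up, plus one cent. This is a deliberately simplified generalized second-price sketch; the linear pacing heuristic and the candidate tuple shape are assumptions for illustration.

```python
def pacing_multiplier(spent, daily_budget, hour_of_day):
    """Throttle campaigns spending ahead of the clock so the daily
    budget lasts all day (a simple linear pacing heuristic)."""
    expected = daily_budget * hour_of_day / 24.0
    if spent <= expected:
        return 1.0
    return max(0.1, expected / spent)

def run_auction(candidates, hour_of_day):
    """candidates: (campaign_id, bid, predicted_ctr, spent, daily_budget).
    Returns (winner_id, price) under second-price rules on the eCPM-style
    score bid * pCTR * pacing."""
    scored = []
    for cid, bid, pctr, spent, budget in candidates:
        pace = pacing_multiplier(spent, budget, hour_of_day)
        # Keep the "quality" part (pCTR * pacing) so the second price can
        # be converted back into a bid for the winner.
        scored.append((bid * pctr * pace, pctr * pace, cid))
    scored.sort(reverse=True)
    if not scored:
        return None, 0.0
    win_score, win_quality, winner = scored[0]
    runner_up_score = scored[1][0] if len(scored) > 1 else 0.0
    price = round(runner_up_score / win_quality + 0.01, 2)
    return winner, price
```

Note how a lower bid with a higher predicted CTR can win: the auction optimizes expected revenue, not raw bid amount.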
4. Event Logging, Deduplication, and Attribution
Impressions and clicks drive billing, so the logging pipeline must be durable and deduplicated. Interviewers probe how you handle duplicate events, late-arriving data, and conversion attribution windows.
Hints to consider:
- Write impression and click events to Kafka with an idempotency key (request ID plus event type) to enable exactly-once processing downstream
- Use a streaming processor (Flink or Kafka Streams) to deduplicate events within a time window before writing to the billing aggregation store
- Store raw events in a data lake (S3 plus Parquet) for audit and reconciliation; maintain real-time aggregates in a fast OLAP store (ClickHouse or Druid) for advertiser dashboards
- Attribution windows for conversions (e.g., 7-day click-through, 1-day view-through) require joining conversion events against a lookback of impressions and clicks per user
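The windowed deduplication step can be sketched as follows. The class below is a toy stand-in for the keyed state a Flink or Kafka Streams job would hold: it drops any event whose idempotency key (request ID plus event type) was already seen within the window, and assumes events arrive in roughly increasing time order.

```python
from collections import OrderedDict

class WindowedDeduper:
    """Drops duplicate events within a time window, keyed by
    (request_id, event_type) -- the idempotency key from the hints."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.seen = OrderedDict()  # key -> timestamp, in arrival order

    def accept(self, request_id, event_type, now):
        # Evict keys older than the window. Insertion order tracks time
        # order here because `now` is assumed to be non-decreasing.
        while self.seen and next(iter(self.seen.values())) < now - self.window:
            self.seen.popitem(last=False)
        key = (request_id, event_type)
        if key in self.seen:
            return False  # duplicate: suppress before billing aggregation
        self.seen[key] = now
        return True
```

A duplicate arriving after the window closes would slip through here; the offline reconciliation pass against the raw events in the data lake is what catches those stragglers for billing.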