For a full example answer with detailed architecture diagrams and deep dives, see our Ad Click Aggregator guide.
Design a system that ingests billions of advertising events -- clicks, impressions, and conversions -- every day and aggregates them into near-real-time metrics that advertisers can query from a dashboard. Think of the analytics backend behind platforms like Google Ads or Meta Ads Manager, where campaign owners see click-through rates, spend, and conversion counts updating within seconds of an event occurring.
The central engineering challenge is building a pipeline that can sustain millions of events per second while guaranteeing each click is counted exactly once, even when producers retry or network partitions cause duplicate delivery. Raw events must flow through windowed aggregation stages that roll data up into minute, hour, and day granularities, all with a maximum staleness of about 30 seconds from event time to dashboard visibility.
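The minute/hour/day rollups come from assigning each event to tumbling windows keyed on event time. A minimal sketch of that bucketing (the function name and the example timestamp are illustrative, not from any specific framework):

```python
from datetime import datetime, timezone

def bucket_start(event_ts_ms: int, window_s: int) -> int:
    """Floor an epoch-millisecond event time to the start of its tumbling window (epoch seconds)."""
    return (event_ts_ms // 1000 // window_s) * window_s

# A click at 2024-05-01 12:34:56 UTC falls into these buckets:
ts_ms = int(datetime(2024, 5, 1, 12, 34, 56, tzinfo=timezone.utc).timestamp() * 1000)
minute = bucket_start(ts_ms, 60)      # start of 12:34
hour = bucket_start(ts_ms, 3600)      # start of 12:00
day = bucket_start(ts_ms, 86400)      # start of 00:00 UTC
```

Because the bucket is derived from event time rather than processing time, a replayed or late event lands in the same bucket it did the first time, which is what makes downstream aggregation idempotent.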
Beyond freshness, the system must support multi-dimensional slicing -- advertisers filter by campaign, ad group, creative, device type, and geography -- and retain queryable history for up to two years. This means you need a tiered storage strategy where recent fine-grained data lives in fast stores and older data is compacted into cheaper columnar or object storage without sacrificing query flexibility.
Based on real interview experiences, these are the areas interviewers probe most deeply:
The pipeline must convert raw events into aggregated counts with strict freshness guarantees. Interviewers want to see how you chain ingestion, deduplication, windowed aggregation, and sink stages while preserving correctness across failures.
A single popular ad campaign can receive orders of magnitude more clicks than the median, turning one partition into a bottleneck. Interviewers look for awareness of this skew and concrete countermeasures.
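One standard countermeasure is key salting: spread a hot campaign's events across several partitions by appending a random suffix to its key, then merge the partial aggregates back under the original key downstream. A sketch under assumed names (the hot-campaign set and bucket count are hypothetical; in practice they might come from a lag monitor):

```python
import random

HOT_CAMPAIGNS = {"cmp_123"}   # hypothetical: populated by a throughput/lag monitor
SALT_BUCKETS = 8

def partition_key(campaign_id: str) -> str:
    """Spread a hot campaign across SALT_BUCKETS partitions; cold campaigns keep a single key."""
    if campaign_id in HOT_CAMPAIGNS:
        return f"{campaign_id}#{random.randrange(SALT_BUCKETS)}"
    return campaign_id

def unsalt(key: str) -> str:
    """Downstream aggregation strips the salt to group by the original campaign ID again."""
    return key.split("#", 1)[0]
```

The cost is a second aggregation stage that sums the per-salt partial counts, which is a worthwhile trade when one campaign would otherwise saturate a single partition.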
Advertisers expect sub-second responses when filtering by campaign, date range, device, and geography over months of data. A naive scan of raw events is far too slow.
Storing two years of minute-level data for every ad is prohibitively expensive. You need a strategy that balances query speed against storage cost.
Ask the interviewer about expected event volume per second, the maximum acceptable staleness between an event and its dashboard visibility, and which dimensions advertisers need to filter by. Confirm whether approximate counts are acceptable for certain metrics or if exact counts are required. Clarify the retention window and whether the system must support backfilling historical data after a pipeline fix.
Sketch a pipeline: edge redirect servers emit click and impression events to Kafka topics partitioned by campaign ID with optional key salting. A Flink streaming job consumes events, deduplicates using a Redis-backed lookup on event IDs, and aggregates counts into tumbling one-minute windows. Flink sinks write minute-level aggregates into ClickHouse for real-time queries. Separate batch jobs roll minute data into hourly and daily tables, eventually archiving to S3 in Parquet format. An API layer fronts ClickHouse and the archive tier, routing queries to the appropriate store based on the requested time range.
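The time-range routing in that API layer can be sketched as a simple dispatch on the age of the requested range. The retention boundaries below are assumptions for illustration, not fixed requirements:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention boundaries for each tier.
MINUTE_RETENTION = timedelta(days=7)     # minute-level rows kept hot in ClickHouse
HOURLY_RETENTION = timedelta(days=90)    # hourly rollups in ClickHouse

def route_query(range_start: datetime, now: datetime) -> str:
    """Pick the finest-grained store whose retention still covers the requested start time."""
    age = now - range_start
    if age <= MINUTE_RETENTION:
        return "clickhouse_minute"
    if age <= HOURLY_RETENTION:
        return "clickhouse_hourly"
    return "s3_parquet_daily"

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
route_query(now - timedelta(days=2), now)    # "clickhouse_minute"
```

A query spanning multiple tiers would be split at the boundaries and the results stitched together in the API layer, which is worth mentioning explicitly in an interview.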
Walk through a single click event end to end. The user clicks an ad, the redirect service logs the click with a unique event_id and forwards the user to the landing page. The event lands in Kafka. A Flink consumer reads it and checks Redis for the event_id -- if found, the event is discarded as a duplicate. If new, the event_id is written to Redis with a 24-hour TTL and the event is added to the in-flight window state. At the window boundary (every 60 seconds), Flink flushes the aggregate (campaign, ad, minute bucket, count) to ClickHouse using an idempotent upsert keyed on the composite of those fields. If the Flink job restarts, it replays from the last checkpoint offset. Because the sink is idempotent, replayed events produce the same aggregate rows without inflation.
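The dedupe-then-idempotent-flush interaction is the crux of the exactly-once story, so it is worth being able to sketch it concretely. Below, an in-memory dict stands in for the Redis event_id set (the real system would use something like SET NX with a TTL), and a plain dict stands in for the ClickHouse upsert sink:

```python
class Deduper:
    """In-memory stand-in for the Redis event_id set with a 24-hour TTL."""
    def __init__(self, ttl_s: int = 86400):
        self.ttl_s = ttl_s
        self.seen: dict[str, float] = {}   # event_id -> expiry timestamp

    def first_time(self, event_id: str, now: float) -> bool:
        expiry = self.seen.get(event_id)
        if expiry is not None and expiry > now:
            return False                   # duplicate within the TTL window
        self.seen[event_id] = now + self.ttl_s
        return True

def flush_window(counts, sink):
    """Idempotent upsert: the sink is keyed on (campaign, ad, minute), so replays overwrite."""
    for key, count in counts.items():
        sink[key] = count                  # same key + same aggregate -> no inflation

dedupe = Deduper()
counts, sink = {}, {}
for eid, campaign in [("e1", "c1"), ("e2", "c1"), ("e1", "c1")]:   # e1 delivered twice
    if dedupe.first_time(eid, now=0.0):
        k = (campaign, "ad1", 1714561200)   # hypothetical minute bucket
        counts[k] = counts.get(k, 0) + 1
flush_window(counts, sink)
flush_window(counts, sink)                  # replay after a restart: sink unchanged
```

Note the two layers: deduplication handles duplicate *events* (producer retries), while the keyed upsert handles duplicate *flushes* (job restarts). Either alone is insufficient.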
Backfill and correction: When a pipeline bug is discovered, replay the affected Kafka offset range through a parallel Flink job writing to a staging table, then swap it into production via an atomic table rename in ClickHouse.
Monitoring: Track end-to-end latency from event timestamp to ClickHouse row visibility, Flink checkpoint duration, Kafka consumer lag per partition, and Redis deduplication cache hit rate. Alert when freshness exceeds 30 seconds or lag grows beyond a threshold.
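The alerting condition itself is simple enough to state in code. The lag threshold below is an assumed value for illustration; the 30-second staleness budget comes from the requirements above:

```python
def should_alert(staleness_s: float, consumer_lag: int,
                 staleness_slo_s: float = 30.0, lag_threshold: int = 100_000) -> bool:
    """Fire when event-to-dashboard staleness breaches the 30s budget or consumer lag runs away."""
    return staleness_s > staleness_slo_s or consumer_lag > lag_threshold

should_alert(staleness_s=12.0, consumer_lag=500)   # False: healthy
should_alert(staleness_s=45.0, consumer_lag=500)   # True: freshness SLO breached
```

Alerting on lag as well as staleness matters because lag is a leading indicator: it grows before freshness visibly degrades.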
Cost optimization: Use ClickHouse table TTLs to auto-drop minute-level partitions after 7 days. Schedule nightly Spark jobs that compact hourly tables into daily rollups before archiving to S3, reducing hot storage by over 90 percent.
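The compaction step is just a re-bucketing of the same aggregate rows at a coarser granularity. A minimal sketch of the logic the nightly job performs, over tuples shaped like the minute-level rows described above:

```python
from collections import defaultdict

def rollup(minute_rows, bucket_s=86400):
    """Compact (campaign, ad, minute_bucket, count) rows into coarser buckets, summing counts."""
    out = defaultdict(int)
    for campaign, ad, minute_bucket, count in minute_rows:
        out[(campaign, ad, (minute_bucket // bucket_s) * bucket_s)] += count
    return dict(out)

rows = [("c1", "a1", 0, 5), ("c1", "a1", 3600, 7), ("c2", "a2", 60, 1)]
rollup(rows)   # {("c1", "a1", 0): 12, ("c2", "a2", 0): 1}
```

Because counts are additive, this rollup is lossless for totals; what is lost is only the ability to slice below the daily granularity, which is the explicit trade being made for cheaper storage.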