For a full example answer with detailed architecture diagrams and deep dives, see our Ad Click Aggregator guide.
Design a system that ingests billions of advertising events -- clicks, impressions, and conversions -- every day and aggregates them into near-real-time metrics that advertisers can query from a dashboard. Think of the analytics backend behind platforms like Google Ads or Meta Ads Manager, where campaign owners see click-through rates, spend, and conversion counts updating within seconds of an event occurring.
The central engineering challenge is building a pipeline that can sustain millions of events per second while guaranteeing each click is counted exactly once, even when producers retry or network partitions cause duplicate delivery. Raw events must flow through windowed aggregation stages that roll data up into minute, hour, and day granularities, all with a maximum staleness of about 30 seconds from event time to dashboard visibility.
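The minute/hour/day rollups come from assigning each event to tumbling windows keyed on event time. A minimal sketch of that bucketing (the function name and the example timestamp are illustrative, not from any specific framework):

```python
from datetime import datetime, timezone

def bucket_start(event_ts_ms: int, window_s: int) -> int:
    """Floor an epoch-millisecond event time to the start of its tumbling window (epoch seconds)."""
    return (event_ts_ms // 1000 // window_s) * window_s

# A click at 2024-05-01 12:34:56 UTC falls into these buckets:
ts_ms = int(datetime(2024, 5, 1, 12, 34, 56, tzinfo=timezone.utc).timestamp() * 1000)
minute = bucket_start(ts_ms, 60)      # start of 12:34
hour = bucket_start(ts_ms, 3600)      # start of 12:00
day = bucket_start(ts_ms, 86400)      # start of 00:00 UTC
```

Because the bucket is derived from event time rather than processing time, a replayed or late event lands in the same bucket it did the first time, which is what makes downstream aggregation idempotent.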
Beyond freshness, the system must support multi-dimensional slicing -- advertisers filter by campaign, ad group, creative, device type, and geography -- and retain queryable history for up to two years. This means you need a tiered storage strategy where recent fine-grained data lives in fast stores and older data is compacted into cheaper columnar or object storage without sacrificing query flexibility.
Based on real interview experiences, these are the areas interviewers probe most deeply:
The pipeline must convert raw events into aggregated counts with strict freshness guarantees. Interviewers want to see how you chain ingestion, deduplication, windowed aggregation, and sink stages while preserving correctness across failures.
A single popular ad campaign can receive orders of magnitude more clicks than the median, turning one partition into a bottleneck. Interviewers look for awareness of this skew and concrete countermeasures.
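One standard countermeasure is key salting: spread a hot campaign's events across several partitions by appending a random suffix to its key, then merge the partial aggregates back under the original key downstream. A sketch under assumed names (the hot-campaign set and bucket count are hypothetical; in practice they might come from a lag monitor):

```python
import random

HOT_CAMPAIGNS = {"cmp_123"}   # hypothetical: populated by a throughput/lag monitor
SALT_BUCKETS = 8

def partition_key(campaign_id: str) -> str:
    """Spread a hot campaign across SALT_BUCKETS partitions; cold campaigns keep a single key."""
    if campaign_id in HOT_CAMPAIGNS:
        return f"{campaign_id}#{random.randrange(SALT_BUCKETS)}"
    return campaign_id

def unsalt(key: str) -> str:
    """Downstream aggregation strips the salt to group by the original campaign ID again."""
    return key.split("#", 1)[0]
```

The cost is a second aggregation stage that sums the per-salt partial counts, which is a worthwhile trade when one campaign would otherwise saturate a single partition.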
Advertisers expect sub-second responses when filtering by campaign, date range, device, and geography over months of data. A naive scan of raw events is far too slow.
Storing two years of minute-level data for every ad is prohibitively expensive. You need a strategy that balances query speed against storage cost.
Ask the interviewer about expected event volume per second, the maximum acceptable staleness between an event and its dashboard visibility, and which dimensions advertisers need to filter by. Confirm whether approximate counts are acceptable for certain metrics or if exact counts are required. Clarify the retention window and whether the system must support backfilling historical data after a pipeline fix.
Sketch a pipeline: edge redirect servers emit click and impression events to Kafka topics partitioned by campaign ID with optional key salting. A Flink streaming job consumes events, deduplicates using a Redis-backed lookup on event IDs, and aggregates counts into tumbling one-minute windows. Flink sinks write minute-level aggregates into ClickHouse for real-time queries. Separate batch jobs roll minute data into hourly and daily tables, eventually archiving to S3 in Parquet format. An API layer fronts ClickHouse and the archive tier, routing queries to the appropriate store based on the requested time range.
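The time-range routing in that API layer can be sketched as a simple dispatch on the age of the requested range. The retention boundaries below are assumptions for illustration, not fixed requirements:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention boundaries for each tier.
MINUTE_RETENTION = timedelta(days=7)     # minute-level rows kept hot in ClickHouse
HOURLY_RETENTION = timedelta(days=90)    # hourly rollups in ClickHouse

def route_query(range_start: datetime, now: datetime) -> str:
    """Pick the finest-grained store whose retention still covers the requested start time."""
    age = now - range_start
    if age <= MINUTE_RETENTION:
        return "clickhouse_minute"
    if age <= HOURLY_RETENTION:
        return "clickhouse_hourly"
    return "s3_parquet_daily"

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
route_query(now - timedelta(days=2), now)    # "clickhouse_minute"
```

A query spanning multiple tiers would be split at the boundaries and the results stitched together in the API layer, which is worth mentioning explicitly in an interview.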
Walk through a single click event end to end. The user clicks an ad, the redirect service logs the click with a unique event_id and forwards the user to the landing page. The event lands in Kafka. A Flink consumer reads it and checks Redis for the event_id -- if found, the event is discarded as a duplicate. If new, the event_id is written to Redis with a 24-hour TTL and the event is added to the in-flight window state. At the window boundary (every 60 seconds), Flink flushes the aggregate (campaign, ad, minute bucket, count) to ClickHouse using an idempotent upsert keyed on the composite of those fields. If the Flink job restarts, it replays from the last checkpoint offset. Because the sink is idempotent, replayed events produce the same aggregate rows without inflation.
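The dedupe-then-idempotent-flush interaction is the crux of the exactly-once story, so it is worth being able to sketch it concretely. Below, an in-memory dict stands in for the Redis event_id set (the real system would use something like SET NX with a TTL), and a plain dict stands in for the ClickHouse upsert sink:

```python
class Deduper:
    """In-memory stand-in for the Redis event_id set with a 24-hour TTL."""
    def __init__(self, ttl_s: int = 86400):
        self.ttl_s = ttl_s
        self.seen: dict[str, float] = {}   # event_id -> expiry timestamp

    def first_time(self, event_id: str, now: float) -> bool:
        expiry = self.seen.get(event_id)
        if expiry is not None and expiry > now:
            return False                   # duplicate within the TTL window
        self.seen[event_id] = now + self.ttl_s
        return True

def flush_window(counts, sink):
    """Idempotent upsert: the sink is keyed on (campaign, ad, minute), so replays overwrite."""
    for key, count in counts.items():
        sink[key] = count                  # same key + same aggregate -> no inflation

dedupe = Deduper()
counts, sink = {}, {}
for eid, campaign in [("e1", "c1"), ("e2", "c1"), ("e1", "c1")]:   # e1 delivered twice
    if dedupe.first_time(eid, now=0.0):
        k = (campaign, "ad1", 1714561200)   # hypothetical minute bucket
        counts[k] = counts.get(k, 0) + 1
flush_window(counts, sink)
flush_window(counts, sink)                  # replay after a restart: sink unchanged
```

Note the two layers: deduplication handles duplicate *events* (producer retries), while the keyed upsert handles duplicate *flushes* (job restarts). Either alone is insufficient.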
Backfill and correction: When a pipeline bug is discovered, replay the affected Kafka offset range through a parallel Flink job writing to a staging table, then swap it into production via an atomic table rename in ClickHouse.
Monitoring: Track end-to-end latency from event timestamp to ClickHouse row visibility, Flink checkpoint duration, Kafka consumer lag per partition, and Redis deduplication cache hit rate. Alert when freshness exceeds 30 seconds or lag grows beyond a threshold.
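The alerting condition itself is simple enough to state in code. The lag threshold below is an assumed value for illustration; the 30-second staleness budget comes from the requirements above:

```python
def should_alert(staleness_s: float, consumer_lag: int,
                 staleness_slo_s: float = 30.0, lag_threshold: int = 100_000) -> bool:
    """Fire when event-to-dashboard staleness breaches the 30s budget or consumer lag runs away."""
    return staleness_s > staleness_slo_s or consumer_lag > lag_threshold

should_alert(staleness_s=12.0, consumer_lag=500)   # False: healthy
should_alert(staleness_s=45.0, consumer_lag=500)   # True: freshness SLO breached
```

Alerting on lag as well as staleness matters because lag is a leading indicator: it grows before freshness visibly degrades.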
Cost optimization: Use ClickHouse table TTLs to auto-drop minute-level partitions after 7 days. Schedule nightly Spark jobs that compact hourly tables into daily rollups before archiving to S3, reducing hot storage by over 90 percent.
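The compaction step is just a re-bucketing of the same aggregate rows at a coarser granularity. A minimal sketch of the logic the nightly job performs, over tuples shaped like the minute-level rows described above:

```python
from collections import defaultdict

def rollup(minute_rows, bucket_s=86400):
    """Compact (campaign, ad, minute_bucket, count) rows into coarser buckets, summing counts."""
    out = defaultdict(int)
    for campaign, ad, minute_bucket, count in minute_rows:
        out[(campaign, ad, (minute_bucket // bucket_s) * bucket_s)] += count
    return dict(out)

rows = [("c1", "a1", 0, 5), ("c1", "a1", 3600, 7), ("c2", "a2", 60, 1)]
rollup(rows)   # {("c1", "a1", 0): 12, ("c2", "a2", 0): 1}
```

Because counts are additive, this rollup is lossless for totals; what is lost is only the ability to slice below the daily granularity, which is the explicit trade being made for cheaper storage.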