Design a real-time stream processing pipeline that ingests clickstream events from Kafka, computes per-minute counts of each event type, and writes the results to Snowflake. Your solution must handle late-arriving data (up to 5 minutes late), guarantee exactly-once semantics, and scale to 50,000 events/sec. You may assume events arrive as JSON with fields: event_id (string), event_type (string), event_time (epoch ms), user_id (string). Deliver a high-level design plus pseudo-code for the core windowed aggregation using tumbling 1-minute windows, watermarks, and idempotent writes into Snowflake. Be prepared to discuss how you would tune checkpoints, the state backend choice, and parallelism to meet the throughput target.
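One way to approach the core of this question: a single-process Python sketch of the tumbling-window aggregation with a watermark that trails the maximum observed event time by the allowed lateness. In a real deployment a framework such as Flink or Spark Structured Streaming would shard this state by key and checkpoint it; here the class name `TumblingCounter`, the table name `EVENT_COUNTS`, and the column names are illustrative assumptions, not part of the question.

```python
from collections import defaultdict

WINDOW_MS = 60_000             # tumbling 1-minute windows
ALLOWED_LATENESS_MS = 300_000  # accept events up to 5 minutes late

class TumblingCounter:
    """Counts events per (1-minute window, event_type).

    The watermark trails the largest event_time seen so far by
    ALLOWED_LATENESS_MS; a window is emitted (and its state dropped)
    once the watermark passes the window's end."""

    def __init__(self):
        # window_start_ms -> event_type -> count
        self.windows = defaultdict(lambda: defaultdict(int))
        self.max_event_time = 0

    def watermark(self):
        return self.max_event_time - ALLOWED_LATENESS_MS

    def process(self, event):
        """Ingest one event; return any windows the watermark closes."""
        t = event["event_time"]
        window_start = t - (t % WINDOW_MS)
        if window_start + WINDOW_MS <= self.watermark():
            return []  # beyond allowed lateness: window already final
        self.windows[window_start][event["event_type"]] += 1
        self.max_event_time = max(self.max_event_time, t)
        return self._emit_closed()

    def flush(self):
        """End of stream: emit all remaining open windows."""
        self.max_event_time = float("inf")
        return self._emit_closed()

    def _emit_closed(self):
        wm = self.watermark()
        closed = sorted(w for w in self.windows if w + WINDOW_MS <= wm)
        return [
            {"window_start": w, "event_type": etype, "count": n}
            for w in closed
            for etype, n in self.windows.pop(w).items()
        ]

def merge_sql(table="EVENT_COUNTS"):
    """Idempotent upsert keyed on (window_start, event_type): replaying
    the same result row after a restart leaves the table unchanged.
    Values are bound as parameters, not interpolated."""
    return (
        f"MERGE INTO {table} t "
        "USING (SELECT %(window_start)s AS window_start, "
        "%(event_type)s AS event_type, %(count)s AS cnt) s "
        "ON t.window_start = s.window_start AND t.event_type = s.event_type "
        "WHEN MATCHED THEN UPDATE SET t.cnt = s.cnt "
        "WHEN NOT MATCHED THEN INSERT (window_start, event_type, cnt) "
        "VALUES (s.window_start, s.event_type, s.cnt)"
    )
```

Because each emitted row carries a deterministic key `(window_start, event_type)`, the `MERGE` write is safe to replay, which combined with checkpointed window state is what gives end-to-end exactly-once results in Snowflake.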