Design a real-time stream processing pipeline that ingests clickstream events from Kafka, computes per-minute counts of each event type, and writes the results to Snowflake. Your solution must handle late-arriving data (up to 5 minutes late), guarantee exactly-once semantics, and scale to 50,000 events/sec. You may assume events arrive as JSON with fields: event_id (string), event_type (string), event_time (epoch ms), user_id (string). Deliver a high-level design plus pseudo-code for the core windowed aggregation using tumbling 1-minute windows, watermarks, and idempotent writes into Snowflake. Be prepared to discuss how you would tune checkpoints, the state backend choice, and parallelism to meet the throughput target.
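One way to approach the core of this question: a single-process Python sketch of the tumbling-window aggregation with a watermark that trails the maximum observed event time by the allowed lateness. In a real deployment a framework such as Flink or Spark Structured Streaming would shard this state by key and checkpoint it; here the class name `TumblingCounter`, the table name `EVENT_COUNTS`, and the column names are illustrative assumptions, not part of the question.

```python
from collections import defaultdict

WINDOW_MS = 60_000             # tumbling 1-minute windows
ALLOWED_LATENESS_MS = 300_000  # accept events up to 5 minutes late

class TumblingCounter:
    """Counts events per (1-minute window, event_type).

    The watermark trails the largest event_time seen so far by
    ALLOWED_LATENESS_MS; a window is emitted (and its state dropped)
    once the watermark passes the window's end."""

    def __init__(self):
        # window_start_ms -> event_type -> count
        self.windows = defaultdict(lambda: defaultdict(int))
        self.max_event_time = 0

    def watermark(self):
        return self.max_event_time - ALLOWED_LATENESS_MS

    def process(self, event):
        """Ingest one event; return any windows the watermark closes."""
        t = event["event_time"]
        window_start = t - (t % WINDOW_MS)
        if window_start + WINDOW_MS <= self.watermark():
            return []  # beyond allowed lateness: window already final
        self.windows[window_start][event["event_type"]] += 1
        self.max_event_time = max(self.max_event_time, t)
        return self._emit_closed()

    def flush(self):
        """End of stream: emit all remaining open windows."""
        self.max_event_time = float("inf")
        return self._emit_closed()

    def _emit_closed(self):
        wm = self.watermark()
        closed = sorted(w for w in self.windows if w + WINDOW_MS <= wm)
        return [
            {"window_start": w, "event_type": etype, "count": n}
            for w in closed
            for etype, n in self.windows.pop(w).items()
        ]

def merge_sql(table="EVENT_COUNTS"):
    """Idempotent upsert keyed on (window_start, event_type): replaying
    the same result row after a restart leaves the table unchanged.
    Values are bound as parameters, not interpolated."""
    return (
        f"MERGE INTO {table} t "
        "USING (SELECT %(window_start)s AS window_start, "
        "%(event_type)s AS event_type, %(count)s AS cnt) s "
        "ON t.window_start = s.window_start AND t.event_type = s.event_type "
        "WHEN MATCHED THEN UPDATE SET t.cnt = s.cnt "
        "WHEN NOT MATCHED THEN INSERT (window_start, event_type, cnt) "
        "VALUES (s.window_start, s.event_type, s.cnt)"
    )
```

Because each emitted row carries a deterministic key `(window_start, event_type)`, the `MERGE` write is safe to replay, which combined with checkpointed window state is what gives end-to-end exactly-once results in Snowflake.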