Design a real-time analytics dashboard for restaurant orders
Problem Statement
Design a system that allows restaurant owners to view real-time aggregated metrics for their orders, such as total orders, revenue, average preparation time, and popular items over configurable time windows. The dashboard should update in near real time as new orders come in and support historical comparisons.
Restaurant owners on platforms like Uber Eats need operational visibility: how many orders came in the last hour, what is the current average prep time, which menu items are trending, and how does today compare to the same day last week. The system must ingest a continuous stream of order events, compute windowed aggregations, and serve dashboard queries with low latency.
Interviewers at Uber ask this to test whether you can design a streaming analytics pipeline that computes real-time aggregations, serves low-latency dashboard queries, and handles the tension between freshness and query performance. They probe your understanding of stream processing, materialized views, and time-window management.
Key Requirements
Functional
- Real-time metrics -- restaurant owners view live metrics (order count, revenue, average prep time) that update within seconds of new orders
- Time-windowed views -- metrics are available for configurable windows: last 15 minutes, last hour, today, last 7 days
- Item-level analytics -- owners see which menu items are most popular by order count and revenue
- Historical comparison -- owners compare current metrics against the same period in previous weeks
Non-Functional
- Scalability -- support hundreds of thousands of restaurants, each receiving up to thousands of orders per day, with sharp bursts during peak hours
- Reliability -- dashboard remains available even if the streaming pipeline has a brief delay; show stale data with a freshness indicator
- Latency -- dashboard page loads in under 500 ms; real-time metrics update within 10 seconds of order completion
- Consistency -- eventual consistency acceptable; metrics may lag by seconds but should converge to accurate totals
What Interviewers Focus On
Based on real interview experiences at Uber, these are the areas interviewers probe most deeply:
1. Streaming Aggregation Pipeline
Order events must be continuously aggregated into per-restaurant, per-time-window metrics. Interviewers want to see how you compute running totals without re-scanning all historical data.
Hints to consider:
- Consume order events from Kafka partitioned by restaurant_id to enable parallel, ordered processing per restaurant
- Use Flink with event-time tumbling windows (1 minute, 15 minutes, 1 hour) to compute aggregates incrementally
- Maintain counters (order count, revenue sum, prep time sum, item counts) per restaurant per window in Flink's keyed state
- Emit window results to a serving store (Redis or a time-series database) as each window closes
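The incremental aggregation the hints describe can be sketched in plain Python. This is a stand-in for Flink's keyed state, not Flink itself; the event field names (`restaurant_id`, `event_time_ms`, `total`, `prep_time_s`, `items`) are assumptions for illustration:

```python
from collections import defaultdict

WINDOW_MS = 60_000  # 1-minute tumbling windows, matching the smallest window above

class WindowAggregator:
    """Running counters per (restaurant, window) -- a toy analogue of Flink keyed state."""

    def __init__(self):
        self.state = defaultdict(lambda: {
            "order_count": 0,
            "revenue": 0.0,
            "prep_time_sum": 0.0,
            "item_counts": defaultdict(int),
        })

    @staticmethod
    def window_start(event_time_ms):
        # Align the event's timestamp to its tumbling-window boundary
        return event_time_ms - (event_time_ms % WINDOW_MS)

    def process(self, event):
        # Fold one order event into its window's counters;
        # no historical re-scan is ever needed.
        key = (event["restaurant_id"], self.window_start(event["event_time_ms"]))
        agg = self.state[key]
        agg["order_count"] += 1
        agg["revenue"] += event["total"]
        agg["prep_time_sum"] += event["prep_time_s"]
        for item in event["items"]:
            agg["item_counts"][item] += 1
        return key
```

When a window closes, the counters under its key are what would be emitted to the serving store; storing the prep-time sum alongside the count (rather than a pre-divided average) keeps the aggregate mergeable.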
2. Serving Layer for Dashboard Queries
Dashboard queries must return quickly, even for restaurants with high order volume. Interviewers evaluate your materialized view strategy.
Hints to consider:
- Materialize pre-computed aggregates per restaurant per time window in Redis hashes for sub-millisecond reads
- For short windows (last 15 minutes), use rolling window updates by aggregating the most recent minute-level buckets
- For longer windows (today, last 7 days), use pre-aggregated daily totals stored in a time-series database
- Cache dashboard responses per restaurant with TTLs matching the update frequency (10-30 seconds)
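The rolling-window trick for short windows can be illustrated with a small sketch. Here a plain dict stands in for the minute-level buckets that would live in Redis; the bucket layout (keyed by epoch minute, holding `order_count` and `revenue`) is a hypothetical choice, not a fixed schema:

```python
def rolling_window(minute_buckets, now_minute, span=15):
    """Sum the last `span` minute buckets into one rolling aggregate.

    `minute_buckets` stands in for per-minute hashes in a serving store,
    keyed by epoch minute.
    """
    totals = {"order_count": 0, "revenue": 0.0}
    for minute in range(now_minute - span + 1, now_minute + 1):
        bucket = minute_buckets.get(minute)
        if bucket is None:
            continue  # no orders landed in that minute
        totals["order_count"] += bucket["order_count"]
        totals["revenue"] += bucket["revenue"]
    return totals
```

Summing at most 15 small buckets per request is cheap, and each advancing minute only adds one new bucket and drops one old one, so the serving layer never recomputes from raw events.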
3. Historical Comparison
Comparing today's metrics to the same day last week requires efficient access to historical aggregates.
Hints to consider:
- Store daily aggregate snapshots in a columnar or time-series database (ClickHouse, TimescaleDB) partitioned by restaurant_id and date
- Pre-compute week-over-week deltas during the daily aggregation job
- Cache historical comparison data since it does not change once the day has passed
- Support flexible comparison periods (same day last week, same day last month) via parameterized queries
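A week-over-week delta over the daily snapshots can be sketched as below. The `daily_totals` mapping stands in for the pre-aggregated daily rows in the time-series store, and `days_back` is the parameterized comparison period mentioned above:

```python
from datetime import date, timedelta

def period_comparison(daily_totals, day, days_back=7):
    """Compare one day's totals against the same day `days_back` earlier.

    `daily_totals` maps a date to its pre-aggregated daily metrics
    (the immutable snapshots described above, so results are cacheable).
    """
    current = daily_totals[day]
    baseline = daily_totals.get(day - timedelta(days=days_back))
    if baseline is None:
        return None  # no history to compare against yet
    deltas = {}
    for metric, value in current.items():
        base = baseline.get(metric, 0)
        deltas[metric] = {
            "current": value,
            "baseline": base,
            # Guard against dividing by a zero baseline
            "pct_change": None if base == 0 else round(100.0 * (value - base) / base, 1),
        }
    return deltas
```

Because a past day's snapshot never changes, the computed comparison can be cached indefinitely; only today's side of the delta needs refreshing.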
4. Late Events and Accuracy
Order events may arrive out of order (e.g., a prep-time update arriving after the order-completion event). Interviewers probe how you maintain accuracy.
Hints to consider:
- Use event-time processing with allowed lateness to accept late updates and re-emit corrected window results
- Design the serving layer to handle incremental corrections gracefully (overwrite, not accumulate)
- For metrics like average prep time, store both the sum and count so late updates can adjust both values
- Accept that very late events (hours late) may only be reflected in the daily aggregate, not the real-time view
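The overwrite-not-accumulate rule for the serving layer can be sketched as follows. Each (re-)emission of a window carries its full counters, so a late correction simply replaces the stored snapshot; storing the prep-time sum and order count separately, as the hints suggest, keeps the average adjustable. The class and method names here are illustrative, not a prescribed API:

```python
class ServingStore:
    """Overwrite-on-update serving view: corrected window results
    replace earlier emissions rather than being added to them."""

    def __init__(self):
        self.windows = {}  # (restaurant_id, window_start) -> metric snapshot

    def upsert(self, key, order_count, prep_time_sum):
        # Full-counter overwrite: idempotent under re-emission, so a late
        # event that triggers a corrected window cannot double-count.
        self.windows[key] = {
            "order_count": order_count,
            "prep_time_sum": prep_time_sum,
        }

    def avg_prep_time(self, key):
        w = self.windows[key]
        return w["prep_time_sum"] / w["order_count"] if w["order_count"] else 0.0
```

A first emission of a window with 10 orders and a 3000-second prep-time sum yields a 300 s average; if a late prep-time update re-emits the window with a 3200-second sum, the overwrite makes the dashboard converge to the corrected 320 s average.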