Design a real-time analytics dashboard for restaurant orders
Problem Statement
Design a system that allows restaurant owners to view real-time aggregated metrics for their orders, such as total orders, revenue, average preparation time, and popular items over configurable time windows. The dashboard should update in near real time as new orders come in and support historical comparisons.
Restaurant owners on platforms like Uber Eats need operational visibility: how many orders came in the last hour, what is the current average prep time, which menu items are trending, and how does today compare to the same day last week. The system must ingest a continuous stream of order events, compute windowed aggregations, and serve dashboard queries with low latency.
Interviewers at Uber ask this to test whether you can design a streaming analytics pipeline that computes real-time aggregations, serves low-latency dashboard queries, and handles the tension between freshness and query performance. They probe your understanding of stream processing, materialized views, and time-window management.
Key Requirements
Functional
- Real-time metrics -- restaurant owners view live metrics (order count, revenue, average prep time) that update within seconds of new orders
- Time-windowed views -- metrics are available for configurable windows: last 15 minutes, last hour, today, last 7 days
- Item-level analytics -- owners see which menu items are most popular by order count and revenue
- Historical comparison -- owners compare current metrics against the same period in previous weeks
Non-Functional
- Scalability -- support hundreds of thousands of restaurants, each receiving up to thousands of orders per day, with sharp bursts during peak hours
- Reliability -- dashboard remains available even if the streaming pipeline has a brief delay; show stale data with a freshness indicator
- Latency -- dashboard page loads in under 500 ms; real-time metrics update within 10 seconds of order completion
- Consistency -- eventual consistency acceptable; metrics may lag by seconds but should converge to accurate totals
What Interviewers Focus On
Based on real interview experiences at Uber, these are the areas interviewers probe most deeply:
1. Streaming Aggregation Pipeline
Order events must be continuously aggregated into per-restaurant, per-time-window metrics. Interviewers want to see how you compute running totals without re-scanning all historical data.
Hints to consider:
- Consume order events from Kafka partitioned by restaurant_id to enable parallel, ordered processing per restaurant
- Use Flink with event-time tumbling windows (1 minute, 15 minutes, 1 hour) to compute aggregates incrementally
- Maintain counters (order count, revenue sum, prep time sum, item counts) per restaurant per window in Flink's keyed state
- Emit window results to a serving store (Redis or a time-series database) as each window closes
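The incremental aggregation the hints describe can be sketched in plain Python. This is a stand-in for Flink's keyed state, not Flink itself; the event field names (`restaurant_id`, `event_time_ms`, `total`, `prep_time_s`, `items`) are assumptions for illustration:

```python
from collections import defaultdict

WINDOW_MS = 60_000  # 1-minute tumbling windows, matching the smallest window above

class WindowAggregator:
    """Running counters per (restaurant, window) -- a toy analogue of Flink keyed state."""

    def __init__(self):
        self.state = defaultdict(lambda: {
            "order_count": 0,
            "revenue": 0.0,
            "prep_time_sum": 0.0,
            "item_counts": defaultdict(int),
        })

    @staticmethod
    def window_start(event_time_ms):
        # Align the event's timestamp to its tumbling-window boundary
        return event_time_ms - (event_time_ms % WINDOW_MS)

    def process(self, event):
        # Fold one order event into its window's counters;
        # no historical re-scan is ever needed.
        key = (event["restaurant_id"], self.window_start(event["event_time_ms"]))
        agg = self.state[key]
        agg["order_count"] += 1
        agg["revenue"] += event["total"]
        agg["prep_time_sum"] += event["prep_time_s"]
        for item in event["items"]:
            agg["item_counts"][item] += 1
        return key
```

When a window closes, the counters under its key are what would be emitted to the serving store; storing the prep-time sum alongside the count (rather than a pre-divided average) keeps the aggregate mergeable.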
2. Serving Layer for Dashboard Queries
Dashboard queries must return quickly, even for restaurants with high order volume. Interviewers evaluate your materialized view strategy.
Hints to consider:
- Materialize pre-computed aggregates per restaurant per time window in Redis hashes for sub-millisecond reads
- For short windows (last 15 minutes), use rolling window updates by aggregating the most recent minute-level buckets
- For longer windows (today, last 7 days), use pre-aggregated daily totals stored in a time-series database
- Cache dashboard responses per restaurant with TTLs matching the update frequency (10-30 seconds)
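The rolling-window trick for short windows can be illustrated with a small sketch. Here a plain dict stands in for the minute-level buckets that would live in Redis; the bucket layout (keyed by epoch minute, holding `order_count` and `revenue`) is a hypothetical choice, not a fixed schema:

```python
def rolling_window(minute_buckets, now_minute, span=15):
    """Sum the last `span` minute buckets into one rolling aggregate.

    `minute_buckets` stands in for per-minute hashes in a serving store,
    keyed by epoch minute.
    """
    totals = {"order_count": 0, "revenue": 0.0}
    for minute in range(now_minute - span + 1, now_minute + 1):
        bucket = minute_buckets.get(minute)
        if bucket is None:
            continue  # no orders landed in that minute
        totals["order_count"] += bucket["order_count"]
        totals["revenue"] += bucket["revenue"]
    return totals
```

Summing at most 15 small buckets per request is cheap, and each advancing minute only adds one new bucket and drops one old one, so the serving layer never recomputes from raw events.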
3. Historical Comparison
Comparing today's metrics to the same day last week requires efficient access to historical aggregates.
Hints to consider:
- Store daily aggregate snapshots in a columnar or time-series database (ClickHouse, TimescaleDB) partitioned by restaurant_id and date
- Pre-compute week-over-week deltas during the daily aggregation job
- Cache historical comparison data since it does not change once the day has passed
- Support flexible comparison periods (same day last week, same day last month) via parameterized queries
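A week-over-week delta over the daily snapshots can be sketched as below. The `daily_totals` mapping stands in for the pre-aggregated daily rows in the time-series store, and `days_back` is the parameterized comparison period mentioned above:

```python
from datetime import date, timedelta

def period_comparison(daily_totals, day, days_back=7):
    """Compare one day's totals against the same day `days_back` earlier.

    `daily_totals` maps a date to its pre-aggregated daily metrics
    (the immutable snapshots described above, so results are cacheable).
    """
    current = daily_totals[day]
    baseline = daily_totals.get(day - timedelta(days=days_back))
    if baseline is None:
        return None  # no history to compare against yet
    deltas = {}
    for metric, value in current.items():
        base = baseline.get(metric, 0)
        deltas[metric] = {
            "current": value,
            "baseline": base,
            # Guard against dividing by a zero baseline
            "pct_change": None if base == 0 else round(100.0 * (value - base) / base, 1),
        }
    return deltas
```

Because a past day's snapshot never changes, the computed comparison can be cached indefinitely; only today's side of the delta needs refreshing.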
4. Late Events and Accuracy
Order events may arrive out of order (e.g., a prep-time update arriving after the order-completion event). Interviewers probe how you maintain accuracy.
Hints to consider:
- Use event-time processing with allowed lateness to accept late updates and re-emit corrected window results
- Design the serving layer to handle incremental corrections gracefully (overwrite, not accumulate)
- For metrics like average prep time, store both the sum and count so late updates can adjust both values
- Accept that very late events (hours late) may only be reflected in the daily aggregate, not the real-time view
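The overwrite-not-accumulate rule for the serving layer can be sketched as follows. Each (re-)emission of a window carries its full counters, so a late correction simply replaces the stored snapshot; storing the prep-time sum and order count separately, as the hints suggest, keeps the average adjustable. The class and method names here are illustrative, not a prescribed API:

```python
class ServingStore:
    """Overwrite-on-update serving view: corrected window results
    replace earlier emissions rather than being added to them."""

    def __init__(self):
        self.windows = {}  # (restaurant_id, window_start) -> metric snapshot

    def upsert(self, key, order_count, prep_time_sum):
        # Full-counter overwrite: idempotent under re-emission, so a late
        # event that triggers a corrected window cannot double-count.
        self.windows[key] = {
            "order_count": order_count,
            "prep_time_sum": prep_time_sum,
        }

    def avg_prep_time(self, key):
        w = self.windows[key]
        return w["prep_time_sum"] / w["order_count"] if w["order_count"] else 0.0
```

A first emission of a window with 10 orders and a 3000-second prep-time sum yields a 300 s average; if a late prep-time update re-emits the window with a 3200-second sum, the overwrite makes the dashboard converge to the corrected 320 s average.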