This Uber Machine Learning Engineer onsite question is a lightweight analytics exercise on a toy orders dataset. The interviewer gives you a small table and asks for a simple slice metric, for example, the average number of days it takes to complete an order.
Given a dataset of toy orders, calculate the average number of days it takes to complete an order. The dataset contains the following columns:

- order_id: A unique identifier for each order.
- customer_id: A unique identifier for each customer.
- order_date: The date when the order was placed.
- completion_date: The date when the order was completed.

You need to find the average number of days between the order_date and completion_date for all orders.

Example dataset:

| order_id | customer_id | order_date | completion_date |
|----------|-------------|------------|-----------------|
| 1        | 101         | 2022-01-01 | 2022-01-05      |
| 2        | 102         | 2022-01-02 | 2022-01-07      |
| 3        | 103         | 2022-01-03 | 2022-01-04      |

Expected output:
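Average number of days to complete an order: 3.33

The three orders take 4, 5, and 1 days respectively, so the average is (4 + 5 + 1) / 3 = 10 / 3 ≈ 3.33.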
To solve this, compute the difference in days between order_date and completion_date for each order, then take the mean of those differences. Here's a Python solution using pandas:
```python
import pandas as pd

# Build the example dataset
data = {
    'order_id': [1, 2, 3],
    'customer_id': [101, 102, 103],
    'order_date': ['2022-01-01', '2022-01-02', '2022-01-03'],
    'completion_date': ['2022-01-05', '2022-01-07', '2022-01-04'],
}
df = pd.DataFrame(data)

# Parse the date strings into datetime values
df['order_date'] = pd.to_datetime(df['order_date'])
df['completion_date'] = pd.to_datetime(df['completion_date'])

# Whole days between placing and completing each order
df['days_to_complete'] = (df['completion_date'] - df['order_date']).dt.days

# Average across all orders
average_days = df['days_to_complete'].mean()
print(f"Average number of days to complete an order: {average_days:.2f}")
```
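This solution calculates the difference in days between order_date and completion_date for each order, then calls mean(), which sums the differences and divides by the number of orders. Note that .dt.days keeps only the whole-day component of each difference, so the result is exact only when the columns hold dates without a time of day.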
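
In an interview, a natural follow-up is what happens with messier data. The sketch below is one possible variant, not part of the original question: it assumes some orders may have no completion_date yet and that timestamps may carry a time of day. The function name `average_completion_days` and the fourth, still-open order are hypothetical additions for illustration.

```python
import pandas as pd


def average_completion_days(orders: pd.DataFrame) -> float:
    """Mean days from order_date to completion_date, skipping open orders."""
    # errors="coerce" turns missing or unparseable dates into NaT
    placed = pd.to_datetime(orders["order_date"], errors="coerce")
    completed = pd.to_datetime(orders["completion_date"], errors="coerce")

    # Dividing by a one-day Timedelta keeps fractional days,
    # unlike .dt.days, which drops the partial-day remainder
    days = (completed - placed) / pd.Timedelta(days=1)

    # mean() skips NaN by default, so orders without a completion date
    # do not contribute to the average
    return float(days.mean())


# Hypothetical data: the original three rows plus one open order
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [101, 102, 103, 104],
    "order_date": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"],
    "completion_date": ["2022-01-05", "2022-01-07", "2022-01-04", None],
})

print(f"Average number of days to complete an order: "
      f"{average_completion_days(orders):.2f}")
```

Whether open orders should be excluded, or counted up to the current date, is a definition worth confirming with the interviewer; excluding them is only the assumption made here.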