Level: Senior-Level
Round: Onsite · Type: Multiple Types · Difficulty: 7/10 · Duration: 240 min · Interviewer: Neutral
Topics: ETL, Data Modeling, SQL, Python, Product Sense, Data Visualization, Real-time Streaming Systems, Batch Systems, Big Data, Behavioral Questions, Leadership Principles
Location: San Francisco Bay Area
Interview date: 2022-02-15
This onsite interview process focused on ETL, data modeling, and system design within a big data context. I was told that although code would be written during the technical interviews, it would not be executed. The recruiter was helpful and prepared me thoroughly for the interviews.
Round 1: ETL in Real-time Stream System (Photo Upload Stream Process)
The focus was on designing important metrics and the end-to-end ETL process.
```
session_id | action         | timestamp
-----------|----------------|--------------------
1          | login          | 2021-11-29 00:02:01
1          | click on post  | 2021-11-29 01:03:03
2          | upload photos  | 2021-11-29 01:12:01
3          | tag friends    | 2021-11-29 02:38:15
3          | tag friends    | 2021-11-29 03:02:01
```
Find the average time taken between each step/action above. For duplicate actions (e.g., a photo uploaded twice), use the timestamp of the first occurrence.
```sql
SELECT *,
       DATEDIFF(minute,
                LAG(timestamp) OVER (PARTITION BY session_id ORDER BY timestamp),
                timestamp) AS inactivity_time
FROM events;
```
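The streaming version of this computation, maintaining a running average per step as events arrive instead of re-querying, can be done with an incremental mean. A minimal Python sketch; the class name and the (step_id, elapsed_seconds) event shape are illustrative, not from the interview:

```python
from collections import defaultdict

class RunningStepAverages:
    """Maintain a running average of time spent on each step as events
    stream in (the prompt allows assuming unbounded memory)."""
    def __init__(self):
        self.count = defaultdict(int)    # events seen per step
        self.mean = defaultdict(float)   # running average per step

    def update(self, step_id, elapsed_seconds):
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n,
        # so no per-event history needs to be retained.
        self.count[step_id] += 1
        n = self.count[step_id]
        self.mean[step_id] += (elapsed_seconds - self.mean[step_id]) / n
        return self.mean[step_id]

avgs = RunningStepAverages()
avgs.update("upload_photo", 30.0)
avgs.update("upload_photo", 60.0)  # running average is now 45.0
```

Deduplicating repeated actions within a session (keeping only the first occurrence, as the SQL question requires) would just mean skipping an event whose (session_id, step_id) pair has already been seen.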
Streaming follow-up: given events of the form [session_id, action/step_id, start_ts], the task was to compute and update the running average time spent on each step for every new batch of data streaming into the system, assuming infinite memory.
Round 2: Data Modeling (Ride-Share Product like DoorDash/Uber)
Key design guidelines included scalability, maintainability, normalization/denormalization, and performance.
Q1: Product sense question related to the product.
Q2: Create a data mart, including data dimension tables to analyze the product (star schema preferred).
Q3: Query based on the schema created previously to find out some metrics.
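To make Q2/Q3 concrete, here is a toy sketch using Python's sqlite3. The table and column names (fact_trips, dim_city, dim_driver, etc.) are hypothetical, just one plausible star schema for a ride-share data mart, with a sample metric query of the kind Q3 asked for:

```python
import sqlite3

# One fact table referencing two dimension tables (star schema).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_city   (city_id INTEGER PRIMARY KEY, city_name TEXT);
CREATE TABLE dim_driver (driver_id INTEGER PRIMARY KEY, driver_name TEXT);
CREATE TABLE fact_trips (
    trip_id   INTEGER PRIMARY KEY,
    driver_id INTEGER REFERENCES dim_driver(driver_id),
    city_id   INTEGER REFERENCES dim_city(city_id),
    fare_usd  REAL,
    status    TEXT      -- 'completed' or 'cancelled'
);
INSERT INTO dim_city   VALUES (1, 'SF'), (2, 'LA');
INSERT INTO dim_driver VALUES (10, 'Ana'), (11, 'Raj');
INSERT INTO fact_trips VALUES
    (100, 10, 1, 25.0, 'completed'),
    (101, 10, 1, 18.0, 'cancelled'),
    (102, 11, 2, 40.0, 'completed');
""")

# Example metric query: completed trips and revenue per city.
rows = cur.execute("""
    SELECT c.city_name, COUNT(*) AS trips, SUM(f.fare_usd) AS revenue
    FROM fact_trips f JOIN dim_city c USING (city_id)
    WHERE f.status = 'completed'
    GROUP BY c.city_name
    ORDER BY c.city_name
""").fetchall()
# rows == [('LA', 1, 40.0), ('SF', 1, 25.0)]
```

Keeping the fact table narrow (foreign keys plus measures) and pushing descriptive attributes into dimensions is what the "star schema preferred" guideline is after.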
Round 3: ETL in Batch System (Related to Meta Quarterly Results)
Question 1: Product sense question related to DAU, MAU.
Question 2: Given a table of new and returning users with user_id, last_login_time, and prev_login_before_last_login columns, write a query to find the status and count of users for different metrics.
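The exact status rule wasn't captured in my notes, but the classification can be sketched as below, under the hypothetical rule that a user with no login before their last one is "new" and everyone else is "retained":

```python
from collections import Counter

def user_status(last_login, prev_login):
    # Assumed rule: no earlier login means a first-time (new) user.
    return "new" if prev_login is None else "retained"

# Rows of (user_id, last_login_time, prev_login_before_last_login).
rows = [
    (1, "2022-02-14", None),          # first-ever login -> new
    (2, "2022-02-14", "2022-02-10"),  # logged in before -> retained
    (3, "2022-02-13", "2022-01-01"),  # logged in before -> retained
]
counts = Counter(user_status(last, prev) for _, last, prev in rows)
# counts == Counter({'retained': 2, 'new': 1})
```

The SQL asked for in the interview would be the same CASE-plus-GROUP BY logic over the real table.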
```python
details = {'iPhone':  [0, 1, 0, 1, 0, 1, 0],
           'Android': [1, 0, 0, 0, 0, 0, 1],
           'Web':     [0, 0, 1, 0, 0, 0, 0]}

rollups = {'overall': ['iphone', 'Android', 'Web'],
           'Mobile':  ['iPhone', 'Android']}
```
Question: find the last-7-days activity for each rollup (overall, Mobile, etc.), treating a rollup as day-wise activity across its platforms.
Return:
```python
{'overall': [1, 1, 1, 1, 0, 1, 1],
 'Mobile':  [1, 1, 0, 1, 0, 1, 1]}
```
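A minimal solution sketch, assuming a rollup is the day-wise OR of its platforms and that platform names match case-insensitively (note the lowercase 'iphone' in the rollups dict):

```python
def rollup_last_7_days(details, rollups):
    # Normalize platform keys so 'iphone' matches 'iPhone'.
    by_key = {k.lower(): v for k, v in details.items()}
    out = {}
    for name, platforms in rollups.items():
        days = [by_key[p.lower()] for p in platforms]
        # A rollup is active on a day if any of its platforms is.
        out[name] = [int(any(vals)) for vals in zip(*days)]
    return out

details = {'iPhone':  [0, 1, 0, 1, 0, 1, 0],
           'Android': [1, 0, 0, 0, 0, 0, 1],
           'Web':     [0, 0, 1, 0, 0, 0, 0]}
rollups = {'overall': ['iphone', 'Android', 'Web'],
           'Mobile':  ['iPhone', 'Android']}

result = rollup_last_7_days(details, rollups)
# result['overall'] == [1, 1, 1, 1, 0, 1, 1]
# result['Mobile']  == [1, 1, 0, 1, 0, 1, 1]
```

The case normalization matters here: a naive `details[p]` lookup would raise a KeyError on 'iphone'.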
Round 4: Behavioral (Follow the STAR Framework)