Data Engineer · Onsite · Multiple Types — Meta

Meta — Data Engineer ✅ Passed

Level: Senior-Level

Round: Onsite · Type: Multiple Types · Difficulty: 7/10 · Duration: 240 min · Interviewer: Neutral

Topics: ETL, Data Modeling, SQL, Python, Product Sense, Data Visualization, Real-time Streaming Systems, Batch Systems, Big Data, Behavioral Questions, Leadership Principles

Location: San Francisco Bay Area

Interview date: 2022-02-15

Summary

Interview Rounds Overview

Round 1: ETL Interview (Real
Round 2: Data modeling
Round 3: ETL Interview (Batch system)
Round 4: Behavioral

Details

This onsite interview process focused on ETL, data modeling, and system design within a big data context. I was told that although code was to be written during the technical interviews, it would not be executed. The recruiter was helpful and provided thorough preparation for the interviews.

Round 1: ETL in Real-time Stream System (Photo Upload Stream Process)

The focus was on designing important metrics and the end-to-end ETL process.
- Q1: What metrics would you use to evaluate if a feature in the photo upload application is effective or not? What data would you collect?
- Q2: Design the schema/columns for data modeling for both the source data and the target (output) data.
- Q3: Given a schema:
` session_id Action timestamp 1 login 2021-11-29:00:02:01 1 click on post 2021-11-29:01:03:03 2 upload photos 2021-11-29:01:12:01 3 tag friends 2021-11-29:02:38:15 3 tag friends 2021-11-29:03:02:01

Find the average time taken between each step/action above. For duplicate actions (upload photo again, use the first upload photo time)

SELECT * , DATEDIFF(minutes, lag(timestamp) over (PARTITION BY user_id order by timestamp), timestamp) as inactivity_time FROM events `
- Code Question: Similar to the SQL question above, but with input data like [session_id, action/step_id, start_ts]. The task was to compute and update the running average time spent on each step for every new stream of data coming into the system, assuming infinite memory.

Round 2: Data Modeling (Ride Share Product like Doordash/Uber)

Key design guidelines included scalability, maintainability, normalization/denormalization, and performance.
- Q1: Product sense question related to the product.
- Q2: Create a data mart, including data dimension tables to analyze the product (star schema preferred).
- Q3: Query based on the schema created previously to find out some metrics.
  - Calculate all the drivers who departed or reached the airport as the destination.
  - Calculate all the customers who only departed or reached the airport as the destination (customers who have been to other destinations don't count).
  - Discuss the parts in the table design that could have been improved.

Round 3: ETL in Batch System (Related to Meta Quarterly Results)

Question 1: Product sense question related to DAU, MAU.
Question 2: Query to find the status/count/number of different metrics given a table with new users, retention users, and timestamps.
- Find the daily new users.
- SQL question: Given a table with user_id, last_login_time, and prev_login_before_last_login columns, find:
  - Yesterday's DAU, last 7 days, and last 28 days.
  - Continuous, returning, and churned users.
- Users have more than one device. Given the following data:
` details = {'iPhone':[0, 1,0,1,0,1,0] ,'Android':[1, 0,0,0,0,0,1] ,'Web':[0, 0,1,0,0,0,0]} rollups= {'overall':['iphone', 'Android', 'Web'] ,'Mobile':['iPhone', 'Android']}

Question: find last 7 days for overall , mobile, etc

Return:

{ “overall” : [1,1,1,0,1,1,1] “Mobile”: [1, 0,0,0,0,0,1] `

Round 4: Behavioral (Follow the STAR Framework)

What draws you to Meta DE?
What is DE?
Projects you've worked on (try to relate the experience to Meta products).
Leadership questions:
- Describe a time when you completed a project from end to end.
- What have you done in a cross-functional, collaborative environment?
- What is the impact of your projects?
- What trade-offs did you have to make?
- What characteristics and attributes do you think are important as a leader?
Questions about autonomy/prioritization.
Describe a situation where you influenced/or being influenced by others using data.
How do you handle conflict?
How do you plan to succeed in Meta in the first year?