Meta — Data Engineer ✅ Passed
Level: Senior-Level
Round: Full Journey · Type: Multiple Types · Difficulty: 6/10 · Duration: 300 min · Interviewer: Unfriendly
Topics: Python, SQL, ETL, Data Modeling, Product Sense, Behavioral Questions
Location: San Francisco Bay Area
Interview date: 2020-03-15
Summary
Interview Rounds Overview
- Round 1: Phone Screen
- Round 2: Virtual Onsite
- Round 3: Virtual Onsite
- Round 4: Virtual Onsite
- Round 5: Virtual Onsite
Details
I interviewed for a Data Engineer position. Here's my experience:
Phone Screen:
- Python:
- Given a string, find the number of occurrences of a given character.
- Replace all None values in a list with the preceding number.
- Find the words that appear in one list but not the other.
- Given a list in which numbers appear different numbers of times, find how many copies of each number must be added so that every number appears equally often.
- Find the kth largest key in a dictionary.
- SQL (tables: promotion, sales, product, promotion class):
- Find the percentage of products that have both the low_fat flag and at least one other flag.
- Find products with a single media type.
- Find the percentage of all transactions that occurred on the first or last day of a promotion.
- For each product family, find total sale units, valid-promotion sale units / total sale units, and invalid-promotion sale units / total sale units.
I explained my approach and managed to write half of the code. I don't remember the exact details, but it involved finding products that were not sold in different product families.
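For reference, the phone-screen Python questions listed above can be sketched as follows. These are my own reconstructions, not the interviewer's reference answers, and the function names are hypothetical. Two interpretations are assumptions: leading None values (with no preceding number) stay None, and "equal" frequency means raising every number to the count of the most frequent one.

```python
import heapq
from collections import Counter
from typing import Dict, List, Optional, Set


def count_char(s: str, ch: str) -> int:
    """Number of occurrences of character `ch` in `s`."""
    return s.count(ch)


def fill_none_forward(values: List[Optional[int]]) -> List[Optional[int]]:
    """Replace each None with the most recent preceding non-None value.

    Assumption: leading Nones have nothing before them and stay None.
    """
    result: List[Optional[int]] = []
    last: Optional[int] = None
    for v in values:
        if v is None:
            result.append(last)
        else:
            last = v
            result.append(v)
    return result


def different_words(a: List[str], b: List[str]) -> Set[str]:
    """Words that appear in exactly one of the two lists (symmetric difference)."""
    return set(a) ^ set(b)


def copies_to_equalize(nums: List[int]) -> Dict[int, int]:
    """Copies of each number to add so that all numbers appear equally often.

    Assumption: the target count is that of the most frequent number.
    """
    counts = Counter(nums)
    if not counts:
        return {}
    target = max(counts.values())
    return {n: target - c for n, c in counts.items()}


def kth_largest_key(d: Dict, k: int):
    """k-th largest key (1-based) of a dictionary whose keys are comparable."""
    if not 1 <= k <= len(d):
        raise ValueError("k out of range")
    # Iterating a dict yields its keys; nlargest returns them in descending order.
    return heapq.nlargest(k, d)[-1]
```

Using `heapq.nlargest` keeps the k-th-largest lookup at O(n log k) rather than sorting every key.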
I performed well in the phone screen: the recruiter said it was the best feedback they had seen. Their advice was to ask questions and explain my thought process before coding, examine the data in each table, and explain the purpose of each step while coding.
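Going back to the phone-screen SQL questions, the first/last-day percentage can be sketched with Python's built-in `sqlite3` module. The interview's actual columns were not recorded, so the `promotion(promotion_id, start_date, end_date)` and `sales(transaction_id, promotion_id, transaction_date)` layouts below are hypothetical.

```python
import sqlite3

# Hypothetical schema; the real interview tables were not recorded in full.
SCHEMA = """
CREATE TABLE promotion (
    promotion_id INTEGER PRIMARY KEY,
    start_date   TEXT NOT NULL,  -- ISO dates, e.g. '2020-01-01'
    end_date     TEXT NOT NULL
);
CREATE TABLE sales (
    transaction_id   INTEGER PRIMARY KEY,
    promotion_id     INTEGER REFERENCES promotion(promotion_id),
    transaction_date TEXT NOT NULL
);
"""

# Share of all transactions that fell on the first or last day of their
# promotion. The LEFT JOIN keeps non-promoted sales in the denominator;
# for them p.start_date/p.end_date are NULL, so the CASE yields 0.
QUERY = """
SELECT ROUND(
    100.0 * SUM(CASE WHEN s.transaction_date IN (p.start_date, p.end_date)
                     THEN 1 ELSE 0 END) / COUNT(*),
    2) AS pct_first_or_last_day
FROM sales AS s
LEFT JOIN promotion AS p ON s.promotion_id = p.promotion_id
"""


def pct_first_or_last_day(conn: sqlite3.Connection) -> float:
    """Percentage (0-100) of transactions on a promotion's first or last day."""
    (pct,) = conn.execute(QUERY).fetchone()
    return pct
```

Switching the LEFT JOIN to an inner join would instead measure the share among promoted transactions only; which denominator is intended is worth confirming with the interviewer.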
Virtual Onsite (VO):
- Round 1 (ETL): FB Messenger
- Product sense: Identify potential reasons for a sudden drop in the number of sign-ins.
- SQL: Given a log table, find the number of times each user signed in, the number of messages sent per day, the first sign-in date, the total number of messages sent since the first sign-in date, and whether the user was active today.
- Python: Given several tables, write Python code that runs SQL statements to update them daily.
- Round 2 (ETL): FB News Feed
- Product sense: How to determine if a user has effectively read a post.
- SQL: Given a log table, find the number of effective reads for each post in each session.
- Python: Write the solution to the SQL problem using Python.
- Round 3 (Data Model): FB box (similar to Dropbox)
- Product sense: How to determine if the product is successful.
- Design a schema for analysis.
- SQL: Select all users who have only uploaded photos.
- Round 4 (Ownership): Purely behavioral questions.
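A minimal sketch of the first ETL round's daily update, again using `sqlite3`. It assumes a hypothetical `events(user_id, event_type, event_date)` log where `event_type` is `'sign_in'` or `'message'`, treats any logged event on the run date as "active today", and folds each day into a hypothetical cumulative `user_summary` table. Days must be processed in date order for `first_sign_in_date` to stay correct.

```python
import sqlite3

# Hypothetical tables; the interview's actual log schema was not recorded.
SCHEMA = """
CREATE TABLE events (
    user_id    INTEGER NOT NULL,
    event_type TEXT    NOT NULL,  -- 'sign_in' or 'message'
    event_date TEXT    NOT NULL   -- ISO date of the event
);
CREATE TABLE user_summary (
    user_id            INTEGER PRIMARY KEY,
    first_sign_in_date TEXT,
    total_sign_ins     INTEGER NOT NULL,
    messages_today     INTEGER NOT NULL,
    total_messages     INTEGER NOT NULL,
    active_today       INTEGER NOT NULL  -- 1 if the user logged any event on the run date
);
"""

# Per-user aggregates for one day's slice of the log.
DAILY_AGG = """
SELECT user_id,
       MIN(CASE WHEN event_type = 'sign_in' THEN event_date END),
       SUM(CASE WHEN event_type = 'sign_in' THEN 1 ELSE 0 END),
       SUM(CASE WHEN event_type = 'message' THEN 1 ELSE 0 END)
FROM events
WHERE event_date = ?
GROUP BY user_id
"""


def daily_update(conn: sqlite3.Connection, ds: str) -> None:
    """Fold one day of events into user_summary; run once per day, in date order."""
    rows = conn.execute(DAILY_AGG, (ds,)).fetchall()
    # Daily fields describe only the run date, so reset them for everyone first.
    conn.execute("UPDATE user_summary SET messages_today = 0, active_today = 0")
    for user_id, first_sign_in, sign_ins, messages in rows:
        cur = conn.execute(
            """UPDATE user_summary
               SET first_sign_in_date = COALESCE(first_sign_in_date, ?),
                   total_sign_ins = total_sign_ins + ?,
                   messages_today = ?,
                   total_messages = total_messages + ?,
                   active_today = 1
               WHERE user_id = ?""",
            (first_sign_in, sign_ins, messages, messages, user_id),
        )
        if cur.rowcount == 0:  # user not seen before: create their row
            conn.execute(
                "INSERT INTO user_summary VALUES (?, ?, ?, ?, ?, 1)",
                (user_id, first_sign_in, sign_ins, messages, messages),
            )
    conn.commit()
```

This update is not idempotent: running the same date twice would double-count the totals, so a real daily job would need a guard against reruns.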
I felt I didn't perform well in the second ETL round. A poor connection meant I repeatedly had to ask the interviewer to repeat the questions; the interviewer seemed to grow impatient, which made me nervous, and my product sense answers were disorganized. In the data model round, I spent too much time on the initial discussion, which left me enough time to complete only one SQL question. My schema design also didn't fully address the problem, and the interviewer kept questioning me about the details, which left me feeling overwhelmed.
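For the second ETL round, "effective read" has to be defined during the product sense discussion. One possible definition, assumed here, is that a post counts as read when it stays on screen for at least a hypothetical `MIN_DWELL_SECONDS` threshold. The sketch assumes log rows of `(session_id, post_id, event, ts)` where `event` is `'start'` (post enters the viewport) or `'end'` (post leaves it), and counts effective reads per post per session in plain Python.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

MIN_DWELL_SECONDS = 5.0  # hypothetical threshold for an "effective" read

Row = Tuple[str, str, str, float]  # (session_id, post_id, event, ts_seconds)


def effective_reads(rows: Iterable[Row]) -> Dict[Tuple[str, str], int]:
    """Count reads lasting >= MIN_DWELL_SECONDS per (session_id, post_id)."""
    open_views: Dict[Tuple[str, str], float] = {}  # view start time per key
    reads: Dict[Tuple[str, str], int] = defaultdict(int)
    for session_id, post_id, event, ts in sorted(rows, key=lambda r: r[3]):
        key = (session_id, post_id)
        if event == "start":
            open_views[key] = ts
        elif event == "end" and key in open_views:
            # Close the view and count it if it lasted long enough.
            if ts - open_views.pop(key) >= MIN_DWELL_SECONDS:
                reads[key] += 1
    return dict(reads)
```

Unmatched `'end'` events and views still open at the end of the log are ignored; posts with no effective reads simply do not appear in the result.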
Even with those difficulties, the result was better than I expected. My advice is to communicate your thoughts, even when you're unsure. During the data model round, when I couldn't answer a question, I stated the approaches I had considered, even if I wasn't sure they would solve the problem.
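Finally, the data model round's SQL question (users who have only uploaded photos) can be sketched as a GROUP BY with a HAVING filter, run here through `sqlite3`. The `uploads` fact table is a hypothetical schema for FB box, not the one from the interview.

```python
import sqlite3
from typing import List

# Hypothetical upload fact table for the FB box data model round.
SCHEMA = """
CREATE TABLE uploads (
    upload_id  INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    file_type  TEXT    NOT NULL,  -- e.g. 'photo', 'video', 'document'
    size_bytes INTEGER NOT NULL,
    upload_ts  TEXT    NOT NULL
);
"""

# GROUP BY guarantees each returned user has at least one upload;
# HAVING keeps only users with zero non-photo uploads.
PHOTO_ONLY_USERS = """
SELECT user_id
FROM uploads
GROUP BY user_id
HAVING SUM(CASE WHEN file_type <> 'photo' THEN 1 ELSE 0 END) = 0
ORDER BY user_id
"""


def photo_only_users(conn: sqlite3.Connection) -> List[int]:
    """IDs of users whose uploads are all photos."""
    return [user_id for (user_id,) in conn.execute(PHOTO_ONLY_USERS)]
```

An equivalent filter is `HAVING MIN(file_type) = 'photo' AND MAX(file_type) = 'photo'`; the `NOT NULL` constraint on `file_type` matters for both, since a NULL type would otherwise slip through the CASE version.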