[ OK ] 21db08b8-60d2-41fd-8900-6f3a385d6dc5 — full writeup
[ INFO ] category: Behavioral · Multiple Types · difficulty: 7 · freq: · first seen: 2026-04-10
$ cat problem.md
Meta — Data Engineer ❌ Failed
Level: Senior-Level
Round: Full Journey · Type: Multiple Types · Difficulty: 7/10 · Duration: 225 min · Interviewer: Friendly
Topics: SQL, Python, ETL, Data Warehousing, Scalability, System Design, Database Design, Leadership, Influence Without Authority, Conflict Resolution
Location: Remote
Interview date: 2022-11-01
Got offer: False
Summary
1. Leadership & Ownership (Behavioral)
The judging rubric focuses on three signals: Ownership, Scale, and Influence.
General Philosophy
- What do you think makes a good data engineer?
- How would people characterize your leadership?
Situational (STARR Method)
- Tell me about a time when you had to convince someone of your point of view.
- Tell me about a time when you brought about change in your project.
- Tell me about a time you disagreed with a coworker/manager.
- What accomplishment are you most proud of?
2. Technical Screen: Coding & ETL
SQL Challenges
- The "Social User" Problem (Hard):
  - Task: Given a FB-like schema, calculate the top 10 institutions that produce "social users."
  - Logic: Define "social users" (e.g., users with more friends than the average user).
  - Skills: AVG(), HAVING, GROUP BY, subqueries, joins.
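One possible shape for this query, run through Python's built-in sqlite3 module so the sketch is self-contained. The tables users and friendships, their columns, and the sample rows are assumptions for illustration, not the schema given in the interview. "Social" is taken here to mean a friend count above the overall average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, institution TEXT);
-- Assumed: one row per undirected friendship.
CREATE TABLE friendships (user_a INTEGER, user_b INTEGER);
INSERT INTO users VALUES (1,'MIT'),(2,'MIT'),(3,'UCLA'),(4,'UCLA'),(5,'NYU');
INSERT INTO friendships VALUES (1,2),(1,3),(1,4),(2,3),(3,4),(4,5);
""")

QUERY = """
WITH edges(user_id) AS (
    -- Expand each undirected pair so both endpoints are counted.
    SELECT user_a FROM friendships
    UNION ALL
    SELECT user_b FROM friendships
),
friend_counts(user_id, institution, n_friends) AS (
    -- LEFT JOIN keeps users with zero friends in the average.
    SELECT u.user_id, u.institution, COUNT(e.user_id)
    FROM users u
    LEFT JOIN edges e ON e.user_id = u.user_id
    GROUP BY u.user_id, u.institution
)
SELECT institution, COUNT(*) AS social_users
FROM friend_counts
WHERE n_friends > (SELECT AVG(n_friends) FROM friend_counts)
GROUP BY institution
ORDER BY social_users DESC, institution
LIMIT 10
"""

top_institutions = conn.execute(QUERY).fetchall()
print(top_institutions)
```

With the sample rows the average friend count is 2.4, so users 1, 3 and 4 qualify. The same filter could be written with HAVING directly on the grouped count; the CTE form is only one way to keep the average subquery readable.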
- DAU Segmentation:
  - Schema: userId, first_login, last_seen, previous_last_seen, todays_date.
  - Task: Calculate segments (Active + Returning + New - Churned) / Total.
  - Efficiency: Use CASE WHEN to calculate all metrics in a single table scan.
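A minimal sketch of the single-scan approach, again through sqlite3. The writeup does not define the segments, so the rules below are assumptions: New means first_login equals todays_date, Active means seen today and also yesterday, Returning means seen today after a gap of more than one day, and Churned means not seen in the last 7 days. The table name user_activity and the sample rows are also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_activity (
    userId INTEGER, first_login TEXT, last_seen TEXT,
    previous_last_seen TEXT, todays_date TEXT
);
INSERT INTO user_activity VALUES
 (1,'2022-10-01','2022-11-01','2022-10-31','2022-11-01'),  -- active
 (2,'2022-11-01','2022-11-01',NULL,        '2022-11-01'),  -- new
 (3,'2022-09-01','2022-11-01','2022-10-01','2022-11-01'),  -- returning
 (4,'2022-08-01','2022-10-01','2022-09-30','2022-11-01'),  -- churned
 (5,'2022-10-15','2022-10-30','2022-10-29','2022-11-01');  -- idle, not churned yet
""")

# One pass over the table: each CASE WHEN feeds its own conditional counter.
QUERY = """
SELECT
  SUM(CASE WHEN first_login = todays_date THEN 1 ELSE 0 END) AS new_users,
  SUM(CASE WHEN last_seen = todays_date
            AND previous_last_seen = date(todays_date, '-1 day')
           THEN 1 ELSE 0 END) AS active_users,
  SUM(CASE WHEN last_seen = todays_date
            AND first_login < todays_date
            AND previous_last_seen < date(todays_date, '-1 day')
           THEN 1 ELSE 0 END) AS returning_users,
  SUM(CASE WHEN last_seen < date(todays_date, '-7 day')
           THEN 1 ELSE 0 END) AS churned_users,
  COUNT(*) AS total_users
FROM user_activity
"""

new, active, returning, churned, total = conn.execute(QUERY).fetchone()
ratio = (active + returning + new - churned) / total
print(new, active, returning, churned, total, ratio)
```

Because ISO-8601 date strings sort lexicographically, plain string comparison is enough here; a warehouse with a native DATE type would use its own date arithmetic instead of SQLite's date() modifiers.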
- Daily Load:
  - Task: Update a table for the next day given transactional login events (userId, login_timestamp).
  - Skills: COALESCE, full outer joins, date calculations.
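The daily load can be sketched as a merge of yesterday's snapshot with today's login events. The table names snapshot and logins and the sample rows are assumptions. Native FULL OUTER JOIN only exists in newer SQLite releases, so this sketch emulates it with a LEFT JOIN plus a UNION ALL of the unmatched rows; on a warehouse engine the joined CTE would normally be a single FULL OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE snapshot (userId INTEGER PRIMARY KEY, first_login TEXT,
                       last_seen TEXT, previous_last_seen TEXT);
CREATE TABLE logins (userId INTEGER, login_timestamp TEXT);
INSERT INTO snapshot VALUES
 (1,'2022-10-01','2022-10-31','2022-10-30'),
 (2,'2022-10-05','2022-10-20','2022-10-19');
INSERT INTO logins VALUES
 (1,'2022-11-01 08:00:00'),
 (1,'2022-11-01 21:30:00'),
 (3,'2022-11-01 12:00:00');
""")

NEXT_DAY = """
WITH today(userId, login_date) AS (
    -- Collapse many login events into one row per user for the load date.
    SELECT userId, MIN(date(login_timestamp))
    FROM logins
    WHERE date(login_timestamp) = :today
    GROUP BY userId
),
joined(s_id, t_id, first_login, last_seen, previous_last_seen, login_date) AS (
    -- Existing users, with or without a login today.
    SELECT s.userId, t.userId, s.first_login, s.last_seen,
           s.previous_last_seen, t.login_date
    FROM snapshot s LEFT JOIN today t ON t.userId = s.userId
    UNION ALL
    -- Users seen today who are missing from the snapshot.
    SELECT NULL, t.userId, NULL, NULL, NULL, t.login_date
    FROM today t LEFT JOIN snapshot s ON s.userId = t.userId
    WHERE s.userId IS NULL
)
SELECT COALESCE(s_id, t_id)               AS userId,
       COALESCE(first_login, login_date)  AS first_login,
       COALESCE(login_date, last_seen)    AS last_seen,
       CASE WHEN login_date IS NOT NULL THEN last_seen
            ELSE previous_last_seen END   AS previous_last_seen
FROM joined
ORDER BY userId
"""

rows = conn.execute(NEXT_DAY, {"today": "2022-11-01"}).fetchall()
for row in rows:
    print(row)
```

User 1 logged in, so the old last_seen shifts into previous_last_seen; user 2 is carried forward unchanged; user 3 is new and gets first_login and last_seen from the event.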
Python Challenges
- Friendship Graph (Easy):
  - Task: Count friends for each user given a list of edges.
  - Input: [[A,B],[C,D],[B,D],[E]]
  - Output: {A: 1, B: 2, C: 1, D: 2, E: 0}
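A short sketch of one way to solve the friendship graph problem. The function name count_friends is hypothetical, and two behaviors are assumptions not stated in the prompt: duplicate edges are counted once, and a self-edge such as [A, A] does not count as a friendship.

```python
def count_friends(edges):
    """Return {user: number of distinct friends} for a list of edges.

    A one-element edge like ["E"] registers a user with no friends.
    """
    friends = {}
    for edge in edges:
        # Register every user that appears, including isolated ones.
        for user in edge:
            friends.setdefault(user, set())
        if len(edge) == 2 and edge[0] != edge[1]:
            a, b = edge
            friends[a].add(b)
            friends[b].add(a)
    return {user: len(others) for user, others in friends.items()}


result = count_friends([["A", "B"], ["C", "D"], ["B", "D"], ["E"]])
print(result)
```

Using a set per user keeps the count correct if the same pair shows up twice; if the interviewer guarantees unique edges, a plain integer counter is enough.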
3. Product Sense & Data Modeling
Case Study: Ride-Sharing (Uber/Lyft)
- Product Sense: How would you deploy in a new region? What metrics matter for existing users?
- Modeling: Design a schema for a ride-sharing app.
- Analytics:
  - Calculate average wait time.
  - Find users who use the app for airport services only.
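Both analytics questions can be sketched against a single hypothetical rides table. The table, its columns, and the sample rows are assumptions; in particular, wait time is taken to be the gap between request and pickup, and "airport only" is taken to mean every requested ride starts or ends at an airport.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rides (
    ride_id INTEGER PRIMARY KEY, rider_id INTEGER,
    requested_at TEXT, pickup_at TEXT,
    pickup_is_airport INTEGER, dropoff_is_airport INTEGER
);
INSERT INTO rides VALUES
 (1, 10, '2022-11-01 08:00:00', '2022-11-01 08:04:00', 1, 0),
 (2, 10, '2022-11-03 17:00:00', '2022-11-03 17:06:00', 0, 1),
 (3, 20, '2022-11-01 09:00:00', '2022-11-01 09:02:00', 0, 0),
 (4, 20, '2022-11-02 09:00:00', '2022-11-02 09:08:00', 0, 1),
 (5, 30, '2022-11-02 10:00:00', NULL,                  0, 0);  -- never picked up
""")

# Average wait in seconds; rides without a pickup yield NULL and are skipped by AVG.
AVG_WAIT = """
SELECT AVG(CAST(strftime('%s', pickup_at) AS INTEGER)
         - CAST(strftime('%s', requested_at) AS INTEGER))
FROM rides
"""
avg_wait_seconds = conn.execute(AVG_WAIT).fetchone()[0]

# A rider is airport-only if no ride has both flags at 0.
AIRPORT_ONLY = """
SELECT rider_id
FROM rides
GROUP BY rider_id
HAVING MIN(pickup_is_airport OR dropoff_is_airport) = 1
ORDER BY rider_id
"""
airport_only_riders = [row[0] for row in conn.execute(AIRPORT_ONLY)]

print(avg_wait_seconds, airport_only_riders)
```

Whether cancelled or unfulfilled requests should count toward either metric is a product decision worth raising with the interviewer before writing the query.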
Case Study: Professional Network (LinkedIn)
- Product Sense: What metrics are important?
- Modeling: Design a data model for a specific feature (e.g., Marketplace, Feed).
- Scale: How do you handle scale, partitioning, and denormalization?
Case Study: File Sharing (Dropbox)
- Product Sense: Track file uploads, storage growth, and DAU.
- ETL Anomaly: If a graph is "spiky" or Metric A increases by 10% but Metric B only by 5%, how do you diagnose the root cause?
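One common first step for this kind of diagnosis is to break the metric down by a dimension and see which segment accounts for the change. The sketch below is only an illustration of that idea; the function name change_by_segment, the platform dimension, and the numbers are all made up.

```python
def change_by_segment(before, after):
    """Return each segment's contribution to the overall relative change.

    The contributions sum to (total_after - total_before) / total_before.
    """
    total_before = sum(before.values())
    contributions = {}
    for segment in sorted(set(before) | set(after)):
        delta = after.get(segment, 0) - before.get(segment, 0)
        contributions[segment] = delta / total_before
    return contributions


# Hypothetical daily upload counts by platform: the total rises from 1000 to 1100.
uploads_before = {"web": 500, "ios": 300, "android": 200}
uploads_after = {"web": 505, "ios": 300, "android": 295}

contributions = change_by_segment(uploads_before, uploads_after)
biggest_mover = max(contributions, key=lambda seg: abs(contributions[seg]))
print(contributions, biggest_mover)
```

If one segment explains almost all of the movement, the next checks are usually pipeline-side (duplicate loads, late-arriving data, a changed join) before assuming a real change in user behavior.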
4. System Design & Architecture
- Scenario: Monthly reports for an Advertising Department.
- Components:
  - Batch vs. Streaming data processing.
  - Source vs. Target data modeling.
  - Handling FB-scale (Petabytes).
- Data Quality: How to handle the four pillars:
  - Accuracy
  - Consistency
  - Validity
  - Completeness
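The four pillars can be turned into row-level and table-level checks. The sketch below is one hedged interpretation, not a rubric from the interview: the function quality_report, the required columns, and the specific rules (positive integer ids, last_seen not before first_login, row-count reconciliation as a rough proxy for accuracy) are all assumptions.

```python
from datetime import date

REQUIRED = ("user_id", "first_login", "last_seen")


def quality_report(rows, source_row_count):
    """Count rule violations per pillar for a batch of loaded rows."""
    report = {"completeness": 0, "validity": 0, "consistency": 0}
    for row in rows:
        # Completeness: every required field is present and non-null.
        if any(row.get(col) is None for col in REQUIRED):
            report["completeness"] += 1
            continue  # the remaining checks need the required fields
        # Validity: values fall inside the allowed domain.
        if not isinstance(row["user_id"], int) or row["user_id"] <= 0:
            report["validity"] += 1
        # Consistency: fields agree with each other.
        if row["last_seen"] < row["first_login"]:
            report["consistency"] += 1
    # Accuracy (proxy): loaded row count reconciles with the source system.
    report["accuracy"] = abs(len(rows) - source_row_count)
    return report


sample = [
    {"user_id": 1, "first_login": date(2022, 10, 1), "last_seen": date(2022, 11, 1)},
    {"user_id": 2, "first_login": None, "last_seen": date(2022, 11, 1)},
    {"user_id": -3, "first_login": date(2022, 10, 1), "last_seen": date(2022, 10, 2)},
    {"user_id": 4, "first_login": date(2022, 11, 1), "last_seen": date(2022, 10, 1)},
]
report = quality_report(sample, source_row_count=5)
print(report)
```

At petabyte scale the same rules would typically run as aggregate queries inside the warehouse rather than row by row in Python; the point here is only how each pillar maps to a concrete check.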
Details
5. Summary of Key Resources
- Books: The Data Warehouse Toolkit (Kimball), Cracking the PM Interview.
- Practice: LeetCode (Easy/Medium), StrataScratch (Medium/Hard), DataLemur.
- Tips:
  - Speed: 90% completion is often a fail. You must finish.
  - Environment: CoderPad (no auto-complete).
  - Communication: Speak out loud. If you get a hint, use it immediately.