Machine Learning Engineer · Full Journey · Multiple Types — Amazon

Amazon — Machine Learning Engineer ✅ Passed

Level: Senior-Level

Round: Full Journey · Type: Multiple Types · Difficulty: 6/10 · Duration: 360 min · Interviewer: Neutral

Topics: Machine Learning, Statistics, Algorithms, Coding, Behavioral Questions, Transformers, RNN, K-Means, Data Preprocessing, Feature Engineering, Model Evaluation, Optimization Algorithms

Location: Seattle, WA, US

Interview date: 2025-11-21

Summary

Interview Rounds Overview

Round 1: Phone Screen (ML Knowledge & Coding)
Round 2: Virtual Onsite (Statistics Application)
Round 3: Virtual Onsite (ML Deep Dive)
Round 4: Virtual Onsite (Coding)
Round 5: Virtual Onsite (Hiring Manager Chat)
Round 6: Virtual Onsite (Behavioral)

Details

Phone Screen

The first part focused on ML knowledge. Rather than asking for basic definitions, the interviewer dug deep into my projects, covering model selection, training details, and evaluation. Some typical questions were:

How did I evaluate my model?
Which metrics did I use, and why?
What are the mathematical definitions of these metrics?
Is there a better evaluation process?
How would I improve the evaluation if human labels were available?
Without labels, how would I judge the similarity of two answers in the simplest case?
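
For the last question, one very simple baseline (not necessarily what the interviewer had in mind) is token-overlap similarity. The sketch below assumes whitespace/word-level tokenization and uses the Jaccard index, |A ∩ B| / |A ∪ B|; the function name is made up for illustration.

```python
import re

def jaccard_similarity(answer_a: str, answer_b: str) -> float:
    """Token-level Jaccard similarity between two free-text answers."""
    # Lowercase and split into word tokens; punctuation is ignored.
    tokens_a = set(re.findall(r"\w+", answer_a.lower()))
    tokens_b = set(re.findall(r"\w+", answer_b.lower()))
    # Two empty answers are treated as identical (an arbitrary convention).
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
```

This ignores word order and synonyms, so it is only a starting point; embedding-based cosine similarity would be a natural next step.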

The interviewer also asked about the differences between Transformers and RNNs, their pros and cons on very long inputs, why attention can capture long-range dependencies, and why RNNs struggle to.
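
To make the long-range argument concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention. Every query position scores every key position in one matrix product, so any two tokens are connected in a single step (at O(n·m) cost), whereas an RNN has to carry information through each intermediate hidden state. The shapes and the function name are assumptions for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n, d_k), K: (m, d_k), V: (m, d_v) -> output (n, d_v), weights (n, m)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key: (n, d_k) @ (d_k, m) -> (n, m)
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of values: (n, m) @ (m, d_v) -> (n, d_v)
    return weights @ V, weights
```

The (n, m) weight matrix is also the source of the quadratic memory cost that makes very long inputs expensive for Transformers.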

Another question was how to determine if two sets of images come from the same distribution.
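
One possible approach (an assumption on my part, not the only valid answer) is to embed each image with a pretrained network and compare the two sets of embeddings with a kernel two-sample statistic such as Maximum Mean Discrepancy (MMD). The sketch below assumes the embeddings are already available as NumPy arrays and uses an RBF kernel with a simple default bandwidth.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """A: (n, d), B: (m, d) -> kernel matrix (n, m)."""
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd_squared(X, Y, gamma=None):
    """Biased estimate of MMD^2 between samples X (n, d) and Y (m, d)."""
    if gamma is None:
        gamma = 1.0 / X.shape[1]  # assumed default bandwidth: 1 / feature dim
    k_xx = rbf_kernel(X, X, gamma).mean()
    k_yy = rbf_kernel(Y, Y, gamma).mean()
    k_xy = rbf_kernel(X, Y, gamma).mean()
    return k_xx + k_yy - 2 * k_xy
```

A value near zero is consistent with the same distribution; to get a p-value, the statistic can be compared against its values under random relabelling of the pooled samples.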

The second part was ML coding. I had to write K-means by hand. The code did not need to be fully runnable; I just needed to explain the steps and keep the matrix dimensions correct. The interviewer mentioned that they personally don't ask LeetCode problems, but other interviewers might.
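
For reference, a minimal NumPy sketch of Lloyd's algorithm with the array shapes annotated. This is not the code written in the interview; the random initialization, the convergence check, and the handling of empty clusters are my own assumptions.

```python
import numpy as np

def assign_labels(X, centroids):
    # (n, 1, d) - (1, k, d) -> (n, k, d); sum over d -> squared distances (n, k)
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)  # (n,)

def kmeans(X, k, n_iters=100, seed=0):
    """X: (n, d) -> centroids (k, d), labels (n,)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialize centroids as k distinct data points: (k, d)
    centroids = X[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(n_iters):
        labels = assign_labels(X, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]  # (n_j, d)
            if len(members) > 0:      # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign_labels(X, centroids)
```

The broadcasting step materializes an (n, k, d) array, which is fine for a whiteboard answer but may need chunking for large n.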

Virtual Onsite Rounds

Round 1: Statistics Application

Given two sets of user data (North America and Europe), each with multi-dimensional continuous features (e.g., dwell time, click-through rate, purchase conversion rate), how would I determine whether the two groups differ significantly in their overall distribution?
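
One reasonable answer (my assumption, not necessarily the expected one) is a multivariate permutation test: pick a distance between the two samples, then see how extreme the observed value is when region labels are shuffled. The sketch below uses the energy distance as the statistic and standardizes features on the pooled data, since dwell time and rates live on very different scales. All function names are illustrative.

```python
import numpy as np

def mean_pairwise_dist(A, B):
    """Mean Euclidean distance between rows of A (n, d) and B (m, d)."""
    diffs = A[:, None, :] - B[None, :, :]           # (n, m, d)
    return np.sqrt((diffs ** 2).sum(axis=2)).mean()

def energy_distance(X, Y):
    return (2 * mean_pairwise_dist(X, Y)
            - mean_pairwise_dist(X, X)
            - mean_pairwise_dist(Y, Y))

def permutation_test(X, Y, n_perm=200, seed=0):
    """Returns (observed statistic, permutation p-value)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X, Y]).astype(float)
    # Standardize each feature using pooled mean and std.
    pooled = (pooled - pooled.mean(axis=0)) / (pooled.std(axis=0) + 1e-12)
    n = len(X)
    observed = energy_distance(pooled[:n], pooled[n:])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        stat = energy_distance(pooled[perm[:n]], pooled[perm[n:]])
        count += int(stat >= observed)
    # Add-one correction so the p-value is never exactly zero.
    return observed, (count + 1) / (n_perm + 1)
```

If the test rejects, per-feature tests (with a multiple-comparison correction) can help locate which features drive the difference.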

Round 2: ML Deep Dive

This round continued to delve deeply into the projects I had worked on. The interviewer started with data preprocessing, asking why I chose that data cleaning method and feature engineering approach.

Then they dug further into evaluation, asking why I chose those metrics, what each one reflected, how the validation set was split, and whether I had considered stratified sampling.
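
For context on the stratified sampling question, here is a minimal standard-library sketch of a stratified train/validation split: indices are grouped by class and the same fraction is held out from each class, so the validation set keeps the class proportions. The function name and defaults are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_indices, val_indices) preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Hold out at least one example per class; note that a class with a
        # single example therefore ends up only in the validation set.
        n_val = max(1, round(len(indices) * val_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)
```

With a 90/10 class imbalance, a purely random split can leave the minority class nearly absent from validation, which is the usual motivation for stratifying.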

Next, they asked which other models I would choose and had me analyze their pros and cons in terms of training time, inference latency, interpretability, and robustness to noise.

Finally, from an optimization perspective, the interviewer asked about the pros and cons of the optimizer I had used and what the impact would be of switching to Adam or RMSProp. As a follow-up, they asked why not AdamW, and in which scenarios AdamW is more suitable.
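
The Adam versus AdamW distinction comes down to where weight decay enters the update. The scalar sketch below (a simplified single-parameter illustration, not a library implementation) shows both variants: with `decoupled=False` the L2 term is folded into the gradient and then rescaled by Adam's adaptive denominator, while with `decoupled=True` the decay is applied directly to the weight, as in AdamW.

```python
import math

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=False):
    """One Adam update for a scalar weight; t is the 1-based step count."""
    w_old = w
    if weight_decay and not decoupled:
        # Adam + L2: the penalty gradient passes through the adaptive scaling.
        grad = grad + weight_decay * w
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    if weight_decay and decoupled:
        # AdamW: decay acts on the weight itself, independent of v_hat.
        w = w - lr * weight_decay * w_old
    return w, m, v
```

With a zero loss gradient on the first step, the L2 variant still moves the weight by roughly `lr` regardless of how small `weight_decay` is, whereas the decoupled variant shrinks it by exactly `lr * weight_decay * w`. That difference is one reason decoupled decay is often preferred when regularization strength needs to be tuned independently of the learning dynamics.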

Round 3: Coding

I was asked to solve two problems:

Merge Intervals
Top K Frequent Elements
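
For reference, minimal sketches of common solutions to both problems (not necessarily what was written in the interview). Merge Intervals sorts by start and extends the last merged interval on overlap; Top K Frequent Elements counts occurrences and takes the k largest with a heap.

```python
import heapq
from collections import Counter

def merge_intervals(intervals):
    """Merge overlapping [start, end] intervals. O(n log n) from the sort."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def top_k_frequent(nums, k):
    """Return the k most frequent elements, most frequent first. O(n log k)."""
    counts = Counter(nums)
    return heapq.nlargest(k, counts, key=counts.get)
```

A bucket-sort variant of the second problem reaches O(n), which interviewers sometimes ask for as a follow-up.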

Round 4: HM Chat

This was a chat with the Hiring Manager, covering ML trivia.

Round 5: Behavioral

Behavioral questions.

Similar LeetCode problems: LeetCode 56 (Merge Intervals), LeetCode 347 (Top K Frequent Elements)