Level: Unknown Level
Round: Phone Screen · Type: Coding · Difficulty: 7/10 · Duration: 60 min · Interviewer: Unfriendly
Topics: Machine Learning, Transformers, BERT, Tokenization, Data Structures, Algorithms
Location: Los Gatos, CA
Interview date: 2025-05-15
Got offer: False
I interviewed with a recruiter for a Machine Learning Engineer position. The technical phone screen was with an interviewer who spent the first 25 minutes asking about my projects, then moved on to Transformer-related questions.
I was asked about SentencePiece, which tokenizers BERT and the original Transformer use, the components of a Transformer block, and the architectural differences between the original Transformer and Llama/Qwen. I was also asked about the benefits of those differences (MoE, RMSNorm, rotary positional embeddings).
The coding question was to convert a flat list of Transformer parameter names in `layer1.attention...` format into a nested dictionary, grouping entries belonging to the same layer together.
```python
# Input: flat list of parameter names
[
    "layer1.attention.weight", "layer1.attention.bias",
    "layer2.attention.weight", "layer2.attention.bias",
    "layer1.ffn.weight", "layer2.ffn.bias",
]

# Expected output: nested dictionary
{
    "layer1": {"attention": {"weight": ..., "bias": ...}, "ffn": {"weight": ...}},
    "layer2": {"attention": {"weight": ..., "bias": ...}, "ffn": {"bias": ...}},
}
```
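A conversion like the one above can be sketched by splitting each name on `.` and nesting dictionaries level by level (an illustrative sketch, not necessarily what was expected in the interview; `None` stands in for the actual parameter tensors):

```python
def nest_params(names, values=None):
    """Group flat dotted parameter names into a nested dictionary.

    'layer1.attention.weight' becomes nested['layer1']['attention']['weight'].
    """
    nested = {}
    for name in names:
        parts = name.split(".")
        node = nested
        # walk/create intermediate dicts for all but the last component
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # store the parameter value (or a placeholder) at the leaf
        node[parts[-1]] = values[name] if values is not None else None
    return nested

keys = [
    "layer1.attention.weight", "layer1.attention.bias",
    "layer2.attention.weight", "layer2.attention.bias",
    "layer1.ffn.weight", "layer2.ffn.bias",
]
print(nest_params(keys))
```

`dict.setdefault` keeps the traversal to one line per level; passing a real `values` mapping (e.g. a model state dict) fills the leaves with tensors instead of `None`.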
My approach: