Level: Senior-Level
Round: Phone Screen · Type: Coding · Difficulty: 6/10 · Duration: 60 min · Interviewer: Neutral
Topics: Machine Learning, Position Embedding, KV Cache, KNN, FFN, L2 Distance
Location: San Francisco Bay Area
Interview date: 2026-01-25
Question: Debugging a Transformer model (position embedding initialization, mask setting, missing loss.backward(), projection layer dimensions). Follow-up involved KV cache implementation.
Question: Implementing One-NN (basic KNN) and then implementing it using a basic feed-forward network (FFN) and activation layer. The key is to convert L2 distance into a linear transformation (Y = WX + b) and then use softmax activation.
The first round involved debugging a Transformer model with the following issues: a wrongly initialized position embedding, an incorrect attention mask, a missing loss.backward() call, and mismatched projection-layer dimensions.
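One of the planted bugs was the attention mask. As my own illustration (not the interview code), a minimal NumPy sketch of an additive causal mask, which sets future positions to -inf before the softmax:

```python
import numpy as np

def causal_mask(t):
    """Additive mask: 0 on/below the diagonal, -inf above,
    so position i cannot attend to positions j > i."""
    upper = np.triu(np.ones((t, t)), k=1)
    return np.where(upper == 1, -np.inf, 0.0)

scores = np.zeros((3, 3)) + causal_mask(3)
# softmax over the last axis: masked entries get zero weight
probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)
```

A common variant of the bug is masking with 0/1 multiplication after the softmax instead of adding -inf before it, which leaves the attention weights unnormalized.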
The follow-up involved the KV cache: during attention computation I had to insert the new keys and values into the cache, adjust the position embedding for the current decoding step, and make sure the parameters were passed through correctly.
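A minimal single-head, unbatched sketch of the idea (my own reconstruction; the function and cache names are assumptions): each decoding step appends its key/value to the cache and attends over everything cached so far, and the cached length doubles as the position index for the embedding.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_step(q, k_new, v_new, cache):
    """One decoding step: append the new key/value, attend over all cached steps."""
    # current position index = number of tokens already cached
    pos = cache["K"].shape[0]  # would select the position embedding row
    cache["K"] = np.concatenate([cache["K"], k_new[None]], axis=0)  # (t+1, d)
    cache["V"] = np.concatenate([cache["V"], v_new[None]], axis=0)  # (t+1, d)
    scores = cache["K"] @ q / np.sqrt(q.shape[-1])                  # (t+1,)
    return softmax(scores) @ cache["V"]                             # (d,)

d = 4
cache = {"K": np.zeros((0, d)), "V": np.zeros((0, d))}
rng = np.random.default_rng(0)
for _ in range(3):
    q, k, v = rng.normal(size=(3, d))
    out = attend_step(q, k, v, cache)
```

The design point the interviewer was probing: with a cache, each step only computes the new query's attention over stored keys/values instead of recomputing the whole sequence.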
For the ML puzzle, I had to express the nearest-neighbor search as a linear layer: since ||q - x_i||^2 = ||q||^2 - 2 x_i·q + ||x_i||^2 and ||q||^2 is the same for every i, the argmin over distances equals the argmax over the scores 2 x_i·q - ||x_i||^2, which is exactly Y = WX + b:

```python
W1 = 2.0 * X.T               # columns are 2*x_i
b1 = -np.sum(X * X, axis=1)  # -||x_i||^2
```
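Putting it together, a self-contained sketch of 1-NN as a linear layer plus softmax (a low softmax temperature approximates the hard argmax; the function names are my own):

```python
import numpy as np

def one_nn_as_ffn(X):
    """Build weights so that argmax(q @ W + b) equals the 1-NN index of q.
    Works because argmin_i ||q - x_i||^2 == argmax_i (2 x_i.q - ||x_i||^2):
    the ||q||^2 term is constant across i and drops out."""
    W = 2.0 * X.T                 # (d, n): columns are 2*x_i
    b = -np.sum(X * X, axis=1)    # (n,): -||x_i||^2
    return W, b

def predict(q, W, b, temp=1e-3):
    """Scores -> low-temperature softmax, approximating a one-hot argmax."""
    z = (q @ W + b) / temp
    z -= z.max()
    p = np.exp(z)
    return p / p.sum()

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]])
W, b = one_nn_as_ffn(X)
q = np.array([0.9, 1.2])
probs = predict(q, W, b)
nearest = int(np.argmax(probs))  # index of the nearest stored point
```

The result agrees with brute-force 1-NN: here q = (0.9, 1.2) is closest to (1, 1), index 1.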