You are given a PyTorch implementation of a decoder-only Transformer that is failing to train: the model compiles and runs, but the loss plateaus and generated text is incoherent. Your task is to identify and fix exactly four bugs hidden in the provided code, located in:

- the scaled dot-product attention,
- the causal masking logic,
- the residual + LayerNorm wiring, and
- the positional-encoding addition.

You may not change the overall architecture (number of layers, heads, or hidden sizes); you may only correct the buggy lines. After each fix, re-run the training script and verify that the loss decreases and sample outputs become meaningful. Submit the four corrected lines (or line pairs) with a brief comment explaining each fix.
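For reference, here is a minimal sketch of what *correct* versions of the four areas typically look like. The graded code is not shown here, so the class and variable names below are hypothetical; the point is the four commented lines, which are the usual failure sites for each bug category (wrong scaling denominator, mask applied with the wrong value or after softmax, residual carrying the normalized tensor instead of the input, and positional encodings concatenated or mis-sliced instead of added).

```python
import math
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    def __init__(self, d_model, n_heads, max_len=128):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        # Causal mask: position i may attend only to positions <= i.
        mask = torch.tril(torch.ones(max_len, max_len, dtype=torch.bool))
        self.register_buffer("mask", mask)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (B, n_heads, T, d_head).
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product: divide by sqrt(d_head), not sqrt(d_model).
        att = (q @ k.transpose(-2, -1)) / math.sqrt(self.d_head)
        # Mask *before* softmax, filling disallowed positions with -inf, not 0.
        att = att.masked_fill(~self.mask[:T, :T], float("-inf"))
        att = att.softmax(dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

class Block(nn.Module):
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = CausalSelfAttention(d_model, n_heads)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        # Pre-LN residual wiring: x + sublayer(norm(x)); the residual
        # branch must carry x itself, not the normalized tensor.
        x = x + self.attn(self.ln1(x))
        x = x + self.mlp(self.ln2(x))
        return x

class TinyDecoder(nn.Module):
    def __init__(self, vocab, d_model=64, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.blocks = nn.Sequential(
            *[Block(d_model, n_heads) for _ in range(n_layers)])
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        # Positional encodings are *added* to token embeddings,
        # sliced to the current sequence length T.
        x = self.tok(idx) + self.pos(pos)
        x = self.blocks(x)
        return self.head(self.ln_f(x))
```

A quick sanity check for the masking fix: perturbing only the last input token must leave the logits at all earlier positions unchanged; if it doesn't, the causal mask is still leaking future information.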