You are training a very deep feed-forward neural network (≥ 20 layers) for image classification on Pinterest's pin dataset. After a few epochs the loss stops decreasing and classification accuracy plateaus far below expectations. Inspecting the gradients, you find that the gradient magnitudes of the earliest layers are near zero (≈ 1e-7) while the later layers still receive healthy updates.

Your task:

(1) Explain why this vanishing-gradient phenomenon occurs in terms of back-propagation mechanics.
(2) Implement a minimal experiment that reproduces the problem: build a 30-layer fully-connected network with sigmoid activations and plot the per-layer gradient L2 norms after one backward pass.
(3) Propose and code one architectural fix (e.g. a residual connection, gated unit, or normalization layer) that restores gradient flow, so that the gradient L2 norm of the first layer is at least 1% of the last layer's.

Use PyTorch, assume input images are 224×224×3 flattened to 150,528-dim vectors, and return the gradient-norm plot data as a list of floats.
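A minimal solution sketch for parts (2) and (3). During back-propagation each layer's gradient is the product of that layer's weight Jacobian and the sigmoid derivative (at most 0.25), so across 30 layers the product shrinks geometrically toward the input; an identity skip connection adds a multiplication-free path that preserves gradient magnitude. The hidden width (64) and class count (10) below are illustrative placeholders, not values specified by the task, chosen to keep the demo cheap:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

IN_DIM = 150_528   # 224 * 224 * 3, flattened
HIDDEN = 64        # narrow hidden width to keep the demo cheap (assumption)
DEPTH = 30
NUM_CLASSES = 10   # placeholder class count (assumption)

def grad_norms(model, x, y):
    """One forward/backward pass; return per-layer weight-gradient L2 norms."""
    model.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return [m.weight.grad.norm().item()
            for m in model.modules() if isinstance(m, nn.Linear)]

class PlainNet(nn.Module):
    """30 stacked sigmoid layers: gradients vanish toward the input."""
    def __init__(self):
        super().__init__()
        dims = [IN_DIM] + [HIDDEN] * DEPTH
        self.layers = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1]) for i in range(DEPTH)])
        self.head = nn.Linear(HIDDEN, NUM_CLASSES)

    def forward(self, x):
        for layer in self.layers:
            x = torch.sigmoid(layer(x))
        return self.head(x)

class ResidualNet(nn.Module):
    """Same depth, but identity skip connections restore gradient flow."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(IN_DIM, HIDDEN)  # match dims for the skips
        self.layers = nn.ModuleList(
            [nn.Linear(HIDDEN, HIDDEN) for _ in range(DEPTH)])
        self.head = nn.Linear(HIDDEN, NUM_CLASSES)

    def forward(self, x):
        x = self.proj(x)
        for layer in self.layers:
            x = x + torch.sigmoid(layer(x))  # residual: y = x + f(x)
        return self.head(x)

# One synthetic batch stands in for the pin images.
x = torch.randn(4, IN_DIM)
y = torch.randint(0, NUM_CLASSES, (4,))

plain = grad_norms(PlainNet(), x, y)       # the gradient-norm plot data
res = grad_norms(ResidualNet(), x, y)

print(plain[0] / plain[-1])  # tiny ratio: first layer barely learns
print(res[0] / res[-1])      # healthy ratio: gradient flow restored
```

The returned lists are the per-layer gradient-norm plot data the task asks for; plotting `plain` on a log scale makes the geometric decay visible, and `res[0] >= 0.01 * res[-1]` checks the 1% criterion from part (3).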