You are given a deep feed-forward neural network with L fully-connected layers. Each layer i has weight matrix W_i ∈ ℝ^{d_i × d_{i−1}} and bias vector b_i ∈ ℝ^{d_i}. The forward pass for layer i is
z_i = W_i h_{i−1} + b_i
h_i = σ(z_i)
where h_0 = x (the input) and σ is an element-wise activation function (sigmoid, tanh, or ReLU). During back-propagation the gradient of the loss L with respect to the pre-activation z_i is
δ_i = (W_{i+1}^T δ_{i+1}) ⊙ σ′(z_i)   for 1 ≤ i < L,
with the recursion seeded at the output layer by δ_L = (∂L/∂h_L) ⊙ σ′(z_L), where ∂L/∂h_L is the gradient of the loss with respect to the network output,
and the gradients with respect to the parameters are
∂L/∂W_i = δ_i h_{i−1}^T
∂L/∂b_i = δ_i
(per example; for a mini-batch these are summed over the examples in the batch).
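The backward recursion above can be sketched in NumPy for a single example. The representation here is my own assumption: zs and hs are the pre-activations and activations cached during a forward pass (with hs[0] = x), d_acts holds the derivatives σ′ for each layer, and dL_dh_last is the gradient of the loss with respect to the network output.

```python
import numpy as np

def backward(network, zs, hs, dL_dh_last, d_acts):
    """Back-propagate delta_i = (W_{i+1}^T delta_{i+1}) * sigma'(z_i).

    network   : list of (W, b, activation) tuples
    zs, hs    : cached pre-activations z_i and activations h_i (hs[0] = x)
    dL_dh_last: gradient of the loss w.r.t. the final activation h_L
    d_acts    : d_acts[i] is sigma' for layer i (assumed supplied by the caller)

    Returns per-layer (dL/dW_i, dL/db_i) gradients, ordered first-to-last.
    """
    grads = []
    # Seed the recursion at the output layer: delta_L = dL/dh_L * sigma'(z_L).
    delta = dL_dh_last * d_acts[-1](zs[-1])
    for i in reversed(range(len(network))):
        W, _, _ = network[i]
        dW = np.outer(delta, hs[i])   # dL/dW_i = delta_i h_{i-1}^T  (hs[i] = h_{i-1})
        db = delta                    # dL/db_i = delta_i
        grads.append((dW, db))
        if i > 0:
            # delta_{i-1} = (W_i^T delta_i) * sigma'(z_{i-1})
            delta = (W.T @ delta) * d_acts[i - 1](zs[i - 1])
    return grads[::-1]
```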
Implement a function
gradient_metrics(network, x, y, loss_fn)
that returns a dictionary containing
You must compute these quantities by performing an exact forward and backward pass on the given mini-batch of inputs x and labels y. Do not use automatic mixed precision or any gradient-clipping utilities. The network is provided as a list of (weight, bias, activation) tuples.
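Since the list of dictionary keys is elided above, the sketch below fills it with assumed metrics: the keys 'loss', 'weight_grads', 'bias_grads', and 'grad_norm' are my guesses, as are two conventions not stated in the text — that each tuple names its activation by string ("sigmoid", "tanh", or "relu"), and that loss_fn(prediction, y) returns both the loss value and its gradient with respect to the prediction. For clarity this handles a single example x; a mini-batch version would sum the outer products over examples.

```python
import numpy as np

# Activations named in the text, plus their derivatives sigma'.
def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

ACTS = {"sigmoid": _sigmoid, "tanh": np.tanh, "relu": lambda z: np.maximum(z, 0)}
DERIVS = {
    "sigmoid": lambda z: _sigmoid(z) * (1 - _sigmoid(z)),
    "tanh": lambda z: 1 - np.tanh(z) ** 2,
    "relu": lambda z: (z > 0).astype(z.dtype),
}

def gradient_metrics(network, x, y, loss_fn):
    """Exact forward and backward pass; activations given by name (assumption)."""
    # Forward pass, caching z_i and h_i (hs[0] = x).
    h, zs, hs = x, [], [x]
    for W, b, act in network:
        z = W @ h + b          # z_i = W_i h_{i-1} + b_i
        h = ACTS[act](z)       # h_i = sigma(z_i)
        zs.append(z)
        hs.append(h)
    loss, dL_dh = loss_fn(h, y)   # assumed interface: (value, gradient w.r.t. h_L)
    # Backward pass: delta_L = dL/dh_L * sigma'(z_L), then the recursion.
    weight_grads, bias_grads = [], []
    delta = dL_dh * DERIVS[network[-1][2]](zs[-1])
    for i in reversed(range(len(network))):
        weight_grads.append(np.outer(delta, hs[i]))   # dL/dW_i = delta_i h_{i-1}^T
        bias_grads.append(delta)                      # dL/db_i = delta_i
        if i > 0:
            delta = (network[i][0].T @ delta) * DERIVS[network[i - 1][2]](zs[i - 1])
    weight_grads.reverse()
    bias_grads.reverse()
    # Global L2 norm over all parameter gradients (assumed metric).
    grad_norm = np.sqrt(sum((g ** 2).sum() for g in weight_grads + bias_grads))
    return {"loss": loss, "weight_grads": weight_grads,
            "bias_grads": bias_grads, "grad_norm": grad_norm}
```

A quick sanity check for a sketch like this is to compare one analytic gradient entry against a finite-difference estimate of the loss.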