You are given a deep feed-forward neural network with L fully-connected layers. Each layer i has weight matrix W_i ∈ ℝ^{d_i × d_{i−1}} and bias vector b_i ∈ ℝ^{d_i}. The forward pass for layer i is
z_i = W_i h_{i−1} + b_i
h_i = σ(z_i)
where h_0 = x (the input) and σ is an element-wise activation function (sigmoid, tanh, or ReLU). During back-propagation the gradient of the loss L with respect to the pre-activation z_i is
δ_i = (W_{i+1}^T δ_{i+1}) ⊙ σ′(z_i)   for 1 ≤ i < L,
with the recursion seeded at the output layer by δ_L = (∂L/∂h_L) ⊙ σ′(z_L), where ∂L/∂h_L is the gradient of the loss with respect to the network output,
and the gradients with respect to the parameters are
∂L/∂W_i = δ_i h_{i−1}^T
∂L/∂b_i = δ_i
(per example; for a mini-batch these are summed over the examples in the batch).
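The backward recursion above can be sketched in NumPy for a single example. The representation here is my own assumption: zs and hs are the pre-activations and activations cached during a forward pass (with hs[0] = x), d_acts holds the derivatives σ′ for each layer, and dL_dh_last is the gradient of the loss with respect to the network output.

```python
import numpy as np

def backward(network, zs, hs, dL_dh_last, d_acts):
    """Back-propagate delta_i = (W_{i+1}^T delta_{i+1}) * sigma'(z_i).

    network   : list of (W, b, activation) tuples
    zs, hs    : cached pre-activations z_i and activations h_i (hs[0] = x)
    dL_dh_last: gradient of the loss w.r.t. the final activation h_L
    d_acts    : d_acts[i] is sigma' for layer i (assumed supplied by the caller)

    Returns per-layer (dL/dW_i, dL/db_i) gradients, ordered first-to-last.
    """
    grads = []
    # Seed the recursion at the output layer: delta_L = dL/dh_L * sigma'(z_L).
    delta = dL_dh_last * d_acts[-1](zs[-1])
    for i in reversed(range(len(network))):
        W, _, _ = network[i]
        dW = np.outer(delta, hs[i])   # dL/dW_i = delta_i h_{i-1}^T  (hs[i] = h_{i-1})
        db = delta                    # dL/db_i = delta_i
        grads.append((dW, db))
        if i > 0:
            # delta_{i-1} = (W_i^T delta_i) * sigma'(z_{i-1})
            delta = (W.T @ delta) * d_acts[i - 1](zs[i - 1])
    return grads[::-1]
```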
Implement a function
gradient_metrics(network, x, y, loss_fn)
that returns a dictionary containing
You must compute these quantities by performing an exact forward and backward pass on the given mini-batch of inputs x and labels y. Do not use automatic mixed precision or any gradient-clipping utilities. The network is provided as a list of (weight, bias, activation) tuples.
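Since the list of dictionary keys is elided above, the sketch below fills it with assumed metrics: the keys 'loss', 'weight_grads', 'bias_grads', and 'grad_norm' are my guesses, as are two conventions not stated in the text — that each tuple names its activation by string ("sigmoid", "tanh", or "relu"), and that loss_fn(prediction, y) returns both the loss value and its gradient with respect to the prediction. For clarity this handles a single example x; a mini-batch version would sum the outer products over examples.

```python
import numpy as np

# Activations named in the text, plus their derivatives sigma'.
def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

ACTS = {"sigmoid": _sigmoid, "tanh": np.tanh, "relu": lambda z: np.maximum(z, 0)}
DERIVS = {
    "sigmoid": lambda z: _sigmoid(z) * (1 - _sigmoid(z)),
    "tanh": lambda z: 1 - np.tanh(z) ** 2,
    "relu": lambda z: (z > 0).astype(z.dtype),
}

def gradient_metrics(network, x, y, loss_fn):
    """Exact forward and backward pass; activations given by name (assumption)."""
    # Forward pass, caching z_i and h_i (hs[0] = x).
    h, zs, hs = x, [], [x]
    for W, b, act in network:
        z = W @ h + b          # z_i = W_i h_{i-1} + b_i
        h = ACTS[act](z)       # h_i = sigma(z_i)
        zs.append(z)
        hs.append(h)
    loss, dL_dh = loss_fn(h, y)   # assumed interface: (value, gradient w.r.t. h_L)
    # Backward pass: delta_L = dL/dh_L * sigma'(z_L), then the recursion.
    weight_grads, bias_grads = [], []
    delta = dL_dh * DERIVS[network[-1][2]](zs[-1])
    for i in reversed(range(len(network))):
        weight_grads.append(np.outer(delta, hs[i]))   # dL/dW_i = delta_i h_{i-1}^T
        bias_grads.append(delta)                      # dL/db_i = delta_i
        if i > 0:
            delta = (network[i][0].T @ delta) * DERIVS[network[i - 1][2]](zs[i - 1])
    weight_grads.reverse()
    bias_grads.reverse()
    # Global L2 norm over all parameter gradients (assumed metric).
    grad_norm = np.sqrt(sum((g ** 2).sum() for g in weight_grads + bias_grads))
    return {"loss": loss, "weight_grads": weight_grads,
            "bias_grads": bias_grads, "grad_norm": grad_norm}
```

A quick sanity check for a sketch like this is to compare one analytic gradient entry against a finite-difference estimate of the loss.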