Gradients for the Weights - Backpropagation, Exactly

Each weight gradient is the upstream gradient at the unit's pre-activation, dL/dz, times the input feeding that weight. A branch whose upstream gradient is zero contributes gradients that are exactly zero.


First hidden-unit gradients

For the first hidden unit, dL/dz1=-2. Multiplying by the inputs x1=1 and x2=2 gives dL/dw11=-2 and dL/dw12=-4.

\frac{dL}{dw_{11}}=-2\cdot1=-2,\quad \frac{dL}{dw_{12}}=-2\cdot2=-4

The first bias gradient is dL/db1=-2, equal to dL/dz1 because the bias enters z1 with coefficient 1. Since dL/dz2=0, every gradient of the second hidden unit (both weights and the bias) is zero.

\frac{dL}{db_1}=-2,\quad \frac{dL}{dw_{21}}=\frac{dL}{dw_{22}}=\frac{dL}{db_2}=0

The parameter-gradient register is exact: dw11=-2, dw12=-4, db1=-2, and the dead branch entries are all 0.

dw_{11}=-2,\quad dw_{12}=-4,\quad db_1=-2,\quad dw_{21}=dw_{22}=db_2=0
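
The register above can be reproduced in a few lines. This is a minimal sketch, not the page's own code: it assumes an affine pre-activation z = W x + b, and the names dz, x, dW, and db are illustrative. The upstream gradients and inputs are the values used in the text.

```python
import numpy as np

# Upstream gradients at the hidden pre-activations, from the text.
dz = np.array([-2.0, 0.0])   # [dL/dz1, dL/dz2]

# Inputs implied by the worked products (-2 * 1 and -2 * 2).
x = np.array([1.0, 2.0])     # [x1, x2]

# Assumption: z = W @ x + b, so dz_j/dw_ji = x_i and dz_j/db_j = 1.
dW = np.outer(dz, x)         # dW[j, i] = dL/dz_j * x_i
db = dz.copy()               # dL/db_j = dL/dz_j

print("dW =", dW.tolist())
print("db =", db.tolist())
```

The second row of dW and the second entry of db are zero because dL/dz2=0 multiplies every term in that branch, whatever the inputs are.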