Each weight gradient is the upstream gradient times the input feeding that weight. A branch whose upstream gradient is zero contributes exactly zero to each of its weight gradients.
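The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the walkthrough's own code: the helper name `weight_grads` is hypothetical, and the numbers are the inputs (x1=1, x2=2) and upstream gradients (dL/dz1=-2, dL/dz2=0) used in this example.

```python
def weight_grads(dz, x):
    """Return dL/dW as a nested list, where dL/dW[i][j] = dz[i] * x[j]."""
    return [[dz_i * x_j for x_j in x] for dz_i in dz]


x = [1, 2]     # inputs x1, x2
dz = [-2, 0]   # upstream gradients dL/dz1, dL/dz2

dW = weight_grads(dz, x)
print(dW)  # [[-2, -4], [0, 0]]
```

The second row is all zeros because its upstream gradient is zero, whatever the inputs are.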


First hidden-unit gradients

For the first hidden unit, dL/dz1=-2. Multiplying by the inputs x1=1 and x2=2 gives dL/dw11=-2 and dL/dw12=-4.

$$\frac{dL}{dw_{11}}=-2\cdot1=-2,\quad \frac{dL}{dw_{12}}=-2\cdot2=-4$$
Weight gradients: parameter gradients are recomputed by the chain rule. One reverse-chain step (all exact):

| quantity | rule | value |
|---|---|---|
| dL/dyhat | 2*(2-3) | -2 |
| dL/dv1 | -2*2 | -4 |
| dL/dv2 | -2*0 | 0 |
| dL/dc | dL/dyhat | -2 |
| dL/dh1 | -2*1 | -2 |
| dL/dh2 | -2*1 | -2 |
| ReLU'(z1) | z1=2>0 | 1 |
| ReLU'(z2) | z2=-1<=0 | 0 |
| dL/dz1 | -2*1 | -2 |
| dL/dz2 | -2*0 | 0 |
| dL/dw11 | -2*1 | -2 |
| dL/dw12 | -2*2 | -4 |
| dL/db1 | dL/dz1 | -2 |
| dL/dw21 | 0*1 | 0 |
| dL/dw22 | 0*2 | 0 |
| dL/db2 | dL/dz2 | 0 |

Biases and the blocked unit

The first bias gradient equals its upstream gradient, because dz1/db1=1: dL/db1=dL/dz1=-2. Since dL/dz2=0, every weight and bias gradient of the second hidden unit is zero.

$$\frac{dL}{db_1}=-2,\quad \frac{dL}{dw_{21}}=\frac{dL}{dw_{22}}=\frac{dL}{db_2}=0$$
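To see where the zero for the second unit comes from, the ReLU gate can be sketched as follows. This is an illustrative snippet with a hypothetical helper `relu_grad`. It takes ReLU'(z)=0 for z<=0, and uses the pre-activations (z1=2, z2=-1) and hidden-layer gradients (dL/dh1=dL/dh2=-2) from this example.

```python
def relu_grad(z):
    """Derivative of ReLU, using the convention ReLU'(z) = 0 for z <= 0."""
    return 1 if z > 0 else 0


z = [2, -1]     # pre-activations z1, z2
dh = [-2, -2]   # dL/dh1, dL/dh2

# dL/dz_i = dL/dh_i * ReLU'(z_i); the gate zeroes the second unit.
dz = [dh_i * relu_grad(z_i) for dh_i, z_i in zip(dh, z)]

# dL/db_i = dL/dz_i, because dz_i/db_i = 1.
db = list(dz)

print(dz, db)  # [-2, 0] [-2, 0]
```

Because the gate sets dL/dz2 to 0, every product formed from it (the two weight gradients and the bias gradient) is 0 as well.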

Summary

The parameter-gradient register is exact: dw11=-2, dw12=-4, db1=-2, and the dead-branch entries dw21, dw22, and db2 are all 0.

$$dw_{11}=-2,\quad dw_{12}=-4,\quad db_1=-2$$
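One way to sanity-check these values is a central finite-difference comparison, sketched below. This section does not give the first-layer weights and biases, so the ones used here are hypothetical, chosen only to reproduce z1=2 and z2=-1. The output-layer values v1=v2=1, c=0 and the target y=3 are inferred from the gradient table, and the squared-error loss is assumed from dL/dyhat=2*(2-3).

```python
def loss(params, x=(1.0, 2.0), y=3.0):
    """Squared error of a 2-2-1 ReLU network; params = [w11, w12, b1, w21, w22, b2]."""
    w11, w12, b1, w21, w22, b2 = params
    v1, v2, c = 1.0, 1.0, 0.0                 # output layer inferred from the table
    z1 = w11 * x[0] + w12 * x[1] + b1
    z2 = w21 * x[0] + w22 * x[1] + b2
    h1, h2 = max(z1, 0.0), max(z2, 0.0)       # ReLU
    yhat = v1 * h1 + v2 * h2 + c
    return (yhat - y) ** 2


def numeric_grad(f, params, eps=1e-6):
    """Central-difference estimate of df/dparams[i] for each i."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


# Hypothetical first-layer parameters giving z1 = 2 and z2 = -1 for x = (1, 2).
params = [0.0, 1.0, 0.0, 1.0, -1.0, 0.0]

# Chain-rule values from the summary: dw11, dw12, db1, dw21, dw22, db2.
analytic = [-2.0, -4.0, -2.0, 0.0, 0.0, 0.0]

numeric = numeric_grad(loss, params)
ok = all(abs(a - n) < 1e-4 for a, n in zip(analytic, numeric))
print(ok)  # True
```

The check only holds away from the ReLU kink: with z2=-1, a perturbation of size 1e-6 leaves the second unit inactive, so its numerical gradients stay at zero.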