Backpropagation asks for the loss derivative with respect to each parameter. This lesson reuses the exact forward values and sets up the reverse chain rule.


The gradient question

Backprop asks for dL/d(parameter) for every weight and bias: how much the loss changes when that one parameter changes. The chain rule computes those quantities exactly from the forward graph already shown.

$$\frac{dL}{d(\text{parameter})}\quad\text{by the chain rule}$$
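To make "how the loss changes with a parameter" concrete, here is a minimal numerical sketch. It copies the forward rules from the recap and leaves only the last additive term of the yhat rule adjustable. Treating that term as an output bias, and the name `out_bias`, are assumptions made for illustration. The central difference is only an approximation of the derivative that the chain rule yields exactly.

```python
def relu(v):
    return max(0.0, v)

def loss(out_bias):
    # Forward rules copied from the recap; only the final additive term
    # of the yhat rule (assumed here to be a bias) is left adjustable.
    z1 = 1*1 + 1*2 - 1
    h1 = relu(z1)
    z2 = 1*1 - 1*2 + 0
    h2 = relu(z2)
    yhat = 1*h1 + 1*h2 + out_bias
    return (yhat - 3) ** 2

eps = 1e-6
# Central difference: a numerical estimate of dL/d(out_bias) at out_bias = 0.
estimate = (loss(eps) - loss(-eps)) / (2 * eps)
print(estimate)
```

A negative estimate means that nudging the parameter upward lowers the loss.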
Forward recap: the loss node supplies the first exact gradient.

forward pass (all exact)

| quantity | rule | value |
| --- | --- | --- |
| z1 | 1*1 + 1*2 - 1 | 2 |
| h1 | ReLU(z1) | 2 |
| z2 | 1*1 - 1*2 + 0 | -1 |
| h2 | ReLU(z2) | 0 |
| yhat | 1*2 + 1*0 + 0 | 2 |
| L | (yhat - 3)^2 | 1 |

Forward values we reuse

The forward pass ended with yhat=2 and L=1. Those exact node values are the inputs to the backward pass.

$$\hat y=2,\quad L=1$$
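The forward values can be reproduced with a short sketch. The recap gives only the arithmetic, so the split of each rule into weight times input plus bias, and the names `x`, `W`, `b`, `v`, `c`, and `target`, are assumptions made for illustration.

```python
def relu(v):
    return max(0.0, v)

# Assumed reading of the recap rules: each is weight * input + ... + bias.
x = (1.0, 2.0)                   # inputs (assumed)
target = 3.0                     # from the loss rule (yhat - 3)^2
W = ((1.0, 1.0), (1.0, -1.0))    # hidden weights (assumed)
b = (-1.0, 0.0)                  # hidden biases (assumed)
v = (1.0, 1.0)                   # output weights (assumed)
c = 0.0                          # output bias (assumed)

z1 = W[0][0] * x[0] + W[0][1] * x[1] + b[0]
h1 = relu(z1)
z2 = W[1][0] * x[0] + W[1][1] * x[1] + b[1]
h2 = relu(z2)
yhat = v[0] * h1 + v[1] * h2 + c
L = (yhat - target) ** 2

# Cache every node value: the backward pass reads these, it does not recompute them.
forward = {"z1": z1, "h1": h1, "z2": z2, "h2": h2, "yhat": yhat, "L": L}
print(forward)
```

Keeping the intermediate values in one place reflects the point of this step: the backward pass takes its inputs from the stored node values.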

Summary

The backward pass starts at the loss and moves in reverse. The next step computes the first gradient, dL/dyhat.

$$L\rightarrow\hat y\rightarrow h\rightarrow z\rightarrow w,b$$
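Read as a product of local derivatives, that path gives one factor per arrow. As a sketch, assuming w is a hidden-layer weight that feeds a single pre-activation z, which feeds h, which feeds the output:

```latex
\frac{dL}{dw}
  = \frac{dL}{d\hat y}\cdot\frac{d\hat y}{dh}\cdot\frac{dh}{dz}\cdot\frac{dz}{dw}
```

The leftmost factor is the one the loss node supplies, which is why the backward pass begins there.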