The Gradient-Annotated Graph - Backpropagation, Exactly

The full backward DAG puts values, gradients, and ReLU gates in one place. It makes the live branch and the blocked branch visible together.

Legend: highlighted nodes are the ones computed in this step.

Every gradient on the graph

The backward DAG annotates each node with its forward value and its gradient. The ReLU derivative labels are visible at the two gates.

\operatorname{ReLU}'(z_1)=1,\quad \operatorname{ReLU}'(z_2)=0

Because ReLU'(z2)=0, the chain rule gives dL/dz2=0, so every parameter on that branch has gradient 0, whatever gradient arrives from downstream.

dw_{21}=dw_{22}=db_2=0
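
As a worked step, assume the second pre-activation has the usual affine form z_2 = w_{21}x_1 + w_{22}x_2 + b_2 (this form is an assumption inferred from the parameter names, not something stated here). Each zero is then the blocked gradient times a local derivative:

dw_{21}=\frac{dL}{dz_2}\,x_1=0,\quad dw_{22}=\frac{dL}{dz_2}\,x_2=0,\quad db_2=\frac{dL}{dz_2}\cdot 1=0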

The same chain rule gives the input gradients dL/dx1=-2 and dL/dx2=-2. These are included to show the whole reverse pass, not just the parameter gradients.

\frac{dL}{dx_1}=-2,\quad \frac{dL}{dx_2}=-2
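
The reverse pass described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the lesson's actual network: the two-unit ReLU layer, the linear output, the squared-error loss, and every number in it (x, W, b, v, c, target) are hypothetical values chosen only so that one gate is live and the other is blocked. The function name backward_pass is also made up for this sketch.

```python
def relu(z):
    return z if z > 0 else 0.0


def relu_grad(z):
    # Derivative of ReLU: 1 on the live side, 0 on the blocked side.
    return 1.0 if z > 0 else 0.0


def backward_pass(x, W, b, v, c, target):
    """Forward and reverse pass for a 2-input, 2-hidden-unit ReLU network.

    Assumed (hypothetical) model:
        z_i = W[i][0]*x[0] + W[i][1]*x[1] + b[i]
        a_i = relu(z_i)
        y   = v[0]*a[0] + v[1]*a[1] + c
        L   = (y - target)**2
    """
    # Forward pass: keep every intermediate value for the reverse pass.
    z = [W[i][0] * x[0] + W[i][1] * x[1] + b[i] for i in range(2)]
    a = [relu(zi) for zi in z]
    y = v[0] * a[0] + v[1] * a[1] + c
    loss = (y - target) ** 2

    # Reverse pass: apply the chain rule from the loss back to the inputs.
    dy = 2.0 * (y - target)
    da = [dy * v[i] for i in range(2)]
    dz = [da[i] * relu_grad(z[i]) for i in range(2)]  # the ReLU gates act here
    dW = [[dz[i] * x[j] for j in range(2)] for i in range(2)]
    db = list(dz)
    dx = [sum(dz[i] * W[i][j] for i in range(2)) for j in range(2)]

    return {"z": z, "loss": loss, "dz": dz, "dW": dW, "db": db, "dx": dx}


# Hypothetical values: unit 1 is live (z_1 > 0), unit 2 is blocked (z_2 < 0).
grads = backward_pass(
    x=[1.0, 1.0],
    W=[[1.0, 1.0], [-1.0, -1.0]],
    b=[0.0, 0.0],
    v=[1.0, 1.0],
    c=0.0,
    target=3.0,
)
```

With these made-up values the second unit's gate is closed, so its row of dW and its entry of db come out as zero, while the input gradients are carried entirely by the live branch. Changing the second row of W so that z_2 becomes positive reopens the gate and makes those gradients nonzero.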