The full backward DAG puts values, gradients, and ReLU gates in one place. It makes the live branch and the blocked branch visible together.

Every gradient on the graph

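The backward DAG annotates each node with its forward value and its gradient. The ReLU derivative is labeled at each of the two gates: 1 on the live branch, where z1=2 is positive, and 0 on the blocked branch, where z2=-1 is negative.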

$$\operatorname{ReLU}'(z_1)=1,\quad \operatorname{ReLU}'(z_2)=0$$
Figure: Gradient-annotated DAG. Every node value and gradient is recomputed from the shown graph.
Parameters: w11=1, w12=1, b1=-1, w21=1, w22=-1, b2=0, v1=1, v2=1, c=0; target=3.
Nodes (value, gradient): x1 (1, -2); x2 (2, -2); z1 (2, -2); z2 (-1, 0); h1 (2, -2); h2 (0, -2); yhat (2, -2); L (1, 1).
ReLU gates: ReLU'(z1)=1, ReLU'(z2)=0; convention at z=0: ReLU'(0)=0.
Parameter gradients: dv1=-4, dv2=0, dc=-2, dw11=-2, dw12=-4, db1=-2, dw21=0, dw22=0, db2=0.
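The annotated values can be reproduced with a few lines of plain Python. This is a minimal sketch, not the page's own implementation: it assumes the graph computes `z = w·x + b` per hidden unit, `yhat = v1*h1 + v2*h2 + c`, and a squared-error loss `L = (yhat - target)**2`, which is the form consistent with `L=1` and `dL/dyhat=-2` on the graph. Variable names mirror the figure labels.

```python
# Sketch of the forward and backward pass for the graph above.
# Assumed (not stated in the text): affine pre-activations and
# squared-error loss L = (yhat - target)**2.

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # Convention from the figure: ReLU'(0) = 0.
    return 1.0 if z > 0 else 0.0

# Inputs, target, and parameters as labeled on the graph.
x1, x2, target = 1.0, 2.0, 3.0
w11, w12, b1 = 1.0, 1.0, -1.0
w21, w22, b2 = 1.0, -1.0, 0.0
v1, v2, c = 1.0, 1.0, 0.0

# Forward pass.
z1 = w11 * x1 + w12 * x2 + b1
z2 = w21 * x1 + w22 * x2 + b2
h1, h2 = relu(z1), relu(z2)
yhat = v1 * h1 + v2 * h2 + c
L = (yhat - target) ** 2

# Backward pass, from the loss back to the inputs.
dyhat = 2.0 * (yhat - target)
dv1, dv2, dc = dyhat * h1, dyhat * h2, dyhat
dh1, dh2 = dyhat * v1, dyhat * v2
dz1 = dh1 * relu_grad(z1)   # live gate passes the gradient
dz2 = dh2 * relu_grad(z2)   # blocked gate zeroes it
dw11, dw12, db1 = dz1 * x1, dz1 * x2, dz1
dw21, dw22, db2 = dz2 * x1, dz2 * x2, dz2
dx1 = dz1 * w11 + dz2 * w21
dx2 = dz1 * w12 + dz2 * w22
```

Each gradient is the upstream gradient times a local derivative, so the order of the backward lines is simply the forward order reversed.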

The dead branch stays zero

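Because z2=-1 is negative, the gate blocks the incoming gradient and dL/dz2=0, even though dL/dh2=-2 is nonzero. Every parameter that feeds z2 therefore has gradient 0.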

$$dw_{21}=dw_{22}=db_2=0$$
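Written out with the values on the graph, the blocked branch reads as follows. This is a sketch that assumes the pre-activation has the form $z_2 = w_{21}x_1 + w_{22}x_2 + b_2$, which agrees with z2=-1 for the listed weights and inputs.

$$
\begin{aligned}
\frac{dL}{dz_2} &= \frac{dL}{dh_2}\,\operatorname{ReLU}'(z_2) = (-2)(0) = 0,\\
dw_{21} &= \frac{dL}{dz_2}\,x_1 = 0 \cdot 1 = 0,\\
dw_{22} &= \frac{dL}{dz_2}\,x_2 = 0 \cdot 2 = 0,\\
db_2 &= \frac{dL}{dz_2} = 0.
\end{aligned}
$$

The output weight gradient dv2 is also 0, but for a different reason: under the assumed output form $\hat{y} = v_1 h_1 + v_2 h_2 + c$, it equals $\frac{dL}{d\hat{y}}\,h_2 = (-2)(0)$, which vanishes because the forward value h2 is 0 rather than because of the gate derivative.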

Input gradients are exact too

The same chain rule gives dL/dx1=-2 and dL/dx2=-2. Both come entirely through z1, since the blocked z2 path contributes 0. They are included to show the whole reverse pass, not just the parameters.

$$\frac{dL}{dx_1}=-2,\quad \frac{dL}{dx_2}=-2$$
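One way to sanity-check the two input gradients is a central finite difference on the loss. The sketch below is an illustration, not part of the original graph: `loss` and `central_diff` are hypothetical helper names, the step size `eps` is an arbitrary choice, and the squared-error loss is assumed as the form consistent with the values on the graph.

```python
# Numerical check of dL/dx1 and dL/dx2 by central differences.
# Assumed: squared-error loss and the parameter values labeled on the graph.

def relu(z):
    return max(0.0, z)

def loss(x1, x2):
    z1 = 1.0 * x1 + 1.0 * x2 - 1.0                 # w11, w12, b1
    z2 = 1.0 * x1 - 1.0 * x2 + 0.0                 # w21, w22, b2
    yhat = 1.0 * relu(z1) + 1.0 * relu(z2) + 0.0   # v1, v2, c
    return (yhat - 3.0) ** 2                       # target = 3

def central_diff(f, x1, x2, eps=1e-4):
    # Perturb one input at a time, holding the other fixed.
    d1 = (f(x1 + eps, x2) - f(x1 - eps, x2)) / (2 * eps)
    d2 = (f(x1, x2 + eps) - f(x1, x2 - eps)) / (2 * eps)
    return d1, d2

num_dx1, num_dx2 = central_diff(loss, 1.0, 2.0)

# Compare against the chain-rule values from the graph.
assert abs(num_dx1 - (-2.0)) < 1e-6
assert abs(num_dx2 - (-2.0)) < 1e-6
```

The step is small enough that z2 stays negative under either perturbation, so the check never crosses the ReLU kink and the numerical estimate agrees with the chain-rule result up to floating-point rounding.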