The full backward DAG puts values, gradients, and ReLU gates in one place. It makes the live branch and the blocked branch visible together.
Every gradient on the graph
The backward DAG annotates each node with its forward value and its gradient. The ReLU derivative labels are visible at the two gates.
ReLU′(z1) = 1, ReLU′(z2) = 0
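The two gate labels can be reproduced with a small sketch. The pre-activation values below are hypothetical stand-ins (the graph's actual z1 and z2 are not restated here); only their signs matter. The sketch also assumes the common convention that ReLU′(0) = 0.

```python
def relu_grad(z: float) -> float:
    """Derivative of ReLU, using the convention ReLU'(0) = 0."""
    return 1.0 if z > 0 else 0.0

# Hypothetical pre-activations: z1 is positive (live), z2 is negative (dead).
z1, z2 = 2.0, -1.0

gate1 = relu_grad(z1)  # gate on the live branch
gate2 = relu_grad(z2)  # gate on the dead branch
print(gate1, gate2)
```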
The dead branch stays zero
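Because ReLU′(z2) = 0, the gate blocks the upstream gradient, so dL/dz2 = 0. Every parameter on that branch receives its gradient through dL/dz2, so each one has gradient 0.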
dw21=dw22=db2=0
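A minimal sketch of why the whole branch goes to zero, assuming z2 = w21*x1 + w22*x2 + b2. The inputs, the pre-activation, and the upstream gradient are hypothetical values chosen for illustration; any nonzero upstream gradient gives the same result once the gate is 0.

```python
# Hypothetical values, not taken from the original graph.
x1, x2 = 1.0, 1.0
z2 = -1.0        # pre-activation on the dead branch (z2 <= 0)
dL_da2 = -2.0    # upstream gradient arriving at the ReLU output

# The ReLU gate multiplies the upstream gradient by 0 or 1.
relu_gate = 1.0 if z2 > 0 else 0.0
dL_dz2 = dL_da2 * relu_gate

# Assuming z2 = w21*x1 + w22*x2 + b2, each parameter gradient
# is dL/dz2 times the local derivative of z2.
dw21 = dL_dz2 * x1
dw22 = dL_dz2 * x2
db2 = dL_dz2

print(dw21 == 0 and dw22 == 0 and db2 == 0)
```

The inputs x1 and x2 never get a chance to matter: each parameter gradient is a product with dL/dz2, and that factor is already zero.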
Input gradients are exact too
The same chain rule gives dL/dx1=-2 and dL/dx2=-2. They are included to show the whole reverse pass, not just the parameters.
dL/dx1 = −2, dL/dx2 = −2
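The input gradients can be checked end to end with a short sketch. Everything numeric here is an assumption: the weights, biases, target, and the squared-error loss L = 0.5*(y - t)^2 are hypothetical choices that happen to reproduce a live first branch, a dead second branch, and input gradients of −2. The original graph may use different values. A central finite difference is included as an independent check on the hand-written reverse pass.

```python
# Hypothetical parameters: row i of W feeds hidden unit i.
W = ((1.0, 1.0), (1.0, -2.0))
B = (0.0, 0.0)
V = (1.0, 1.0)   # output weights
T = 4.0          # target

def relu(z):
    return max(z, 0.0)

def loss(x1, x2):
    z1 = W[0][0] * x1 + W[0][1] * x2 + B[0]
    z2 = W[1][0] * x1 + W[1][1] * x2 + B[1]
    y = V[0] * relu(z1) + V[1] * relu(z2)
    return 0.5 * (y - T) ** 2

def input_grads(x1, x2):
    # Forward pass, keeping the values the reverse pass needs.
    z1 = W[0][0] * x1 + W[0][1] * x2 + B[0]
    z2 = W[1][0] * x1 + W[1][1] * x2 + B[1]
    y = V[0] * relu(z1) + V[1] * relu(z2)
    # Reverse pass: output, ReLU gates, then inputs.
    dy = y - T
    dz1 = dy * V[0] * (1.0 if z1 > 0 else 0.0)
    dz2 = dy * V[1] * (1.0 if z2 > 0 else 0.0)
    dx1 = dz1 * W[0][0] + dz2 * W[1][0]
    dx2 = dz1 * W[0][1] + dz2 * W[1][1]
    return dx1, dx2

dx1, dx2 = input_grads(1.0, 1.0)

# Central finite differences as an independent check.
eps = 1e-6
num_dx1 = (loss(1.0 + eps, 1.0) - loss(1.0 - eps, 1.0)) / (2 * eps)
num_dx2 = (loss(1.0, 1.0 + eps) - loss(1.0, 1.0 - eps)) / (2 * eps)
print(dx1, dx2, num_dx1, num_dx2)
```

Only the live branch contributes to dx1 and dx2, since the dz2 term is multiplied by a zero gate before it reaches the inputs.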