The first derivative comes from the squared-error loss. From there, the output layer and hidden activations receive exact gradients by multiplication.


Backward from the loss

Squared error gives dL/dyhat=-2 because yhat=2 and y=3.

$$\frac{dL}{d\hat y}=2(\hat y-y)=2(2-3)=-2$$

Back from the output. The output layer gradients are exact products.

One reverse-chain step (all exact):

| quantity | rule | value |
|---|---|---|
| dL/dyhat | 2*(2-3) | -2 |
| dL/dv1 | -2*2 | -4 |
| dL/dv2 | -2*0 | 0 |
| dL/dc | dL/dyhat | -2 |
| dL/dh1 | -2*1 | -2 |
| dL/dh2 | -2*1 | -2 |
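The loss derivative above can be checked numerically. The following is a minimal sketch, assuming the loss is L = (yhat - y)^2 (inferred from the derivative 2(yhat - y)); the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical helper names; the loss form L = (y_hat - y)**2 is an
# assumption inferred from the derivative 2*(y_hat - y) used in the text.
def squared_error(y_hat, y):
    return (y_hat - y) ** 2


def d_squared_error(y_hat, y):
    """Analytic derivative of the squared error with respect to y_hat."""
    return 2.0 * (y_hat - y)


y_hat, y = 2.0, 3.0
analytic = d_squared_error(y_hat, y)

# Central finite difference as an independent check of the analytic value.
eps = 1e-6
numeric = (squared_error(y_hat + eps, y) - squared_error(y_hat - eps, y)) / (2 * eps)

print(analytic)  # -2.0
```

The finite-difference value should agree with the analytic one up to floating-point error, which is a quick way to catch a sign mistake in the first backward step.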

Output-layer parameter gradients

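The output weights multiply the hidden activations, so each weight gradient is the upstream gradient dL/dyhat=-2 times the matching activation. Since h1=2 and h2=0, the exact gradients are dL/dv1=-4 and dL/dv2=0. The bias c is added directly to the output, so dL/dc=dL/dyhat=-2.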

$$\frac{dL}{dv_1}=-2\cdot 2=-4,\quad \frac{dL}{dv_2}=-2\cdot 0=0$$
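The same products can be written as a short sketch. It assumes the output is a weighted sum of the hidden activations plus a bias c; `output_layer_grads` is a hypothetical name, not something defined in the original.

```python
def output_layer_grads(dL_dyhat, h):
    """Gradients of the loss with respect to the output weights and bias.

    Assumes y_hat = sum(v_i * h_i) + c, so d(y_hat)/d(v_i) = h_i
    and d(y_hat)/dc = 1.
    """
    dL_dv = [dL_dyhat * h_i for h_i in h]  # upstream gradient times activation
    dL_dc = dL_dyhat                       # bias passes the gradient through
    return dL_dv, dL_dc


# Values from the text: dL/dyhat = -2, h1 = 2, h2 = 0.
dL_dv, dL_dc = output_layer_grads(-2, [2, 0])
print(dL_dv, dL_dc)  # [-4, 0] -2
```

Note that dL/dv2 is zero only because h2 is zero in this example: a weight attached to an inactive unit receives no gradient on that step.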

Back into the hidden activations

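The same output gradient flows back through v1 and v2: each hidden activation receives dL/dyhat times its output weight. Both weights equal 1 here (the rule for each is -2*1), so dL/dh1=-2 and dL/dh2=-2.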

$$\frac{dL}{dh_1}=-2,\quad \frac{dL}{dh_2}=-2$$
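As a rough sketch of this step, the code below runs the output layer forward and then pushes the gradient back to the hidden activations. The values v1 = v2 = 1 are inferred from the rule -2*1, and c = 0 is an assumption chosen so that the forward pass reproduces yhat = 2; the function names are hypothetical.

```python
def forward(h, v, c):
    """Output layer: weighted sum of hidden activations plus bias (assumed form)."""
    return sum(v_i * h_i for v_i, h_i in zip(v, h)) + c


def hidden_activation_grads(dL_dyhat, v):
    """Each hidden activation receives the upstream gradient times its weight."""
    return [dL_dyhat * v_i for v_i in v]


h = [2, 0]   # hidden activations from the text
v = [1, 1]   # assumption: inferred from the rule -2*1
c = 0        # assumption: chosen so that y_hat comes out as 2
y = 3        # target from the text

y_hat = forward(h, v, c)
dL_dyhat = 2 * (y_hat - y)
dL_dh = hidden_activation_grads(dL_dyhat, v)
print(y_hat, dL_dyhat, dL_dh)  # 2 -2 [-2, -2]
```

Unlike the weight gradients, the activation gradients depend on the weights rather than on the activations, which is why h2 = 0 still receives a nonzero gradient here.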