Backward from the Loss

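Backpropagation starts at the loss. The first derivative is the derivative of the squared-error loss with respect to the prediction. From there, the chain rule gives exact gradients for the output-layer parameters and the hidden activations. Each one is the upstream gradient multiplied by a local derivative.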


For the squared-error loss L = (yhat - y)^2, the prediction yhat = 2 and the target y = 3 give dL/dyhat = -2.

\frac{dL}{d\hat y}=2(\hat y-y)=2(2-3)=-2
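Here is a minimal Python sketch of this first step. It assumes the per-example loss L = (yhat - y)^2 used above, and the function names are hypothetical. A central finite difference is included only as a sanity check on the analytic derivative.

```python
def squared_error(y_hat: float, y: float) -> float:
    """Per-example squared-error loss L = (y_hat - y)^2."""
    return (y_hat - y) ** 2


def squared_error_grad(y_hat: float, y: float) -> float:
    """Analytic derivative dL/dy_hat = 2 (y_hat - y)."""
    return 2.0 * (y_hat - y)


def numeric_grad(y_hat: float, y: float, eps: float = 1e-6) -> float:
    """Central finite-difference estimate of dL/dy_hat, used only as a check."""
    return (squared_error(y_hat + eps, y) - squared_error(y_hat - eps, y)) / (2 * eps)


y_hat, y = 2.0, 3.0
print(squared_error_grad(y_hat, y))  # -2.0
print(numeric_grad(y_hat, y))        # approximately -2.0 (floating-point error)
```

The analytic and numerical values should agree to within floating-point error. This is a cheap way to catch a sign mistake before the gradient is propagated further.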

Output-layer parameter gradients

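Each output weight multiplies one hidden activation, so dL/dv1 = (dL/dyhat) h1 and dL/dv2 = (dL/dyhat) h2. The bias c enters additively, so dL/dc = dL/dyhat. Since h1 = 2 and h2 = 0, the exact gradients are dL/dv1 = -4, dL/dv2 = 0, and dL/dc = -2.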

\frac{dL}{dv_1}=\frac{dL}{d\hat y}\,h_1=-2\cdot2=-4,\quad \frac{dL}{dv_2}=\frac{dL}{d\hat y}\,h_2=-2\cdot0=0,\quad \frac{dL}{dc}=\frac{dL}{d\hat y}=-2
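The same multiplication can be sketched in Python. This sketch assumes a linear output yhat = v1*h1 + v2*h2 + c. That form is consistent with the gradients stated above: each weight's gradient is the upstream gradient times its input, and the bias gradient equals the upstream gradient. The function name `output_layer_grads` is hypothetical.

```python
from typing import List, Tuple


def output_layer_grads(dL_dyhat: float, h: List[float]) -> Tuple[List[float], float]:
    """Gradients for an assumed linear output y_hat = sum_i v_i * h_i + c.

    dL/dv_i = dL/dy_hat * h_i  (each weight scales one hidden activation)
    dL/dc   = dL/dy_hat        (the bias enters with coefficient 1)
    """
    dL_dv = [dL_dyhat * h_i for h_i in h]
    dL_dc = dL_dyhat
    return dL_dv, dL_dc


# Upstream gradient from the loss step, and hidden activations h1 = 2, h2 = 0.
dL_dv, dL_dc = output_layer_grads(-2, [2, 0])
print(dL_dv, dL_dc)  # [-4, 0] -2
```

The values of v1, v2, and c are not needed for these gradients. Only the upstream gradient and the inputs to each weight matter.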

Back into the hidden activations

The same output gradient flows back through the weights v1 and v2, so dL/dh1 = (dL/dyhat) v1 and dL/dh2 = (dL/dyhat) v2. Here dL/dh1 = -2 and dL/dh2 = -2.

\frac{dL}{dh_1}=\frac{dL}{d\hat y}\,v_1=-2,\quad \frac{dL}{dh_2}=\frac{dL}{d\hat y}\,v_2=-2
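Propagating into the hidden layer is again one multiplication per unit: dL/dh_i = (dL/dyhat) v_i. The text does not list v1 and v2. The sketch below assumes v1 = v2 = 1, which are the values implied by dL/dh1 = dL/dh2 = -2 with dL/dyhat = -2. The function name `hidden_activation_grads` is hypothetical.

```python
from typing import List


def hidden_activation_grads(dL_dyhat: float, v: List[float]) -> List[float]:
    """dL/dh_i = dL/dy_hat * v_i for an assumed linear output y_hat = sum_i v_i * h_i + c."""
    return [dL_dyhat * v_i for v_i in v]


# Assumption: v1 = v2 = 1, inferred from the stated gradients rather than given in the text.
v = [1.0, 1.0]
print(hidden_activation_grads(-2.0, v))  # [-2.0, -2.0]
```

Note that dL/dh2 is nonzero even though h2 = 0. The gradient with respect to an activation depends on its outgoing weight, not on the activation's own value.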