Through the Block

The residual additions and the ReLU MLP stay exact, while the layernorm example shows where a general square root can no longer be evaluated exactly and has to be carried as a named quantity.


The first residual add gives (2,0). In this pass, layernorm sees mean 1 and deviations (1,-1), so the variance is 1 and the std is 1, and the normalization stays exact.

r_{\text{one}}=(2,0)
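
The variance claim for this pass can be checked with exact rational arithmetic. The sketch below is only an illustration: it assumes the population variance (divide by n, no epsilon term), and the names mean_and_variance, residual_one and normalized are hypothetical rather than taken from the text.

```python
from fractions import Fraction


def mean_and_variance(xs):
    """Return the exact mean and population variance of xs as Fractions."""
    xs = [Fraction(x) for x in xs]
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var


residual_one = (2, 0)
mean, var = mean_and_variance(residual_one)

# The variance is 1 here, so the std is exactly 1 and dividing by it
# leaves the centered vector unchanged.
normalized = [Fraction(x) - mean for x in residual_one]
print(mean, var, normalized)
```

Because the variance is a perfect square, no square-root symbol is needed at this step.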

Layernorm's square root

In general, layernorm crosses a square-root boundary: the variance is rational, but its square root usually is not. The rendered three-vector v=(1,2,3) has mean 2 and variance 2/3, so its std is not evaluated but named as the square root of that fraction.

\operatorname{var}=2/3,\quad \operatorname{std}=\sqrt{2/3}
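
One way to see where the boundary falls is to test whether the exact variance is a perfect rational square. This is a minimal sketch under the same population-variance assumption; exact_sqrt and label are hypothetical helper names used only for illustration.

```python
from fractions import Fraction
from math import isqrt


def population_variance(xs):
    """Exact population variance (divide by n) as a Fraction."""
    xs = [Fraction(x) for x in xs]
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def exact_sqrt(q):
    """Return sqrt(q) as a Fraction if q is a perfect rational square, else None."""
    if q < 0:
        return None
    # A Fraction is stored in lowest terms, so it is a rational square
    # exactly when numerator and denominator are both perfect squares.
    rn, rd = isqrt(q.numerator), isqrt(q.denominator)
    if rn * rn == q.numerator and rd * rd == q.denominator:
        return Fraction(rn, rd)
    return None


var = population_variance((1, 2, 3))
std = exact_sqrt(var)

# When no exact root exists, keep the std as a named expression.
label = str(std) if std is not None else f"sqrt({var})"
print(var, label)
```

For (1,2,3) the helper finds no rational root, so the std stays a label instead of being rounded to a float.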

Exact MLP and second residual

ReLU keeps the positive coordinate and zeroes the negative coordinate, giving MLP output (1,0). The second residual becomes (2,-1).

\operatorname{MLP}=(1,0),\quad r_{\text{two}}=(2,-1)
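
The numbers above can be reproduced under one reading of the block, which is an assumption and not something the text states: the vector entering the MLP is (1,-1), the MLP weights are the identity so only the ReLU acts, and the skip connection adds that same (1,-1) back. The function names relu and add are hypothetical.

```python
from fractions import Fraction


def relu(xs):
    """Elementwise ReLU: keep positive entries, zero out the rest."""
    return [max(x, Fraction(0)) for x in xs]


def add(a, b):
    """Elementwise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


# Assumption: this is the vector feeding both the MLP and the skip connection.
mlp_in = [Fraction(1), Fraction(-1)]

mlp_out = relu(mlp_in)        # identity weights assumed, so only ReLU acts
r_two = add(mlp_in, mlp_out)  # second residual add
print(mlp_out, r_two)
```

Both steps use only comparison and addition, so every value stays an exact fraction.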