The finale states the boundary of the walkthrough: the discrete spine is exact, the continuous magnitudes are named rather than evaluated, and nothing is claimed beyond mechanics.

What is exact

The discrete spine is exact: token IDs, wiring, residual adds, argmax, and the output token. For the input a b, position 0 runs exactly from embedding to argmax, and the output token is a.

$$\text{exact spine} \rightarrow a$$
Figure: Transformer honesty boundary. Exact discrete spine with named continuous magnitudes.

Tiny transformer, exact-or-named forward pass. The discrete spine is exact; softmax and layernorm's square root become named only at the boundary.

input: a b
embeddings: E[a]=(1,0), E[b]=(0,1), E[c]=(1,1)
positions: P0=(0,0), P1=(1,0)
weights: Wq=Wk=Wv=I; MLP=I+ReLU+I; gamma=(1,1), beta=(0,0); unembed tied to E

position 0: fully exact path
x0=(1,0); Q0=K0=V0=(1,0)
score S00=1; softmax=[1] exact
attn0=(1,0); residual1=(2,0)
ln1 mean=1; centered=(1,-1); var=1; std=1
ln1 output=(1,-1)
MLP ReLU=(1,0); mlp=(1,0)
residual2=(2,-1)
ln2 mean=1/2; centered=(3/2,-3/2); var=9/4; std=3/2
ln2 output=(1,-1)
logits: a=1, b=-1, c=0
argmax=a; output token=a

position 1: named softmax boundary
x1=(1,1); Q1=(1,1)
K0=(1,0); K1=(1,1)
scores=[1, 2]
softmax=[e^1/(e^1+e^2), e^2/(e^1+e^2)]
named after multi-entry softmax

ordered pipeline
tokens -> embed -> +pos -> attention -> +residual -> layernorm -> MLP -> +residual -> layernorm -> unembed -> logits -> argmax -> output token
pos0 remains exact; pos1 stops at named softmax

layernorm sqrt boundary
this transformer's 2-D layernorms are exact here; in general layernorm's square root is named

named 3-vector
v=(1,2,3); mean=2
centered=(-1,0,1)
var=2/3; std=√(2/3); register=named
normalized=(-1,0,1)/√(2/3)

degenerate exact 3-vector
v=(2,2,2); mean=2
centered=(0,0,0)
var=0; std=0; register=exact
normalized=zero-variance exact std; normalization not divided
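
The position 0 path can be replayed with exact rational arithmetic. The following is a minimal sketch, assuming Python's `fractions.Fraction` as the exact number type. The helper names (`add`, `dot`, `exact_sqrt`, `layernorm`, `relu`) are illustrative and not part of the original walkthrough; the parameter values are copied from the figure.

```python
from fractions import Fraction as F
from math import isqrt

# Toy parameters copied from the figure.
E = {"a": (F(1), F(0)), "b": (F(0), F(1)), "c": (F(1), F(1))}
P = [(F(0), F(0)), (F(1), F(0))]

def add(u, v):
    return tuple(x + y for x, y in zip(u, v))

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def exact_sqrt(q):
    """Return sqrt(q) as a Fraction, or raise if it is not rational."""
    rn, rd = isqrt(q.numerator), isqrt(q.denominator)
    if rn * rn != q.numerator or rd * rd != q.denominator:
        raise ValueError(f"sqrt({q}) is irrational: keep it named")
    return F(rn, rd)

def layernorm(v):
    # gamma=(1,1), beta=(0,0), so scale and shift are omitted.
    # Zero variance is not handled in this sketch.
    mean = sum(v) / len(v)
    centered = tuple(x - mean for x in v)
    var = sum(c * c for c in centered) / len(v)
    std = exact_sqrt(var)
    return tuple(c / std for c in centered)

def relu(v):
    return tuple(max(x, F(0)) for x in v)

x0 = add(E["a"], P[0])      # (1, 0)
q0 = k0 = v0 = x0           # Wq = Wk = Wv = I
attn0 = v0                  # single-key softmax is [1], so attention returns V0
res1 = add(x0, attn0)       # (2, 0)
ln1 = layernorm(res1)       # (1, -1)
mlp = relu(ln1)             # MLP = I, ReLU, I gives (1, 0)
res2 = add(ln1, mlp)        # (2, -1)
ln2 = layernorm(res2)       # (1, -1)
logits = {tok: dot(vec, ln2) for tok, vec in E.items()}  # a=1, b=-1, c=0
output_token = max(logits, key=logits.get)               # "a"
```

Because every intermediate value is a `Fraction`, equality checks against the figure's numbers hold exactly, with no rounding tolerance.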

What is named

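Softmax and layernorm's square root are the named boundary: their values are kept as written expressions, such as e^1/(e^1+e^2) or √(2/3), rather than evaluated. In this model the two-dimensional layernorms happen to have rational standard deviations (1 and 3/2), so the three-vector row in the figure, v=(1,2,3) with std=√(2/3), shows the square-root case explicitly.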

$$\operatorname{softmax},\ \sqrt{\operatorname{var}} = \text{named boundary}$$
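
The exact-or-named decision can be sketched as a check on whether the variance is a perfect-square rational. This is an illustrative sketch, not the walkthrough's own implementation: the function names `std_register` and `named_softmax` are hypothetical, and naming every multi-entry softmax follows the figure's rule ("named after multi-entry softmax").

```python
from fractions import Fraction
from math import isqrt

def std_register(v):
    """Classify the layernorm standard deviation of v as exact or named."""
    vals = [Fraction(x) for x in v]
    mean = sum(vals) / len(vals)
    var = sum((x - mean) ** 2 for x in vals) / len(vals)
    rn, rd = isqrt(var.numerator), isqrt(var.denominator)
    if rn * rn == var.numerator and rd * rd == var.denominator:
        return "exact", Fraction(rn, rd)   # rational square root
    return "named", f"sqrt({var})"         # irrational: keep the expression

def named_softmax(scores):
    """A one-entry softmax is exactly [1]; otherwise keep named expressions."""
    if len(scores) == 1:
        return [Fraction(1)]
    denom = "+".join(f"e^{s}" for s in scores)
    return [f"e^{s}/({denom})" for s in scores]

print(std_register((2, 0)))      # exact, std 1 (ln1 in the figure)
print(std_register((2, -1)))     # exact, std 3/2 (ln2 in the figure)
print(std_register((1, 2, 3)))   # named, sqrt(2/3)
print(std_register((2, 2, 2)))   # exact, std 0 (degenerate case)
print(named_softmax([1, 2]))     # ['e^1/(e^1+e^2)', 'e^2/(e^1+e^2)']
```

The zero-variance case returns an exact standard deviation of 0; as the figure notes, normalization is then not divided, so a caller would need to skip the division.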

Summary

The discrete spine is bit-exact: tokens, wiring, residual adds, argmax, and the output token. Softmax and layernorm's square root are the named boundary. This pins down the full transformer mechanics by hand on a tiny two-dimensional model; it makes no claim about learning, meaning, or real-scale models. Every real transformer runs these same steps, just bigger and in floating point.

$$\text{full transformer mechanics by hand}$$
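
For contrast, here is a minimal sketch of what the named quantities become in floating point, assuming ordinary Python floats and the standard `math` module. Subtracting the maximum score before exponentiating is a common stabilization trick added here for illustration; it is not part of the hand walkthrough.

```python
import math

# Position 1 scores from the figure.
scores = [1.0, 2.0]

# Numerically stable softmax: shift by the max, then normalize.
m = max(scores)
exps = [math.exp(s - m) for s in scores]
total = sum(exps)
weights = [e / total for e in exps]   # approximately [0.2689, 0.7311]

# The named 3-vector standard deviation, evaluated as a float.
std = math.sqrt(2 / 3)                # approximately 0.8165

print(weights, std)
```

These decimals are approximations of e^1/(e^1+e^2), e^2/(e^1+e^2), and √(2/3); the named forms are the exact values they round.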