What the Whole Transformer Is and Is Not - A Transformer, End to End

The finale states the boundary: exact discrete spine, named continuous magnitudes, and no claim beyond mechanics.

highlighted = computed this step

What is exact

The discrete spine is exact: token IDs, wiring, residual adds, argmax, and the output token. The output token is a.

\text{exact spine}\rightarrow a

What is named

Softmax and layernorm's square root are the named boundary. The rendered three-vector row shows the square-root case explicitly.

\operatorname{softmax},\sqrt{\operatorname{var}}=\text{named boundary}

Summary

The discrete spine is bit-exact: tokens, wiring, residual adds, argmax, and the output token. Softmax and layernorm's square root are the named boundary. This pins the full transformer mechanics by hand on a tiny dim two model; it is not learning, not meaning, and not a real-scale model. Every real transformer runs these same steps, just bigger and in floating point.

\text{full transformer mechanics by hand}