The Whole Transformer, by Hand

This capstone traces a complete forward pass through a tiny transformer. It first lays out the whole pipeline, in which every stage is either computed exactly or carried as a named value, and then focuses on each stage in turn.

Legend: highlighted values are the ones computed in the current step.

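This book traces a complete tiny transformer forward pass, through to the next token, by exact arithmetic. The configuration is a vocabulary of three tokens (a, b, c), model dimension 2, one attention head, one block, and a tied unembedding, meaning the unembedding reuses the embedding table.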

\text{tiny transformer: dim }2
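As a rough illustration of that configuration, the sketch below writes it down as a small Python record and shows what tying the unembedding means in practice. The class name `TinyConfig`, the function `unembed`, and every number in `EMBED` are hypothetical placeholders chosen for this sketch, not the values used in the text.

```python
from dataclasses import dataclass
from fractions import Fraction as F


@dataclass(frozen=True)
class TinyConfig:
    vocab: tuple = ("a", "b", "c")
    dim: int = 2
    n_heads: int = 1
    n_blocks: int = 1
    tied_unembed: bool = True


cfg = TinyConfig()

# Hypothetical embedding table: one dim-2 row per vocabulary entry.
# Fractions keep the arithmetic exact.
EMBED = [
    [F(1), F(0)],  # a
    [F(0), F(1)],  # b
    [F(1), F(1)],  # c
]


def unembed(x):
    """Tied unembedding: each logit is the dot product of x with a token's embedding row."""
    return [sum(e_i * x_i for e_i, x_i in zip(row, x)) for row in EMBED]


# A made-up final hidden state, only to show the shapes involved.
logits = unembed([F(1, 2), F(3, 2)])
```

With a tied unembedding there is no separate output matrix to specify: the same three rows serve both to look up tokens on the way in and to score them on the way out.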

Ordered pipeline

The ordered flow is tokens, embeddings, position table, attention, residual add, layernorm, MLP, residual add, layernorm, unembed, logits, argmax.

\text{tokens}\rightarrow\text{logits}\rightarrow\operatorname{argmax}
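The ordered flow above can be sketched end to end in plain Python. This is a minimal floating-point sketch under several assumptions that the text does not state: all weights are hypothetical placeholders, attention is causal and scaled by the square root of the dimension, and layernorm has no learned gain or bias and uses a small epsilon.

```python
import math

VOCAB = ["a", "b", "c"]
# All weights below are hypothetical placeholders, not the values used in the text.
EMBED = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # one dim-2 row per token
POS = [[0.0, 0.5], [0.5, 0.0], [0.5, 0.5]]        # position table, one row per position
W_Q = W_K = W_V = W_O = [[1.0, 0.0], [0.0, 1.0]]  # identity projections
W_IN = [[1.0, -1.0], [-1.0, 1.0]]                 # MLP input weights
W_OUT = [[1.0, 0.0], [0.0, 1.0]]                  # MLP output weights


def matvec(w, x):
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def add(x, y):
    return [a + b for a, b in zip(x, y)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def layernorm(x, eps=1e-5):
    # Assumption: no learned gain or bias.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def attention(xs):
    """Single-head self-attention; position t attends to positions 0..t (assumed causal)."""
    qs = [matvec(W_Q, x) for x in xs]
    ks = [matvec(W_K, x) for x in xs]
    vs = [matvec(W_V, x) for x in xs]
    scale = math.sqrt(len(xs[0]))
    out = []
    for t, q in enumerate(qs):
        scores = [sum(a * b for a, b in zip(q, ks[s])) / scale for s in range(t + 1)]
        weights = softmax(scores)
        mixed = [sum(w * vs[s][i] for s, w in enumerate(weights)) for i in range(len(q))]
        out.append(matvec(W_O, mixed))
    return out


def mlp(x):
    hidden = [max(0.0, h) for h in matvec(W_IN, x)]  # ReLU
    return matvec(W_OUT, hidden)


def forward(tokens):
    ids = [VOCAB.index(t) for t in tokens]                           # tokens
    xs = [add(EMBED[i], POS[p]) for p, i in enumerate(ids)]          # embeddings + position table
    xs = [layernorm(add(x, a)) for x, a in zip(xs, attention(xs))]   # attention, residual add, layernorm
    xs = [layernorm(add(x, mlp(x))) for x in xs]                     # MLP, residual add, layernorm
    logits = matvec(EMBED, xs[-1])                                   # tied unembed at the last position
    return logits, VOCAB[max(range(len(logits)), key=logits.__getitem__)]  # argmax


logits, next_token = forward(["a", "b"])
```

Each line of `forward` corresponds to one or more stages in the ordered list, so the function can be read top to bottom against that list. Floats are used here only for brevity; they do not reproduce the exact arithmetic of the trace.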

Summary

The capstone combines an exact forward pass and an exact ReLU block with two named boundaries, the softmax and the layernorm square root, and ends with the greedy (argmax) output token.

\text{exact spine plus named boundaries}
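One way to picture the split between the exact spine and the named boundaries is sketched below. It assumes that "named" means the irrational quantity (an exponential or a square root) is carried as a symbol rather than evaluated. The function names, weights, and inputs are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as F


def relu_mlp(x, w_in, w_out):
    """Exact ReLU block: only rational arithmetic and max, so Fractions stay exact."""
    hidden = [max(F(0), sum(w * v for w, v in zip(row, x))) for row in w_in]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def layernorm_named(x, eps=F(0)):
    """Return exact deviations and the exact radicand; the square root is left named.

    The layernorm output would be dev[i] / sqrt(radicand).
    """
    mean = sum(x) / len(x)
    dev = [v - mean for v in x]
    radicand = sum(d * d for d in dev) / len(x) + eps
    return dev, radicand


def softmax_named(scores):
    """Keep each weight as a named expression exp(s_i) / Z instead of evaluating exp."""
    z = " + ".join(f"exp({s})" for s in scores)
    return [f"exp({s}) / ({z})" for s in scores]


x = [F(1), F(-1, 2)]
y = relu_mlp(x, [[F(1), F(-1)], [F(-1), F(1)]], [[F(1), F(0)], [F(0), F(1)]])
dev, radicand = layernorm_named([a + b for a, b in zip(x, y)])
weights = softmax_named([F(1, 2), F(1)])
```

Everything up to each boundary stays a `Fraction`, so it can be checked by hand. Only the final `exp` or square root would need a decimal approximation.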