The Next Token

The final normalized vector is unembedded to exact logits, and greedy argmax then selects the next token from them.

Final layernorm

The input to the second layernorm has variance 9/4 and standard deviation 3/2. Centering that input and dividing by 3/2 gives the exact output (1,-1).

\operatorname{LN}_{\text{two}}=(1,-1)
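The normalization step can be checked with exact rational arithmetic. The sketch below is only an illustration: the pre-norm input `(3, 0)` is a hypothetical vector chosen because it reproduces the stated variance 9/4 and standard deviation 3/2, and the layernorm is assumed to use no epsilon, unit gain, and zero bias. The helper names `exact_sqrt` and `layernorm` are made up for this example.

```python
from fractions import Fraction
from math import isqrt


def exact_sqrt(q):
    """Exact square root of a non-negative Fraction that is a perfect square."""
    n, d = isqrt(q.numerator), isqrt(q.denominator)
    if n * n != q.numerator or d * d != q.denominator:
        raise ValueError(f"{q} is not a perfect square")
    return Fraction(n, d)


def layernorm(x):
    """Layernorm assuming no epsilon, unit gain, and zero bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)  # population variance
    std = exact_sqrt(var)
    return [(v - mean) / std for v in x], var, std


# Hypothetical pre-norm input, picked only to match the stated statistics.
x = [Fraction(3), Fraction(0)]
out, var, std = layernorm(x)
print(var, std, out)
```

Using `Fraction` keeps every intermediate value exact, so the result can be compared to (1,-1) with plain equality rather than a floating-point tolerance.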

The tied unembedding takes the dot product of that vector with each token embedding. The resulting logits are a=1, b=-1, c=0.

\ell_a=1,\quad \ell_b=-1,\quad \ell_c=0
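As a rough sketch of the tied unembedding, each logit is the dot product of the normalized vector with one row of the embedding table. The embedding values in `EMBED` are hypothetical: this section does not list them, and they were picked only because they reproduce the logits 1, -1, 0. The function name `unembed` is likewise an assumption.

```python
from fractions import Fraction

# Hypothetical token embeddings (not given in the text), chosen so that
# the dot products with (1, -1) match the stated logits.
EMBED = {"a": (1, 0), "b": (0, 1), "c": (1, 1)}


def unembed(h, embed):
    """Tied unembedding: logit for each token = embedding row . h."""
    return {
        tok: sum(Fraction(e_i) * h_i for e_i, h_i in zip(row, h))
        for tok, row in embed.items()
    }


h = (Fraction(1), Fraction(-1))  # final layernorm output
logits = unembed(h, EMBED)
print(logits)
```

"Tied" here means the same table serves as both input embedding and output projection, so no separate unembedding matrix is needed.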

Greedy decoding picks the token with the largest logit, breaking ties in favor of the lowest index. The largest logit is 1, which belongs to a, so the output token is a.

\operatorname{argmax}\{a:1,b:-1,c:0\}=a
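The selection rule can be sketched as a single left-to-right scan that replaces the current best only on a strictly larger logit, which yields the lowest-index tie-break. The vocabulary order `a, b, c` (indices 0, 1, 2) and the function name `greedy_argmax` are assumptions made for this example.

```python
def greedy_argmax(logits):
    """Index of the largest logit; ties go to the lowest index."""
    best = 0
    for i in range(1, len(logits)):
        # Strict '>' keeps the earlier index when two logits are equal.
        if logits[i] > logits[best]:
            best = i
    return best


VOCAB = ["a", "b", "c"]  # assumed index order
logits = [1, -1, 0]      # logits for a, b, c from the step above
next_token = VOCAB[greedy_argmax(logits)]
print(next_token)
```

No tie occurs with these logits, so the tie-break rule only matters for inputs such as `[2, 2, 1]`, where index 0 would be chosen.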