The Attention Grid

The full grid combines the scores, the mask, and the named softmax in one validated scene. Exact values (the dot-product scores) and named values (softmax weights left as exponential expressions) stay visible side by side.

Legend: highlighted cells are the ones computed in this step.

The full grid shows the three pieces together: exact dot-product scores, the causal mask, and named row softmax.

\text{scores}\rightarrow\text{mask}\rightarrow\text{named softmax}
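The three stages above can be sketched in a few lines of Python. This is a minimal illustration, not the lesson's own implementation: the Q and K values are made up, the helper names (`dot_scores`, `causal_mask`, `named_softmax`) are hypothetical, and the scores are assumed to be plain dot products with no scaling factor. The softmax is kept as strings so that no float appears.

```python
# Hypothetical 3-token example; the Q and K values are made up for illustration.
Q = [[1, 0], [0, 1], [1, 1]]
K = [[1, 1], [0, 2], [2, 0]]

def dot_scores(Q, K):
    """Exact integer dot products: scores[i][j] = Q[i] . K[j] (no scaling assumed)."""
    return [[sum(q * k for q, k in zip(qi, kj)) for kj in K] for qi in Q]

def causal_mask(n):
    """True where query i may attend to key j, i.e. j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def named_softmax(scores, mask):
    """Row softmax kept symbolic: masked cells are '0', a single-key row is '1'."""
    grid = []
    for s_row, m_row in zip(scores, mask):
        allowed = [s for s, m in zip(s_row, m_row) if m]
        denom = " + ".join(f"e^{s}" for s in allowed)
        row = []
        for s, m in zip(s_row, m_row):
            if not m:
                row.append("0")
            elif len(allowed) == 1:
                row.append("1")
            else:
                row.append(f"e^{s} / ({denom})")
        grid.append(row)
    return grid

scores = dot_scores(Q, K)
mask = causal_mask(len(Q))
grid = named_softmax(scores, mask)
for row in grid:
    print(row)
```

With these made-up values the first row prints as `['1', '0', '0']`, and the second row keeps the form `e^1 / (e^1 + e^2)` rather than a decimal.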

Reading the grid

The first row has weight 1 because the causal mask leaves it only one allowed key, and a softmax over a single entry is e^s / e^s = 1 whatever the score s. The other rows keep their exponential forms, so no softmax float is shown.

\text{row }1\text{ weight }1
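As a quick numeric check of the row-1 claim, the sketch below evaluates a masked softmax over one row. The function name `masked_row_softmax` and the score values are assumptions chosen for illustration; the grid itself never shows these floats.

```python
import math

def masked_row_softmax(scores, allowed):
    """Softmax over the allowed entries of one row; masked entries get weight 0."""
    # Subtract the largest allowed score first for numerical stability.
    m = max(s for s, a in zip(scores, allowed) if a)
    exps = [math.exp(s - m) if a else 0.0 for s, a in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

# Row 1 under a causal mask: only the first key is allowed.
row1_scores = [1, 0, 2]  # made-up scores
row1 = masked_row_softmax(row1_scores, [True, False, False])
print(row1)  # [1.0, 0.0, 0.0]
```

Changing the first score does not change the result, since the single allowed entry is always divided by itself.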

The diagram is self-contained: Q and K are visible, the scores can be recomputed from them, the mask cells are structural (fixed by position, not by value), and the softmax stays named.

\text{displayed source}\rightarrow\text{validated grid}
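One way to picture the step from displayed source to validated grid is a check that recomputes every score from the visible Q and K and confirms that each mask cell follows the causal rule. This is only a sketch under assumptions: `validate_grid` is a hypothetical helper, the data is made up, and the grid is assumed to be square (as many keys as queries).

```python
def validate_grid(Q, K, shown_scores, shown_mask):
    """Return a list of problems; an empty list means the grid is consistent."""
    problems = []
    n = len(Q)
    for i in range(n):
        for j in range(n):
            # Recompute the exact dot product from the displayed vectors.
            expected = sum(q * k for q, k in zip(Q[i], K[j]))
            if shown_scores[i][j] != expected:
                problems.append(
                    f"score[{i}][{j}]: shown {shown_scores[i][j]}, recomputed {expected}"
                )
            # Mask cells are structural: allowed exactly when j <= i.
            if shown_mask[i][j] != (j <= i):
                problems.append(f"mask[{i}][{j}] does not match the causal rule j <= i")
    return problems

# Made-up displayed values for a 3-token grid.
Q = [[1, 0], [0, 1], [1, 1]]
K = [[1, 1], [0, 2], [2, 0]]
shown_scores = [[1, 0, 2], [1, 2, 0], [2, 2, 2]]
shown_mask = [[True, False, False], [True, True, False], [True, True, True]]

print(validate_grid(Q, K, shown_scores, shown_mask))  # []
```

Returning a list of problems instead of raising on the first mismatch makes it easy to report every inconsistent cell at once.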