The full grid combines the scores, the causal mask, and the named softmax in one validated scene. It keeps both registers visible at once: exact values for the scores and named expressions for the softmax weights.
The attention grid
The full grid shows the three pieces together: exact dot-product scores, the causal mask, and named row softmax.
scores→mask→named softmax
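The three stages can be sketched in Python as follows. This is a minimal sketch that assumes small made-up Q and K matrices (not the diagram's values) and unscaled dot products, as the text describes. The function names and the `s[i,j]` naming scheme for scores are hypothetical.

```python
import math

# Illustrative inputs only (hypothetical): 3 queries and 3 keys of dimension 2.
Q = [[1, 0], [0, 1], [1, 1]]
K = [[1, 1], [0, 1], [1, 0]]


def dot_scores(Q, K):
    """Exact dot-product scores S[i][j] = q_i . k_j (no 1/sqrt(d) scaling)."""
    return [[sum(a * b for a, b in zip(q, k)) for k in K] for q in Q]


def causal_mask(n):
    """Structural causal mask: True where query i may attend to key j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def apply_mask(S, mask):
    """Replace disallowed cells with -inf, the usual convention before softmax."""
    return [[s if ok else -math.inf for s, ok in zip(srow, mrow)]
            for srow, mrow in zip(S, mask)]


def named_softmax(mask):
    """Row softmax as named expressions rather than floats.

    Allowed cell (i, j) becomes exp(s[i,j]) / (sum of exp over the row's allowed
    keys); masked cells get "0"; a row with one allowed key simplifies to "1".
    Indices are printed 1-based to match the row numbering used in the grid.
    """
    out = []
    for i, mrow in enumerate(mask):
        allowed = [j for j, ok in enumerate(mrow) if ok]
        denom = " + ".join(f"exp(s[{i + 1},{j + 1}])" for j in allowed)
        row = []
        for j, ok in enumerate(mrow):
            if not ok:
                row.append("0")
            elif len(allowed) == 1:
                row.append("1")
            else:
                row.append(f"exp(s[{i + 1},{j + 1}]) / ({denom})")
        out.append(row)
    return out


S = dot_scores(Q, K)
M = causal_mask(len(Q))
for srow, wrow in zip(apply_mask(S, M), named_softmax(M)):
    print(srow, wrow)
```

The softmax stage only needs the mask, not the numeric scores. This mirrors the idea that the weights stay named while the scores stay exact.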
Reading the grid
The first row has a single weight of exactly 1, because the causal mask leaves it only one allowed key (the query's own position). The other rows keep their exponential forms, so no floating-point softmax value is shown.
row 1 weight 1
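A short numeric check shows why row 1 can be displayed exactly while the other rows cannot. This is a sketch, assuming a standard max-subtracted softmax over the causally allowed keys. `causal_softmax_row` is a hypothetical helper, not part of the diagram.

```python
import math


def causal_softmax_row(scores, i):
    """Numeric softmax of row i under a causal mask.

    Only keys j <= i take part; masked keys get weight 0.0. The row maximum is
    subtracted before exponentiating, a standard trick for numerical stability.
    """
    allowed = scores[: i + 1]
    m = max(allowed)
    exps = [math.exp(s - m) for s in allowed]
    total = sum(exps)
    return [e / total for e in exps] + [0.0] * (len(scores) - len(allowed))


# Row 1 (index 0) has a single allowed key, so its weight is exactly 1.0
# whatever the score is: exp(s - s) / exp(s - s) = 1.0 / 1.0.
for s in (-3.5, 0.0, 2.0, 40.0):
    print(causal_softmax_row([s, 9.0, -1.0], 0))  # [1.0, 0.0, 0.0]

# Later rows generally produce long floats, e.g. e / (e + 2) for scores (2, 1, 1).
print(causal_softmax_row([2.0, 1.0, 1.0], 2))
```

Keeping those later rows in exponential form avoids showing values like e / (e + 2), which have no short exact decimal form.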
Summary
The diagram is self-contained: Q and K are visible, the scores can be recomputed from them, the mask cells follow from position alone, and the softmax stays in named form.
displayed source→validated grid
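The self-containment claim can be phrased as a set of mechanical checks. The sketch below assumes the displayed grid is available as nested Python lists, with named softmax entries as strings and exact 0/1 weights as integers. `validate_grid` and its argument layout are hypothetical.

```python
def validate_grid(Q, K, scores, masked, weights):
    """Check a displayed attention grid against its own inputs (hypothetical helper).

    Q, K    : query and key row vectors shown in the diagram
    scores  : displayed dot-product score grid
    masked  : displayed mask grid, True where a cell is masked out
    weights : displayed softmax grid; named forms are strings, exact 0/1 are ints
    Returns a list of problems; an empty list means the grid is self-consistent.
    """
    problems = []
    for i, q in enumerate(Q):
        for j, k in enumerate(K):
            # Scores must recompute exactly from the visible Q and K.
            expected = sum(a * b for a, b in zip(q, k))
            if scores[i][j] != expected:
                problems.append(
                    f"score ({i + 1},{j + 1}) shows {scores[i][j]}, recomputes to {expected}")
            # Mask cells are structural: masked exactly when key j comes after query i.
            if masked[i][j] != (j > i):
                problems.append(f"mask ({i + 1},{j + 1}) breaks the causal rule")
            # Softmax stays named: no floating-point weights may appear.
            if isinstance(weights[i][j], float):
                problems.append(f"weight ({i + 1},{j + 1}) is a float, not a named form")
    return problems


Q = [[1, 0], [0, 1]]
K = [[1, 1], [0, 1]]
scores = [[1, 0], [1, 1]]
masked = [[False, True], [False, False]]
weights = [[1, 0],
           ["exp(s[2,1]) / (exp(s[2,1]) + exp(s[2,2]))",
            "exp(s[2,2]) / (exp(s[2,1]) + exp(s[2,2]))"]]
print(validate_grid(Q, K, scores, masked, weights))  # []
```

Returning a list of problems, rather than stopping at the first failure, reports every inconsistent cell at once.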