Attention uses each row's softmax weights to form a weighted sum of value vectors. Which positions contribute is exact, while softmax weights with more than one entry remain named: they are written as expressions in e rather than evaluated to decimals.


Weighting the values

After softmax, the attention output is a weighted sum of value vectors. Which positions contribute is fixed exactly by the causal mask: row i draws only on positions j <= i.

$$\text{output}_i=\sum_{j\le i}\operatorname{softmax}(S_i)_j V_j$$
Figure: Named weights and values. The output structure uses named softmax weights.

Exact scores, exact causal mask, named softmax. The displayed integer Q, K vectors are the score source; the causal mask keeps j <= i.

Q1=(1,0); Q2=(0,1); Q3=(1,1)
K1=(1,0); K2=(0,1); K3=(1,1)

Score S_ij = Qi·Kj with causal mask:

      K1      K2      K3
Q1    1       masked  masked
Q2    0       1       masked
Q3    1       1       2

Row softmax over unmasked scores:

row 1: scores [1], weights 1
row 2: scores [0,1], weights e^0/(e^0+e^1), e^1/(e^0+e^1)
row 3: scores [1,1,2], weights e^1/(e^1+e^1+e^2), e^1/(e^1+e^1+e^2), e^2/(e^1+e^1+e^2)

Softmax is named: no decimal attention weights are pinned.
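The displayed example can be traced in a few lines of Python. This is only a minimal sketch: the helper names `dot` and `named_output_row` are hypothetical, and the value vectors are kept as the placeholder names V1, V2, V3 because their entries are not given here. The scores are computed exactly from the integer Q and K vectors, the causal mask is applied by visiting only j <= i, and the softmax weights are kept as named expressions rather than evaluated to decimals.

```python
# Integer query/key vectors copied from the displayed example.
Q = [(1, 0), (0, 1), (1, 1)]
K = [(1, 0), (0, 1), (1, 1)]


def dot(a, b):
    """Exact integer dot product (hypothetical helper)."""
    return sum(x * y for x, y in zip(a, b))


def named_output_row(i):
    """Return output_i (0-based i) as a string of named weights times V_j.

    The causal mask is applied by only visiting positions j <= i.
    V1, V2, V3 are placeholder names; no value entries are assumed.
    """
    scores = [dot(Q[i], K[j]) for j in range(i + 1)]
    if len(scores) == 1:
        # A single unmasked score always receives weight 1.
        return f"1*V{i + 1}"
    denom = "+".join(f"e^{s}" for s in scores)
    terms = [f"e^{s}/({denom})*V{j + 1}" for j, s in enumerate(scores)]
    return " + ".join(terms)


for i in range(len(Q)):
    print(f"output_{i + 1} = {named_output_row(i)}")
```

Keeping the weights as strings is a deliberate choice for this sketch: it mirrors the text's split between exact score and mask mechanics and named softmax weights.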

Structurally named output

Because the multi-entry weights are named softmax symbols, the output row is also structurally named. The structure is exact; the numeric softmax weights are not pinned as decimals.

$$\text{exact structure}+\text{named weights}$$
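Spelled out for the three rows of the displayed example, the named output rows would look as follows. Here V_1, V_2, V_3 are assumed to denote the value vectors at positions 1 to 3; their entries are not given, so they stay symbolic.

```latex
\begin{aligned}
\text{output}_1 &= 1\cdot V_1\\
\text{output}_2 &= \frac{e^0}{e^0+e^1}\,V_1+\frac{e^1}{e^0+e^1}\,V_2\\
\text{output}_3 &= \frac{e^1}{e^1+e^1+e^2}\,V_1+\frac{e^1}{e^1+e^1+e^2}\,V_2+\frac{e^2}{e^1+e^1+e^2}\,V_3
\end{aligned}
```

Each row contains exactly the terms with j <= i, and the weights in each row share one denominator, so they sum to 1.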

Summary

This keeps the honesty boundary visible: exact score and mask mechanics feed a named weighted sum.

$$\sum \text{named weight}\cdot V$$