Weighting the Values

Attention uses each row of softmax weights to form a weighted sum of value vectors. Which positions contribute to a row is known exactly, while the softmax weights in rows with more than one unmasked entry are kept as named symbols rather than evaluated.

After the softmax, row i of the attention output is a weighted sum of the value vectors V_j. The mask fixes exactly which positions contribute: only those with j ≤ i.

\text{output}_i=\sum_{j\le i}\operatorname{softmax}(S_i)_j V_j
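The sum above can be sketched in a few lines of plain Python. This is a minimal illustration, not the lesson's own implementation: the function name causal_attention_row, the 0-based indexing, and the toy matrices S and V are all assumptions made for the example. Scores at masked positions (j > i) are deliberately large to show that they never influence the result.

```python
import math

def causal_attention_row(scores_row, values, i):
    """Row i of the output: softmax over positions j <= i, then a weighted sum of V_j.

    Hypothetical helper for illustration; positions are 0-based.
    """
    visible = scores_row[: i + 1]              # mask: keep only positions j <= i
    m = max(visible)                           # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in visible]
    total = sum(exps)
    weights = [e / total for e in exps]        # softmax(S_i)_j for j <= i
    dim = len(values[0])
    out = [sum(weights[j] * values[j][d] for j in range(i + 1)) for d in range(dim)]
    return weights, out

# Toy data (assumed): 3 positions, 2-dimensional value vectors.
S = [[0.0, 9.0, 9.0],    # the 9.0 entries sit at masked positions and are ignored
     [1.0, 1.0, 9.0],
     [0.0, 0.0, 0.0]]
V = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

w0, out0 = causal_attention_row(S[0], V, 0)   # one visible entry: weight is exactly 1
w1, out1 = causal_attention_row(S[1], V, 1)   # two equal scores: weights 1/2 each
w2, out2 = causal_attention_row(S[2], V, 2)   # three equal scores: weights 1/3 each
```

Row 0 sees only itself, so its output is just V_0. The later rows mix the value vectors at or before their own position and nothing after it.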

Structurally named output

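Because the weights in rows with more than one unmasked entry are kept as named softmax symbols, each output row is a named expression too. Its structure (which value vectors appear, and which weight multiplies each) is exact; the numeric values of the softmax weights are not written out as decimals.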
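One way to picture a structurally named row is to build it as a list of (weight name, value name) terms without evaluating any exponentials. The sketch below is only an illustration under assumptions: the helpers named_output_row and render, the 0-based indexing, and the string format are hypothetical, not taken from the lesson. The single-entry row is written with weight 1 because the softmax of a single score is exactly 1.

```python
def named_output_row(i):
    """Terms of output_i as (weight_name, value_name) pairs; nothing is evaluated.

    Hypothetical helper for illustration; positions are 0-based.
    """
    if i == 0:
        # Only one unmasked entry: its softmax weight is exactly 1.
        return [("1", "V_0")]
    # More than one unmasked entry: keep each weight as a named softmax symbol.
    return [(f"softmax(S_{i})_{j}", f"V_{j}") for j in range(i + 1)]

def render(terms):
    """Join the terms into a readable sum."""
    return " + ".join(f"{w}*{v}" for w, v in terms)

row0 = render(named_output_row(0))
row2 = render(named_output_row(2))
print(row0)
print(row2)
```

The set of terms in each row is exact, while every multi-entry weight stays a symbol that could be evaluated later from the scores.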

\text{exact structure}+\text{named weights}

Summary

This keeps the honesty boundary visible: the score and mask mechanics are exact, and they feed a weighted sum whose weights are named rather than evaluated.

\sum \text{named weight}\cdot V