Softmax converts logits into probability weights using exponentials. Each probability is left as a named expression rather than evaluated to a decimal.


Probabilities: the named boundary

Softmax converts logits into probability weights using exponentials. In the diagram, each probability is written as a named expression and is never pinned to a decimal.

$$\operatorname{softmax}(\ell)_i = \frac{e^{\ell_i}}{\sum_j e^{\ell_j}}$$
Figure: Softmax named boundary. Selection is exact; softmax probabilities are named.

Exact selections with named softmax. The displayed integer logits are the source; ties break by lowest vocab index.

Step 1 logits: a=3, b=1, c=2

token  logit  rank
a      3      1
b      1      3
c      2      2

Ranked order: a > c > b
Top-2: a, c
Greedy argmax: a

Softmax probabilities are named, not pinned decimals:
a: e^3/(e^3+e^1+e^2)
b: e^1/(e^3+e^1+e^2)
c: e^2/(e^3+e^1+e^2)
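As a rough illustration of the named boundary, the sketch below builds the unevaluated softmax expressions for the step 1 logits and, separately, evaluates them as floats only to check that they behave like probabilities. The function names `named_softmax` and `softmax` are hypothetical helpers introduced for this example, and the max-subtraction step is a common numerical-stability habit rather than something the text prescribes.

```python
import math

def named_softmax(logits):
    """Return each token's softmax probability as an unevaluated string."""
    # The denominator lists every exponential in the dict's insertion order.
    denom = "+".join(f"e^{value}" for value in logits.values())
    return {tok: f"e^{value}/({denom})" for tok, value in logits.items()}

def softmax(logits):
    """Evaluate softmax numerically (used here only as a sanity check)."""
    # Subtracting the largest logit cancels in the ratio and avoids overflow.
    largest = max(logits.values())
    exps = {tok: math.exp(value - largest) for tok, value in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Step 1 logits from the figure.
logits = {"a": 3, "b": 1, "c": 2}

named = named_softmax(logits)
probs = softmax(logits)

for tok in logits:
    print(f"{tok}: {named[tok]}")

# The floats stay behind the boundary: we check their properties, not their digits.
print("sums to 1:", math.isclose(sum(probs.values()), 1.0))
print("order a > c > b:", probs["a"] > probs["c"] > probs["b"])
```

The printed expressions match the named forms in the figure. The numeric values are only used inside the two checks, which keeps with the idea that nothing downstream should depend on a pinned decimal.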

Selection stays exact

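Argmax and top-k depend only on the ordering of the integer logits, so they can be computed exactly. Because the exponential is strictly increasing, ranking by logit gives the same order as ranking by softmax probability. Only the probability mass itself crosses the named softmax boundary.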

ordering exact; softmax named
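The exact selections can be sketched with integer comparisons alone, with no call to softmax. The helper names `rank_tokens`, `top_k`, and `greedy_argmax` are hypothetical, and the vocab order `["a", "b", "c"]` (so a has index 0) is an assumption made for the example; the source states only that ties break by lowest vocab index.

```python
def rank_tokens(logits, vocab):
    """Order tokens by descending logit; ties go to the lowest vocab index."""
    index = {tok: i for i, tok in enumerate(vocab)}
    # Sort key: larger logit first, then smaller vocab index.
    return sorted(vocab, key=lambda tok: (-logits[tok], index[tok]))

def top_k(logits, vocab, k):
    """Return the k highest-ranked tokens."""
    return rank_tokens(logits, vocab)[:k]

def greedy_argmax(logits, vocab):
    """Return the single highest-ranked token."""
    return rank_tokens(logits, vocab)[0]

# Assumed vocab order; step 1 logits are taken from the figure.
vocab = ["a", "b", "c"]
logits = {"a": 3, "b": 1, "c": 2}

print(rank_tokens(logits, vocab))    # ['a', 'c', 'b']
print(top_k(logits, vocab, 2))       # ['a', 'c']
print(greedy_argmax(logits, vocab))  # a

# A tie between a and b resolves to a, the lower vocab index.
tied = {"a": 2, "b": 2, "c": 1}
print(greedy_argmax(tied, vocab))    # a
```

Every comparison here is between integers, so the results are exact and match the ranked order, top-2, and greedy argmax listed in the figure.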

Summary

Top-p and sampling depend on the named probabilities, so they are deferred. This book keeps the exact selection mechanics separate from softmax.

top-p and sampling deferred