Softmax converts logits into probability weights using exponentials. The expression is named rather than evaluated to a decimal.
Probabilities: the named boundary
Softmax converts logits into probability weights using exponentials. The expression is named in the diagram and never pinned as a decimal.
\[
\operatorname{softmax}(\ell)_i = \frac{e^{\ell_i}}{\sum_j e^{\ell_j}}
\]
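For readers who want to see what lies on the far side of the boundary, the formula can be evaluated numerically. The following is a minimal sketch, not part of the book's exact mechanics: the function name `softmax`, the example logits, and the max-subtraction step (a common overflow guard that does not change the result mathematically) are all assumptions made for illustration. The outputs are floating-point approximations, which is why the text keeps the expression named.

```python
import math

def softmax(logits):
    """Approximate softmax weights for a list of numeric logits."""
    # Subtracting the maximum leaves the ratios unchanged
    # but keeps exp() from overflowing on large logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    # Each weight is e^{l_i} / sum_j e^{l_j}, as in the formula above.
    return [e / total for e in exps]

logits = [3, 1, 0]        # hypothetical integer logits
probs = softmax(logits)   # floating-point weights that sum to about 1
```

The weights come out in the same order as the logits, because the exponential is strictly increasing.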
Selection stays exact
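Argmax and top-k need only the ordering of the logits, which is an exact integer comparison. Because the exponential is strictly increasing, softmax preserves that ordering, so neither operation has to evaluate it. Only the probability mass crosses the named softmax boundary.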
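The exact side of the boundary can be sketched with integer comparisons alone. This is an illustrative sketch under stated assumptions: the function names `argmax` and `top_k`, the example logits, and the tie-breaking rule (lowest index wins) are hypothetical choices, not something the text specifies.

```python
def argmax(logits):
    """Index of the largest logit, using comparisons only."""
    # max() returns the first maximal element, so ties resolve
    # to the lowest index (an assumed tie-breaking rule).
    return max(range(len(logits)), key=lambda i: logits[i])

def top_k(logits, k):
    """Indices of the k largest logits, largest first."""
    # Sort by descending logit, then ascending index for ties.
    order = sorted(range(len(logits)), key=lambda i: (-logits[i], i))
    return order[:k]

logits = [2, 7, -1, 7, 4]   # hypothetical integer logits
best = argmax(logits)       # 1
top3 = top_k(logits, 3)     # [1, 3, 4]
```

No exponential or division appears anywhere in this sketch, so no floating-point rounding can affect the result.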
ordering exact; softmax named
Summary
Top-p and sampling depend on the named probabilities, so they are deferred. This book keeps the exact selection mechanics separate from softmax.
top-p and sampling deferred