A causal mask controls which earlier positions can contribute. The operation is structural and exact: masked cells are excluded before softmax rather than approximated.
The causal mask
The causal rule keeps keys with index j less than or equal to the query index i. Cells above the main diagonal, where j > i, are masked out before softmax.
keep j≤i
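The rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name causal_mask is hypothetical, and indices are 0-based here, which leaves the comparison j <= i unchanged.

```python
def causal_mask(n):
    """Return an n x n grid of booleans: True where key j may contribute to query i."""
    # Hypothetical helper for illustration. Indices are 0-based,
    # but the rule is the same as in the text: keep j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]


mask = causal_mask(3)
for row in mask:
    # Print one row per query, marking each key as kept or masked.
    print(["keep" if kept else "mask" for kept in row])
```

Every cell is decided by comparing two indices, so no score value is needed to build the grid.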
Masked cells are not numbers
A masked cell is structurally excluded, not replaced by a displayed decimal. Row 1 keeps one score, row 2 keeps two scores, and row 3 keeps three scores.
allowed counts 1,2,3
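One way to make "structurally excluded" concrete is to slice each row so the masked cells are simply absent, then apply softmax to what remains. The sketch below assumes a made-up 3x3 score matrix; the names causal_row_slices and softmax are hypothetical helpers written for this example.

```python
import math


def causal_row_slices(scores):
    """For each query row i (0-based), keep only the scores at keys j <= i."""
    return [row[: i + 1] for i, row in enumerate(scores)]


def softmax(values):
    """Numerically stable softmax over a plain list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores, invented for illustration only.
scores = [
    [0.5, 1.2, -0.3],
    [0.1, 0.4, 2.0],
    [1.0, 0.0, -1.0],
]

slices = causal_row_slices(scores)
print([len(s) for s in slices])  # allowed counts per row

# Each row's softmax sees only its kept scores; masked cells never enter the sum.
weights = [softmax(s) for s in slices]
for w in weights:
    print(w)
```

Slicing is used here to mirror the framing in the text. Array libraries commonly reach the same result by filling masked cells with negative infinity before softmax, which gives those cells a weight of exactly zero.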
Summary
The mask is exact because it is just a position rule: whether a cell is kept depends only on the indices i and j, never on the score values. It decides the row slices that the named softmax will receive next.