The merge sequence records, in order, the tokens created by repeatedly counting adjacent symbol pairs and merging the most frequent pair. Each round displays its source segmentation so the counts remain inspectable.

The merge sequence

The sequence diagram shows the source corpus for each round, the pair counts, the chosen merge, and the resulting segmentation. That makes each count derivable from visible rows.

$$\text{corpus}\rightarrow\text{counts}\rightarrow\text{merge}\rightarrow\text{new corpus}$$
Figure: BPE merge sequence. Each round displays its source segmentation and merge. Deterministic merges: count pairs, choose max, tie-break lexicographic.

Round 1
source corpus:
  hug freq=3 symbols=h u g
  pug freq=2 symbols=p u g
counts: (h,u)=3, (p,u)=2, (u,g)=5
merge (u,g)->ug count=5
after: hug=h ug; pug=p ug

Round 2
source corpus:
  hug freq=3 symbols=h ug
  pug freq=2 symbols=p ug
counts: (h,ug)=3, (p,ug)=2
merge (h,ug)->hug count=3
after: hug=hug; pug=p ug
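The counting step of round 1 can be checked with a few lines of Python. This is a minimal sketch, not a reference implementation: the name `count_pairs` and the `(symbols, frequency)` corpus layout are assumptions made for illustration, and only the words and frequencies come from the diagram.

```python
from collections import Counter

def count_pairs(corpus):
    """Count adjacent symbol pairs, weighting each pair by its word's frequency."""
    counts = Counter()
    for symbols, freq in corpus:
        # zip pairs each symbol with its right-hand neighbour
        for left, right in zip(symbols, symbols[1:]):
            counts[(left, right)] += freq
    return counts

# Round 1 source corpus from the diagram, as (symbols, frequency) entries.
# This tuple layout is a hypothetical representation chosen for the sketch.
corpus = [(("h", "u", "g"), 3), (("p", "u", "g"), 2)]

counts = count_pairs(corpus)
for pair, count in sorted(counts.items()):
    print(pair, count)
```

The pair (u,g) occurs in both words, so its count is 3 + 2 = 5, while (h,u) and (p,u) each inherit the frequency of the single word that contains them.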

Two deterministic rounds

Round 1 merges (u,g) with count 5, the sum of 3 occurrences in hug and 2 in pug. Round 2 merges (h,ug) with count 3, which beats (p,ug) with count 2.

$$r_1:(u,g),\quad r_2:(h,ug)$$
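Both rounds can be reproduced with a short training loop. The sketch below is illustrative only: the function names (`count_pairs`, `choose_pair`, `merge_pair`, `train`) are hypothetical, and the tie-break direction (the lexicographically smallest pair wins a tie) is an assumption, since the diagram only says "tie-break lexicographic". No tie occurs in this toy run, so that choice does not affect the result here.

```python
from collections import Counter

def count_pairs(corpus):
    """Frequency-weighted counts of adjacent symbol pairs."""
    counts = Counter()
    for symbols, freq in corpus:
        for left, right in zip(symbols, symbols[1:]):
            counts[(left, right)] += freq
    return counts

def choose_pair(counts):
    """Highest count wins; ties go to the lexicographically smallest pair (assumed)."""
    return min(counts, key=lambda pair: (-counts[pair], pair))

def merge_pair(symbols, pair):
    """Replace each left-to-right occurrence of `pair` with the joined symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)

def train(corpus, num_rounds):
    """Run up to `num_rounds` merges; return the ordered merge list and final corpus."""
    merges = []
    for _ in range(num_rounds):
        counts = count_pairs(corpus)
        if not counts:  # every word is already a single symbol
            break
        pair = choose_pair(counts)
        merges.append(pair)
        corpus = [(merge_pair(symbols, pair), freq) for symbols, freq in corpus]
    return merges, corpus

corpus = [(("h", "u", "g"), 3), (("p", "u", "g"), 2)]
merges, final_corpus = train(corpus, num_rounds=2)
print(merges)        # [('u', 'g'), ('h', 'ug')]
print(final_corpus)  # [(('hug',), 3), (('p', 'ug'), 2)]
```

Each iteration recounts pairs on the corpus produced by the previous merge, which is why round 2 sees (h,ug) and (p,ug) rather than the original character pairs.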

Summary

BPE's state is the current segmentation plus the ordered merge list. No probabilities or decimals are needed for this toy run, because every quantity is an integer count.

$$\text{ordered merge list}=[(u,g),(h,ug)]$$
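Because the merge list is ordered, replaying it on a word's characters recovers that word's segmentation. The following is a minimal sketch under that reading: `segment` is a hypothetical helper, and the word "bug" is an illustrative input that does not appear in the toy corpus.

```python
def segment(word, merges):
    """Apply each merge in order, left to right, starting from single characters."""
    symbols = list(word)
    for left, right in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == left and symbols[i + 1] == right:
                # Join the two symbols in place; stay at i to re-check the new neighbour.
                symbols[i:i + 2] = [left + right]
            else:
                i += 1
    return symbols

# Ordered merge list from the two rounds above.
merges = [("u", "g"), ("h", "ug")]

print(segment("hug", merges))  # ['hug']
print(segment("pug", merges))  # ['p', 'ug']
print(segment("bug", merges))  # ['b', 'ug']  (hypothetical word, not in the corpus)
```

Order matters: (h,ug) can only fire after (u,g) has produced the symbol ug, so swapping the two merges would leave hug segmented as h ug.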