Count Token Frequencies - Data Handling in Pure Python

Split a sentence into words with .split() and count how many times each word appears. The replay shows counts seeding each new word at 1 and incrementing on every repeat: in the sample sentence 'the cat sat on the mat the cat', the word 'the' appears three times and 'cat' twice.

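By hand

Start with an empty dict and loop over the words. A word that is not yet a key is stored with a count of 1; a word that is already a key has its count increased by 1. naive.py below shows the full loop.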

The Pythonic way

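Counter(text.split()) tokenises and tallies in a single expression: .split() produces the list of words and Counter counts them. The result is a Counter (a dict subclass) holding the same counts as the hand-written loop.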

naive.py

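text = 'the cat sat on the mat the cat'
words = text.split()  # tokenise on whitespace
counts = {}
for w in words:
    if w in counts:
        counts[w] = counts[w] + 1  # repeat: increment the existing count
    else:
        counts[w] = 1  # first sighting: seed the count at 1
# Sort the keys so the printed order is deterministic
print('RESULT:', {k: counts[k] for k in sorted(counts)})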

library.py

from collections import Counter
text = 'the cat sat on the mat the cat'
counts = Counter(text.split())
print('RESULT:', {k: counts[k] for k in sorted(counts)})

RESULT: {'cat': 2, 'mat': 1, 'on': 1, 'sat': 1, 'the': 3}
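
As a quick check that the two approaches agree, the sketch below rebuilds both tallies and compares them. The helper name count_by_hand is hypothetical, introduced only for this illustration; it uses dict.get as a compact stand-in for the if/else in naive.py.

```python
from collections import Counter


def count_by_hand(words):
    """Tally words with a plain dict (hypothetical helper mirroring naive.py)."""
    counts = {}
    for w in words:
        # dict.get returns 0 for an unseen word, so one line covers both branches
        counts[w] = counts.get(w, 0) + 1
    return counts


text = 'the cat sat on the mat the cat'
by_hand = count_by_hand(text.split())
by_library = Counter(text.split())

# A plain dict and a Counter with the same keys and counts compare equal
print(by_hand == by_library)         # True
print(isinstance(by_library, dict))  # True

# Unlike a plain dict, a Counter returns 0 for a missing key instead of raising KeyError
print(by_library['dog'])             # 0
```

The last line shows one practical difference: looking up a word that never occurred is safe on a Counter, whereas the plain dict would need .get() or an explicit membership test.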

Implementation notes

The mechanism is identical to python-data-basics/frequency-count (ch05). The distinction is the input: ch05 counts a pre-split label list; this lesson first tokenises a raw text string with .split(). Real NLP pipelines add lowercasing and punctuation stripping before counting.
.split() with no argument splits on any whitespace run and ignores leading/trailing whitespace — equivalent to .strip().split().
Counter.most_common(n) returns a list of the n highest-frequency (token, count) pairs, ordered from most to least frequent, which is useful for finding stop-words or topic keywords.
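
The notes above can be illustrated with a minimal sketch. The tokenise helper and the sample sentence are assumptions made up for this example, and deleting every character in string.punctuation is only one simple way to strip punctuation, not a complete NLP preprocessing step.

```python
import string
from collections import Counter


def tokenise(text):
    """Lowercase, delete ASCII punctuation, then split on whitespace (hypothetical helper)."""
    # str.maketrans('', '', chars) builds a table that deletes every character in chars
    table = str.maketrans('', '', string.punctuation)
    return text.lower().translate(table).split()


# Made-up sample with mixed case, punctuation and extra whitespace
raw = '  The cat sat on the mat.  The CAT!  '

# No-argument split collapses whitespace runs and ignores padding at both ends
print(raw.split() == raw.strip().split())  # True

counts = Counter(tokenise(raw))

# most_common(n) returns (token, count) pairs, highest count first
print(counts.most_common(2))  # [('the', 3), ('cat', 2)]
```

Without the normalisation step, 'The', 'the', 'CAT!' and 'cat' would all be counted as different tokens. Note that this simple approach also deletes apostrophes and hyphens, so "don't" becomes "dont".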