Decision Trees
Entropy and Information Gain
Entropy = −Σpₖ·log₂(pₖ) measures node impurity in bits. Information gain =
entropy(parent) − weighted average of child entropies for a proposed split.
Three inline counting loops compute each entropy. Library: NumPy helper
ent() to confirm. RESULT: information gain (rounded).
By hand
With NumPy
Helper ent() uses np.unique for counts and −np.sum(p·np.log2(p))
for entropy in bits. abs() guards against −0.0 on pure (single-class) nodes.
naive.py
import math
parent = ['A', 'A', 'B', 'B']
left = ['A', 'A', 'B']
right = ['B']
n_p = len(parent)
cnt_p = {}
for lbl in parent:
cnt_p[lbl] = cnt_p.get(lbl, 0) + 1
h_p = 0.0
for lbl in cnt_p:
p = cnt_p[lbl] / n_p
h_p = h_p - p * math.log2(p)
n_l = len(left)
cnt_l = {}
for lbl in left:
cnt_l[lbl] = cnt_l.get(lbl, 0) + 1
h_l = 0.0
for lbl in cnt_l:
p = cnt_l[lbl] / n_l
h_l = h_l - p * math.log2(p)
n_r = len(right)
cnt_r = {}
for lbl in right:
cnt_r[lbl] = cnt_r.get(lbl, 0) + 1
h_r = 0.0
for lbl in cnt_r:
p = cnt_r[lbl] / n_r
h_r = h_r - p * math.log2(p)
ig = h_p - (n_l / n_p) * h_l - (n_r / n_p) * h_r
print('RESULT:', round(ig, 4))
library.py
import numpy as np
from dalib.display import set_display
set_display()
def ent(labels):
n = len(labels)
_, counts = np.unique(labels, return_counts=True)
p = counts / n
return abs(float(-np.sum(p * np.log2(p))))
parent = ['A', 'A', 'B', 'B']
left = ['A', 'A', 'B']
right = ['B']
h_p = ent(parent)
h_l = ent(left)
h_r = ent(right)
ig = h_p - (len(left) / len(parent)) * h_l - (len(right) / len(parent)) * h_r
print('h_parent:', round(h_p, 4))
print('h_left:', round(h_l, 4))
print('h_right:', round(h_r, 4))
print('RESULT:', round(ig, 4))
h_parent: 1.0
h_left: 0.9183
h_right: 0.0
RESULT: 0.3113
Implementation notes
- Entropy=0 → pure node; entropy=log₂(k) → maximally mixed k equal classes. Binary maximum = 1 bit at 50/50; parent here is 50/50 → H=1.0.
- IG=0.3113 means this split removes 0.3113 bits of impurity. The right child is pure (H=0) but has only 1 sample, so its weight (1/4) is small.
- log₂(1)=0 in the pure child: the h_r loop computes −1·log₂(1)=0 correctly without a special-case branch.
- Weights (n_left/n, n_right/n) penalise unequal splits — a large impure child contributes more to the weighted average than a small pure one.
- Cross-reference:
gini-impurity(this chapter) for the log-free impurity alternative; the split-selection logic is the same, only the measure differs.