Find the best 1-feature split by scanning candidate thresholds and choosing the lowest weighted Gini impurity. For each threshold t: split into left (X<t) and right (X≥t), compute weighted Gini. Library: DecisionTreeClassifier(max_depth=1) reads tree_.threshold[0]. RESULT: best threshold.

By hand

With scikit-learn

DecisionTreeClassifier(max_depth=1) fits a single split and stores the chosen threshold in clf.tree_.threshold[0].

naive.py
X = [1, 2, 3, 5, 7, 8]
y = [0, 0, 0, 1, 1, 1]
n = len(X)
thresholds = [2.5, 4.0, 6.0]
best_t = None
best_wg = 1.0
for t in thresholds:
    left_y  = [y[i] for i in range(n) if X[i] <  t]
    right_y = [y[i] for i in range(n) if X[i] >= t]
    nl = len(left_y)
    nr = len(right_y)
    c0l = left_y.count(0)
    gl = 1 - (c0l / nl) ** 2 - ((nl - c0l) / nl) ** 2
    c0r = right_y.count(0)
    gr = 1 - (c0r / nr) ** 2 - ((nr - c0r) / nr) ** 2
    wg = (nl / n) * gl + (nr / n) * gr
    if wg < best_wg:
        best_wg = wg
        best_t = t
print('RESULT:', best_t)
library.py
from sklearn.tree import DecisionTreeClassifier
from dalib.display import set_display
set_display()

X = [[1], [2], [3], [5], [7], [8]]
y = [0, 0, 0, 1, 1, 1]
clf = DecisionTreeClassifier(max_depth=1, random_state=0)
clf.fit(X, y)
threshold = round(float(clf.tree_.threshold[0]), 4)
print('sklearn threshold:', threshold)
print('RESULT:', threshold)
sklearn threshold: 4.0
RESULT: 4.0

Implementation notes

  • sklearn's split condition is feature ≤ threshold (left), > threshold (right). The naive uses strict < — equivalent here because no training sample equals 4.0 (the midpoint (3+5)/2=4.0 lies between feature values).
  • Candidate thresholds are midpoints between adjacent sorted feature values. Three candidates [2.5, 4.0, 6.0] cover the informative region; symmetric outer candidates give higher impurity and are omitted.
  • Weighted Gini = (n_left/n)·Gini_left + (n_right/n)·Gini_right. At t=4.0 both children are pure (Gini=0) so weighted Gini=0 — the global minimum.
  • Cross-reference: gini-impurity (this chapter) for the per-node formula; decision-stump-predict (this chapter) applies the found threshold.