Precision, Recall, and F1 - Machine Learning Basics

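This example computes precision, recall, and F1 from given TP, FP, and FN counts, then checks the result against scikit-learn. Precision = TP/(TP+FP); recall = TP/(TP+FN); F1 = 2PR/(P+R), where P is precision and R is recall. The library version calls sklearn's precision_score, recall_score, and f1_score on the same y_true/y_pred. Both scripts print RESULT: (precision, recall, f1), rounded to four decimal places.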

By hand

With scikit-learn

precision_score, recall_score, and f1_score each take y_true and y_pred. With the default average='binary', the score is reported for the class given by pos_label, which defaults to 1, so label 1 is treated as positive.

naive.py

tp = 2
fp = 1
fn = 2
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print('RESULT:', (round(precision, 4), round(recall, 4), round(f1, 4)))

library.py

from sklearn.metrics import precision_score, recall_score, f1_score
from dalib.display import set_display
set_display()

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0]
p = round(float(precision_score(y_true, y_pred)), 4)
r = round(float(recall_score(y_true, y_pred)), 4)
f = round(float(f1_score(y_true, y_pred)), 4)
print('RESULT:', (p, r, f))

RESULT: (0.6667, 0.5, 0.5714)
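The counts hard-coded in naive.py can be recovered from the label lists in library.py, which is why both scripts print the same result. The following is a minimal sketch of that tally in plain Python, assuming label 1 is the positive class; the variable name `pairs` and the `COUNTS:` print label are illustrative and not part of the original scripts.

```python
# Labels copied from library.py; 1 is assumed to be the positive class.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0]

# Pair each true label with its prediction, then tally the four outcomes.
pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted 1, actually 1
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted 1, actually 0
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted 0, actually 1
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # predicted 0, actually 0

print('COUNTS:', (tp, fp, fn, tn))  # COUNTS: (2, 1, 2, 1)
```

The first three values match tp = 2, fp = 1, fn = 2 in naive.py; tn is not needed for precision, recall, or F1.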

Implementation notes

F1 is computed from the raw (unrounded) precision and recall so floating-point rounding doesn't compound. The rounded values appear only in the final print.
Precision answers "of all predicted positives, how many were correct?"; recall answers "of all actual positives, how many did we find?". F1 is their harmonic mean. Compared with the arithmetic mean, the harmonic mean penalizes a classifier more heavily when one of the two metrics is very high and the other is very low.
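To make the harmonic-mean behaviour concrete, here is a small sketch comparing the two means on a few precision/recall pairs. The helper names `arithmetic_mean` and `harmonic_mean` are illustrative, the (0.9, 0.1) and (0.5, 0.5) pairs are made-up inputs, and returning 0.0 when both inputs are zero is an assumed convention rather than something the text specifies.

```python
def arithmetic_mean(p, r):
    """Plain average of precision and recall."""
    return (p + r) / 2


def harmonic_mean(p, r):
    """Same formula as F1: 2PR / (P + R).

    Returns 0.0 when both inputs are 0 (an assumed convention,
    chosen here only to avoid dividing by zero).
    """
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


# (precision, recall) pairs: balanced, lopsided, and this page's data.
for p, r in [(0.5, 0.5), (0.9, 0.1), (2 / 3, 0.5)]:
    print(round(p, 4), round(r, 4),
          'arithmetic:', round(arithmetic_mean(p, r), 4),
          'harmonic:', round(harmonic_mean(p, r), 4))
```

For the lopsided pair (0.9, 0.1) the arithmetic mean is 0.5 while the harmonic mean is 0.18; for the balanced pair (0.5, 0.5) both means are 0.5.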
Accuracy = (TP+TN)/n = (2+1)/6 = 0.5 on this data, slightly below F1 = 0.5714. The two metrics can disagree because accuracy counts true negatives and F1 ignores them, and the disagreement can become large when classes are imbalanced. Cross-reference: accuracy-score and confusion-matrix-counts (this chapter).
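The comparison between accuracy and F1 can be sketched directly from the four confusion-matrix counts. The helper names `accuracy` and `f1_from_counts` are illustrative; the second set of counts (95 negatives, 5 positives, every example predicted negative) is a hypothetical case added only to show how far the two metrics can drift apart, and returning 0.0 for an empty denominator is an assumed convention.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all examples classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_from_counts(tp, fp, fn):
    """F1 written in terms of counts: 2TP / (2TP + FP + FN).

    Algebraically equal to 2PR / (P + R). Returns 0.0 when the
    denominator is 0 (an assumed convention for this sketch).
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Counts from this page's data: TP=2, FP=1, FN=2, TN=1.
print('page data:', round(accuracy(2, 1, 2, 1), 4),
      round(f1_from_counts(2, 1, 2), 4))

# Hypothetical imbalanced case: 5 positives, 95 negatives, all predicted 0.
print('all-negative:', round(accuracy(0, 0, 5, 95), 4),
      round(f1_from_counts(0, 0, 5), 4))
```

On the page's data the two values are 0.5 and 0.5714. In the hypothetical all-negative case accuracy is 0.95 while F1 is 0.0, because no positive is ever found.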