The Foundation: The Confusion Matrix
To evaluate a model, we first tabulate its predictions against the ground truth in a table called a Confusion Matrix.
Let's look at the data collected for the classification of Gerhard Schröder:
- True Positives (TP) = 1: The model correctly identified 1 image of Schröder.
- False Positives (FP) = 4: The model mistakenly labeled 4 images as Schröder when they were actually someone else (Ariel Sharon).
- False Negatives (FN) = 25: The model missed 25 images of Schröder, incorrectly labeling them as other people ($7 + 14 + 4 = 25$).
1. Precision: How Reliable is the Model?
Precision answers the question: When the model claims an image shows Schröder, how often is it actually correct?
To find this, we divide the correctly predicted positive results by the total number of positive predictions made by the model.
$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$
$$\text{Precision} = \frac{1}{1 + 4} = \frac{1}{5} = 0.20 \text{ (or } 20\%)$$
What this means: Only 20% of the images that the model flagged as "Schröder" were actually him. The remaining 80% were false alarms.
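The same arithmetic can be sketched in a few lines of Python. The function name `precision` and the choice to return 0.0 when the model makes no positive predictions are assumptions of this sketch, not something the article prescribes.

```python
def precision(tp, fp):
    """Fraction of positive predictions that were correct."""
    predicted_positive = tp + fp
    # ASSUMPTION: with no positive predictions, precision is undefined;
    # returning 0.0 is just a convention chosen for this sketch.
    return tp / predicted_positive if predicted_positive else 0.0


schroeder_precision = precision(tp=1, fp=4)
print(f"{schroeder_precision:.2f}")
```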
2. Recall: How Much Did the Model Miss?
Recall (also known as Sensitivity) answers the question: Out of all the actual images of Schröder in the dataset, how many did the model manage to find?
To calculate this, we divide the correctly predicted positive results by the total number of actual positives that existed.
$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$
$$\text{Recall} = \frac{1}{1 + 25} = \frac{1}{26} \approx 0.038 \text{ (or } 3.8\%)$$
What this means: The model is struggling heavily with detection. It missed over 96% of the actual Schröder images present in the test pool.
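A minimal sketch of the recall calculation, together with the share of Schröder images the model missed, might look like this. The names `recall` and `miss_rate` are illustrative, and the zero-division fallback is an assumption of the sketch.

```python
def recall(tp, fn):
    """Fraction of actual positives that the model found."""
    actual_positive = tp + fn
    # ASSUMPTION: with no actual positives, recall is undefined;
    # returning 0.0 is just a convention chosen for this sketch.
    return tp / actual_positive if actual_positive else 0.0


schroeder_recall = recall(tp=1, fn=25)
miss_rate = 1 - schroeder_recall   # share of actual Schröder images not found
print(f"recall={schroeder_recall:.3f}, missed={miss_rate:.3f}")
```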
Key Takeaway for Your AI Projects
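While this specific model performs poorly on Schröder (20% Precision and 3.8% Recall), tracking these metrics shows where it fails: it flagged only 5 images as Schröder when 26 existed, and 4 of those 5 were wrong. Knowing that is the starting point for improvement.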
- High Precision is crucial when false alarms are costly (like spam filters).
- High Recall is critical when missing a target is dangerous (like medical diagnoses).
# Example: computing precision, recall, and accuracy with scikit-learn
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score, accuracy_score
from sklearn.model_selection import train_test_split
# reproducibility
np.random.seed(42)
# load dataset
df = pd.read_csv('./admissions.csv')
# target variable
y = df['admit']
# one-hot encoding for categorical feature
df = pd.get_dummies(df, columns=['prestige'], drop_first=True)
# feature set
X = df[['gre', 'gpa', 'prestige_2', 'prestige_3', 'prestige_4']]
# train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42
)
# model
log_mod = LogisticRegression(max_iter=1000)
log_mod.fit(X_train, y_train)
# predictions
y_preds = log_mod.predict(X_test)
# metrics
print("Precision:", precision_score(y_test, y_preds))
print("Recall:", recall_score(y_test, y_preds))
print("Accuracy:", accuracy_score(y_test, y_preds))
# confusion matrix
print("Confusion Matrix:\n", confusion_matrix(y_test, y_preds))
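To connect the printed matrix back to the formulas, the sketch below recomputes the three metrics by hand from a 2x2 matrix. For binary labels 0 and 1, scikit-learn's `confusion_matrix` puts actual classes on the rows and predicted classes on the columns, giving `[[TN, FP], [FN, TP]]`. The counts used here are hypothetical placeholders, not output from the admissions model.

```python
# ASSUMPTION: hypothetical counts, arranged the way scikit-learn lays out a
# binary confusion matrix for labels [0, 1] (rows = actual, columns = predicted).
cm = [
    [20, 3],   # actual 0: TN, FP
    [9, 8],    # actual 1: FN, TP
]
(tn, fp), (fn, tp) = cm

precision = tp / (tp + fp)                    # correct among predicted positives
recall = tp / (tp + fn)                       # found among actual positives
accuracy = (tp + tn) / (tp + tn + fp + fn)    # correct among all predictions

print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
print(f"Accuracy: {accuracy:.3f}")
```

With a real model, the same unpacking should work on `confusion_matrix(y_test, y_preds).tolist()`, and the results should match `precision_score`, `recall_score`, and `accuracy_score`.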