MicromOne: Demystifying Machine Learning: How to Calculate Precision and Recall

Demystifying Machine Learning: How to Calculate Precision and Recall


The Foundation: The Confusion Matrix

To evaluate a classifier, we first tabulate its predictions against the actual labels in a table called a Confusion Matrix.
Let's look at the data collected for the classification of Gerhard Schröder:
  • True Positives (TP) = 1: The model correctly identified 1 image of Schröder.
  • False Positives (FP) = 4: The model mistakenly labeled 4 images as Schröder when they were actually someone else (Ariel Sharon).
  • False Negatives (FN) = 25: The model missed 25 images of Schröder, incorrectly labeling them as other people ($7 + 14 + 4 = 25$).
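
The three counts above can be reproduced by comparing true and predicted labels with one class treated as the positive class. The sketch below is only an illustration. The label lists are hypothetical and built solely to match the reported counts. The article does not say which people the 25 missed images were assigned to, so they are lumped under a placeholder "other" label. `count_outcomes` is an illustrative helper, not a library function.

```python
def count_outcomes(y_true, y_pred, positive):
    """Count TP, FP and FN for one class treated as the positive label."""
    tp = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, and it really was
        elif predicted == positive and actual != positive:
            fp += 1  # predicted positive, but it was someone else
        elif predicted != positive and actual == positive:
            fn += 1  # a real positive that the model missed
    return tp, fp, fn


# Hypothetical labels constructed to match the article's counts.
y_true = ["Schröder"] * 1 + ["Sharon"] * 4 + ["Schröder"] * 25
y_pred = ["Schröder"] * 1 + ["Schröder"] * 4 + ["other"] * 25

tp, fp, fn = count_outcomes(y_true, y_pred, positive="Schröder")
print(tp, fp, fn)  # 1 4 25
```

True negatives (images of other people that were not labeled Schröder) are not counted here because neither Precision nor Recall uses them.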

1. Precision: How Reliable is the Model?

Precision answers the question: When the model claims an image shows Schröder, how often is it actually correct?
To find this, we divide the correctly predicted positive results by the total number of positive predictions made by the model.
$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$
$$\text{Precision} = \frac{1}{1 + 4} = \frac{1}{5} = 0.20 \text{ (or } 20\%)$$
What this means: Only 20% of the images that the model flagged as "Schröder" were actually him. The remaining 80% were false alarms.
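
As a quick check of the arithmetic, the formula can be written as a small function. This is a minimal sketch: the `precision` helper is hypothetical, and returning 0.0 when the model makes no positive predictions is an assumption made here to avoid dividing by zero.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        # Assumption: treat the undefined 0/0 case as 0.0.
        return 0.0
    return tp / predicted_positive


schroeder_precision = precision(tp=1, fp=4)
print(f"{schroeder_precision:.2f}")  # 0.20
```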

2. Recall: How Much Did the Model Miss?

Recall (also known as Sensitivity) answers the question: Out of all the actual images of Schröder in the dataset, how many did the model manage to find?
To calculate this, we divide the correctly predicted positive results by the total number of actual positives that existed.
$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$
$$\text{Recall} = \frac{1}{1 + 25} = \frac{1}{26} \approx 0.038 \text{ (or } 3.8\%)$$
What this means: The model is struggling heavily with detection. It missed over 96% of the actual Schröder images present in the test pool.
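
The same check works for Recall. The sketch below uses a hypothetical `recall` helper and also derives the share of missed images as one minus Recall. As before, returning 0.0 when there are no actual positives is an assumption.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    actual_positive = tp + fn
    if actual_positive == 0:
        # Assumption: treat the undefined 0/0 case as 0.0.
        return 0.0
    return tp / actual_positive


schroeder_recall = recall(tp=1, fn=25)
miss_rate = 1 - schroeder_recall  # share of real Schröder images the model missed
print(f"recall={schroeder_recall:.3f}, missed={miss_rate:.1%}")
# recall=0.038, missed=96.2%
```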

Key Takeaway for Your AI Projects

This model performs poorly on both counts (20% Precision and 3.8% Recall), but the two numbers show where it fails: most of its Schröder predictions are wrong, and it misses nearly every real Schröder image.
  • High Precision is crucial when false alarms are costly (like spam filters).
  • High Recall is critical when missing a target is dangerous (like medical diagnoses).
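
The same metrics can be computed with scikit-learn. The example below trains a logistic regression on an admissions dataset and prints Precision, Recall, Accuracy, and the confusion matrix:
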
    import numpy as np
    import pandas as pd

    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix, precision_score, recall_score, accuracy_score
    from sklearn.model_selection import train_test_split


    # reproducibility
    np.random.seed(42)


    # load dataset
    df = pd.read_csv('./admissions.csv')


    # target variable
    y = df['admit']


    # one-hot encoding for categorical feature
    df = pd.get_dummies(df, columns=['prestige'], drop_first=True)


    # feature set
    X = df[['gre', 'gpa', 'prestige_2', 'prestige_3', 'prestige_4']]


    # train/test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.10, random_state=42
    )


    # model
    log_mod = LogisticRegression(max_iter=1000)
    log_mod.fit(X_train, y_train)


    # predictions
    y_preds = log_mod.predict(X_test)


    # metrics
    print("Precision:", precision_score(y_test, y_preds))
    print("Recall:", recall_score(y_test, y_preds))
    print("Accuracy:", accuracy_score(y_test, y_preds))


    # confusion matrix
    print("Confusion Matrix:\n", confusion_matrix(y_test, y_preds))
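
To connect the printed confusion matrix back to the formulas, it helps to unpack its four cells by hand. For binary labels 0 and 1, scikit-learn puts actual classes in rows and predicted classes in columns, so the layout is `[[TN, FP], [FN, TP]]`. The sketch below uses a hypothetical matrix in that layout. The numbers are made up for illustration and are not results from the admissions data.

```python
# Hypothetical 2x2 confusion matrix in scikit-learn's binary layout:
# rows are actual classes, columns are predicted classes, labels ordered [0, 1].
cm = [
    [20, 5],  # actual 0: 20 true negatives, 5 false positives
    [8, 7],   # actual 1: 8 false negatives, 7 true positives
]

(tn, fp), (fn, tp) = cm

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tn + fp + fn + tp)

print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
# precision=0.583 recall=0.467 accuracy=0.675
```

With a real NumPy matrix, `confusion_matrix(y_test, y_preds).ravel()` returns the four cells in the same `tn, fp, fn, tp` order for a binary problem.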