" MicromOne: Neural Networks Made Simple Understanding Cost Functions, Forward & Backward Pass

Pagine

Neural Networks Made Simple Understanding Cost Functions, Forward & Backward Pass

Artificial Intelligence (AI) has come a long way, and at the heart of it lies the neural network - the core engine that powers everything from self-driving cars to personalized recommendations. But for many beginners, terms like forward pass, backward pass, and cost function can sound overwhelming.

In this post, we'll break down these concepts in a simple, logical way. Whether you're a student, a data science enthusiast, or a developer entering the AI space, this guide will help you build a solid foundation.

The Workflow of a Neural Network

Before diving into cost functions, let's understand how a neural network actually works - step by step.

Imagine you want a neural network to predict whether an email is spam or not.

Step 1: Forward Pass

This is the first phase where the input (email features) is passed through the network:

  • Each layer computes a weighted sum of its inputs.

  • It applies an activation function to introduce non-linearity.

  • The final output layer gives a prediction - say, a probability like 0.85 for spam.

The forward pass ends with a predicted output.
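To make this concrete, here is a minimal NumPy sketch of a forward pass with one hidden layer. The features and weights are made-up placeholders, not a trained spam model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 1.2, 0.0])        # email features (hypothetical)
W1 = np.array([[0.2, -0.4, 0.1],
               [0.7,  0.3, -0.5]])   # hidden layer: 2 neurons, 3 inputs
b1 = np.zeros(2)
w2 = np.array([0.6, -0.3])           # output layer weights
b2 = 0.0

h = np.maximum(0.0, W1 @ x + b1)     # weighted sum + ReLU activation
p = sigmoid(w2 @ h + b2)             # final output: probability of spam
print(p)
```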

Step 2: Cost Function Calculation

Once we have a prediction, we need to measure how good or bad it is compared to the actual label (spam or not spam). That's where the cost function comes in - it tells us how wrong the prediction was.

A higher cost = more error.
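As a quick worked example, suppose the email really is spam (label 1) and the network predicted 0.85, as above. Using log loss - the cost function introduced later in this post - the cost is small; a confident wrong prediction would cost far more:

```python
import math

y = 1          # actual label: the email is spam
p = 0.85       # predicted probability from the forward pass

cost = -(y * math.log(p) + (1 - y) * math.log(1 - p))
print(round(cost, 3))              # 0.163 -- a fairly good prediction

print(round(-math.log(0.10), 3))   # 2.303 -- cost if it had predicted 0.10
```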

Step 3: Backward Pass (Backpropagation)

The cost value is then propagated backward through the network:

  • Each neuron calculates its contribution to the total error using derivatives.

  • The gradients (rate of change) are computed for each weight.

  • Using these gradients, the network updates its weights using an optimizer (e.g., stochastic gradient descent).

This is the learning step. Over many iterations (epochs), the network gets better at minimizing the cost.
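Here is a minimal sketch of one such update for a single sigmoid neuron trained with log loss (all numbers hypothetical). For this combination the chain rule collapses neatly: the gradient of the loss with respect to the pre-activation is simply p - y:

```python
import numpy as np

x = np.array([1.0, 2.0])        # inputs
y = 1.0                         # true label
w = np.array([0.1, -0.2])       # weights
b = 0.0                         # bias
lr = 0.1                        # learning rate

# Forward pass
z = w @ x + b
p = 1.0 / (1.0 + np.exp(-z))    # sigmoid activation

# Backward pass: gradients via the chain rule
dz = p - y                      # dLoss/dz for sigmoid + log loss
dw = dz * x                     # gradient for each weight
db = dz                         # gradient for the bias

# Weight update (the learning step)
w -= lr * dw
b -= lr * db
```

Repeating this update over many examples and epochs is the training loop in a nutshell; real frameworks apply the same chain rule through every layer.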

What Is a Cost Function?

A cost function (also called a loss function) is a mathematical function used to measure the error between the predicted output and the actual target value.

In other words, it answers the question:

"How far off is the network's prediction from the truth?"

Purpose of the Cost Function:

  • Acts as a feedback signal

  • Guides weight updates during backpropagation

  • Helps the model learn from mistakes

A well-chosen cost function is critical - using the wrong one can lead to poor training results.

Common Cost Functions (with Examples)

Let's look at the most widely used cost functions based on the type of machine learning problem.

1. Binary Classification → Log Loss (Binary Cross-Entropy)

Used when the output is 0 or 1 (e.g., spam detection, tumor yes/no).

Formula:

\text{Loss} = -\left[ y \cdot \log(p) + (1 - y) \cdot \log(1 - p) \right]

Where:

  • y: true label (0 or 1)

  • p: predicted probability

  • Penalizes confident wrong predictions heavily

  • Suitable for sigmoid outputs
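Translated into NumPy, the formula averaged over a batch looks like this (the clipping is a common practical guard against log(0), not part of the formula itself):

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    p = np.clip(p_pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])
p_pred = np.array([0.85, 0.10, 0.70, 0.95])
print(binary_cross_entropy(y_true, p_pred))   # ≈ 0.17
```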

2. Multi-Class Classification → Cross-Entropy Loss

Used when there are more than two classes (e.g., classifying digits 0-9).

Formula:

\text{Loss} = -\sum_{i=1}^{C} y_i \log(p_i)

Where:

  • y_i: actual label (one-hot encoded)

  • p_i: predicted probability for class i

  • C: total number of classes

  • Works with softmax activation in the final layer

  • Common in image and text classification
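A minimal NumPy version for a single example; with one-hot labels, only the true class's term survives the sum:

```python
import numpy as np

def cross_entropy(y_onehot, p_pred, eps=1e-12):
    p = np.clip(p_pred, eps, 1.0)       # avoid log(0)
    return -np.sum(y_onehot * np.log(p))

y_onehot = np.array([0.0, 1.0, 0.0])    # true class is index 1
p_pred   = np.array([0.1, 0.7, 0.2])    # softmax output
print(cross_entropy(y_onehot, p_pred))  # -log(0.7) ≈ 0.357
```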

3. Regression → Mean Squared Error (MSE)

Used when the output is a continuous number (e.g., price prediction).

Formula:

\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

  • Penalizes larger errors more than smaller ones

  • Sensitive to outliers

4. Regression → Mean Absolute Error (MAE)

An alternative to MSE that measures the average absolute difference instead of the squared difference.

Formula:

\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|

  • More robust to outliers

  • Less smooth optimization surface than MSE
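Both regression losses are one-liners in NumPy; the sample values below are purely illustrative:

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 4.0])
print(mse(y_true, y_pred))   # ≈ 0.833 -- the 1.5 error dominates
print(mae(y_true, y_pred))   # ≈ 0.667 -- errors weighted equally
```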

Activation Functions (Bonus)

While not cost functions, activation functions play a crucial role during the forward pass. They introduce non-linearity, allowing the network to model complex patterns.

Popular ones include:

  • Sigmoid → for binary classification

  • ReLU → fast and effective for hidden layers

  • Softmax → for multi-class classification (used with cross-entropy)
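For reference, here is a minimal NumPy sketch of all three (subtracting the max in softmax is a standard numerical-stability trick, not part of the definition):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))     # squashes to (0, 1)

def relu(z):
    return np.maximum(0.0, z)           # zero for negatives

def softmax(z):
    e = np.exp(z - np.max(z))           # stability trick
    return e / e.sum()                  # normalizes to sum to 1

z = np.array([2.0, -1.0, 0.5])
print(sigmoid(z))
print(relu(z))      # [2.  0.  0.5]
print(softmax(z))   # a probability distribution over 3 classes
```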

Optimizers: From Error to Learning

After calculating the cost, we use an optimizer to update the weights. These include:

  • Stochastic Gradient Descent (SGD)

  • Adam Optimizer (adaptive learning rate)

  • RMSprop, Adagrad, etc.

The optimizer uses the gradient of the cost function to minimize the loss over time.
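At its core, vanilla SGD is a single line: step each weight against its gradient. A minimal sketch follows, where the gradients are hypothetical stand-ins for what backpropagation would produce:

```python
import numpy as np

def sgd_step(weights, gradients, lr=0.01):
    return weights - lr * gradients     # move against the gradient

w = np.array([0.1, -0.2])
grads = np.array([0.05, -0.10])         # from backpropagation
w = sgd_step(w, grads, lr=0.1)
print(w)                                # [ 0.095 -0.19 ]
```

Adam and RMSprop build on this same rule by adapting the step size per weight.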

Summary Table

Task Type                  | Activation Output | Cost Function      | Common Use Cases
---------------------------|-------------------|--------------------|-----------------------------------
Binary Classification      | Sigmoid           | Log Loss           | Spam detection, medical diagnosis
Multi-Class Classification | Softmax           | Cross-Entropy Loss | Handwriting recognition, NLP
Regression                 | Linear            | MSE / MAE          | Forecasting, stock prices