" MicromOne: Neural Networks Made Simple Understanding Cost Functions, Forward & Backward Pass

Pagine

Neural Networks Made Simple Understanding Cost Functions, Forward & Backward Pass

Artificial Intelligence (AI) has come a long way, and at the heart of it lies the neural network - the core engine that powers everything from self-driving cars to personalized recommendations. But for many beginners, terms like forward pass, backward pass, and cost function can sound overwhelming.

In this post, we'll break down these concepts in a simple, logical way. Whether you're a student, a data science enthusiast, or a developer entering the AI space, this guide will help you build a solid foundation.

The Workflow of a Neural Network

Before diving into cost functions, let's understand how a neural network actually works - step by step.

Imagine you want a neural network to predict whether an email is spam or not.

Step 1: Forward Pass

This is the first phase where the input (email features) is passed through the network:

  • Each layer computes a weighted sum of its inputs.

  • It applies an activation function to introduce non-linearity.

  • The final output layer gives a prediction - say, a probability like 0.85 for spam.

The forward pass ends with a predicted output.
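To make this concrete, here is a minimal NumPy sketch of a forward pass with one hidden layer. The features and weights are made-up placeholders, not a trained spam model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 1.2, 0.0])        # email features (hypothetical)
W1 = np.array([[0.2, -0.4, 0.1],
               [0.7,  0.3, -0.5]])   # hidden layer: 2 neurons, 3 inputs
b1 = np.zeros(2)
w2 = np.array([0.6, -0.3])           # output layer weights
b2 = 0.0

h = np.maximum(0.0, W1 @ x + b1)     # weighted sum + ReLU activation
p = sigmoid(w2 @ h + b2)             # final output: probability of spam
print(p)
```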

Step 2: Cost Function Calculation

Once we have a prediction, we need to measure how good or bad it is compared to the actual label (spam or not spam). That's where the cost function comes in - it tells us how wrong the prediction was.

A higher cost = more error.
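As a quick worked example, suppose the email really is spam (label 1) and the network predicted 0.85, as above. Using log loss - the cost function introduced later in this post - the cost is small; a confident wrong prediction would cost far more:

```python
import math

y = 1          # actual label: the email is spam
p = 0.85       # predicted probability from the forward pass

cost = -(y * math.log(p) + (1 - y) * math.log(1 - p))
print(round(cost, 3))              # 0.163 -- a fairly good prediction

print(round(-math.log(0.10), 3))   # 2.303 -- cost if it had predicted 0.10
```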

Step 3: Backward Pass (Backpropagation)

The cost value is then propagated backward through the network:

  • Each neuron calculates its contribution to the total error using derivatives.

  • The gradients (rate of change) are computed for each weight.

  • Using these gradients, the network updates its weights using an optimizer (e.g., stochastic gradient descent).

This is the learning step. Over many iterations (epochs), the network gets better at minimizing the cost.
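Here is a minimal sketch of one such update for a single sigmoid neuron trained with log loss (all numbers hypothetical). For this combination the chain rule collapses neatly: the gradient of the loss with respect to the pre-activation is simply p - y:

```python
import numpy as np

x = np.array([1.0, 2.0])        # inputs
y = 1.0                         # true label
w = np.array([0.1, -0.2])       # weights
b = 0.0                         # bias
lr = 0.1                        # learning rate

# Forward pass
z = w @ x + b
p = 1.0 / (1.0 + np.exp(-z))    # sigmoid activation

# Backward pass: gradients via the chain rule
dz = p - y                      # dLoss/dz for sigmoid + log loss
dw = dz * x                     # gradient for each weight
db = dz                         # gradient for the bias

# Weight update (the learning step)
w -= lr * dw
b -= lr * db
```

Repeating this update over many examples and epochs is the training loop in a nutshell; real frameworks apply the same chain rule through every layer.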

What Is a Cost Function?

A cost function (also called a loss function) is a mathematical function used to measure the error between the predicted output and the actual target value.

In other words, it answers the question:

"How far off is the network's prediction from the truth?"

Purpose of the Cost Function:

  • Acts as a feedback signal

  • Guides weight updates during backpropagation

  • Helps the model learn from mistakes

A well-chosen cost function is critical - using the wrong one can lead to poor training results.

Common Cost Functions (with Examples)

Let's look at the most widely used cost functions based on the type of machine learning problem.

1. Binary Classification → Log Loss (Binary Cross-Entropy)

Used when the output is 0 or 1 (e.g., spam detection, tumor yes/no).

Formula:

\text{Loss} = -\left[ y \cdot \log(p) + (1 - y) \cdot \log(1 - p) \right]

Where:

  • y: true label (0 or 1)

  • p: predicted probability

  • Penalizes confident wrong predictions heavily

  • Suitable for sigmoid outputs
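Translated into NumPy, the formula averaged over a batch looks like this (the clipping is a common practical guard against log(0), not part of the formula itself):

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    p = np.clip(p_pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])
p_pred = np.array([0.85, 0.10, 0.70, 0.95])
print(binary_cross_entropy(y_true, p_pred))   # ≈ 0.17
```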

2. Multi-Class Classification → Cross-Entropy Loss

Used when there are more than two classes (e.g., classifying digits 0-9).

Formula:

\text{Loss} = -\sum_{i=1}^{C} y_i \log(p_i)

Where:

  • y_i: actual label (one-hot encoded)

  • p_i: predicted probability for class i

  • C: total number of classes

  • Works with softmax activation in the final layer

  • Common in image and text classification
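A minimal NumPy version for a single example; with one-hot labels, only the true class's term survives the sum:

```python
import numpy as np

def cross_entropy(y_onehot, p_pred, eps=1e-12):
    p = np.clip(p_pred, eps, 1.0)       # avoid log(0)
    return -np.sum(y_onehot * np.log(p))

y_onehot = np.array([0.0, 1.0, 0.0])    # true class is index 1
p_pred   = np.array([0.1, 0.7, 0.2])    # softmax output
print(cross_entropy(y_onehot, p_pred))  # -log(0.7) ≈ 0.357
```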

3. Regression → Mean Squared Error (MSE)

Used when the output is a continuous number (e.g., price prediction).

Formula:

\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

  • Penalizes larger errors more than smaller ones

  • Sensitive to outliers

4. Regression → Mean Absolute Error (MAE)

An alternative to MSE that measures the average absolute difference instead of the squared difference.

Formula:

\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|

  • More robust to outliers

  • Less smooth optimization surface than MSE
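Both regression losses are one-liners in NumPy; the sample values below are purely illustrative:

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 4.0])
print(mse(y_true, y_pred))   # ≈ 0.833 -- the 1.5 error dominates
print(mae(y_true, y_pred))   # ≈ 0.667 -- errors weighted equally
```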

Activation Functions (Bonus)

While not cost functions, activation functions play a crucial role during the forward pass. They introduce non-linearity, allowing the network to model complex patterns.

Popular ones include:

  • Sigmoid → for binary classification

  • ReLU → fast and effective for hidden layers

  • Softmax → for multi-class classification (used with cross-entropy)
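For reference, here is a minimal NumPy sketch of all three (subtracting the max in softmax is a standard numerical-stability trick, not part of the definition):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))     # squashes to (0, 1)

def relu(z):
    return np.maximum(0.0, z)           # zero for negatives

def softmax(z):
    e = np.exp(z - np.max(z))           # stability trick
    return e / e.sum()                  # normalizes to sum to 1

z = np.array([2.0, -1.0, 0.5])
print(sigmoid(z))
print(relu(z))      # [2.  0.  0.5]
print(softmax(z))   # a probability distribution over 3 classes
```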

Optimizers: From Error to Learning

After calculating the cost, we use an optimizer to update the weights. These include:

  • Stochastic Gradient Descent (SGD)

  • Adam Optimizer (adaptive learning rate)

  • RMSprop, Adagrad, etc.

The optimizer uses the gradient of the cost function to minimize the loss over time.
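At its core, vanilla SGD is a single line: step each weight against its gradient. A minimal sketch follows, where the gradients are hypothetical stand-ins for what backpropagation would produce:

```python
import numpy as np

def sgd_step(weights, gradients, lr=0.01):
    return weights - lr * gradients     # move against the gradient

w = np.array([0.1, -0.2])
grads = np.array([0.05, -0.10])         # from backpropagation
w = sgd_step(w, grads, lr=0.1)
print(w)                                # [ 0.095 -0.19 ]
```

Adam and RMSprop build on this same rule by adapting the step size per weight.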

Summary Table

Task Type                  | Activation Output | Cost Function      | Common Use Cases
---------------------------|-------------------|--------------------|-----------------------------------
Binary Classification      | Sigmoid           | Log Loss           | Spam detection, medical diagnosis
Multi-Class Classification | Softmax           | Cross-Entropy Loss | Handwriting recognition, NLP
Regression                 | Linear            | MSE / MAE          | Forecasting, stock prices