Training data is the dataset actually used to fit the model.
The model learns patterns, relationships, and parameters directly from this data.
Example
from sklearn.model_selection import train_test_split
# First split: keep 70% for training, hold out 30% for validation and test
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=42
)
Validation Data
Validation data is used during training to evaluate how well the model generalizes to unseen data.
It helps tune hyperparameters and detect overfitting.
Example
# Second split: divide the held-out 30% evenly into validation and test sets
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42
)
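The validation split is what candidate hyperparameter values are compared on, while the test set stays untouched. A minimal sketch of that idea, assuming a classification task; KNeighborsClassifier and the candidate k values are illustrative, not part of the original example:
from sklearn.neighbors import KNeighborsClassifier
# Illustrative hyperparameter search: the model and candidate k values are assumptions
best_k, best_score = None, -1.0
for k in (1, 3, 5, 10):
    candidate = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    score = candidate.score(X_val, y_val)   # compare candidates on the validation set only
    if score > best_score:
        best_k, best_score = k, score
print("Best k on validation:", best_k)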
Test Data
Test data is used only once, after training is complete, to evaluate final model performance.
Example
from sklearn.metrics import accuracy_score
# Evaluate exactly once on the untouched test set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy:", accuracy)
Underfitting (Bias Error)
Underfitting happens when the model is too simple to capture the underlying structure of the data.
The model performs poorly on both training and validation sets.
Example
from sklearn.linear_model import LinearRegression
# If the data has non-linear structure, a plain linear model scores poorly on both sets
model = LinearRegression()
model.fit(X_train, y_train)
print("Training score:", model.score(X_train, y_train))
print("Validation score:", model.score(X_val, y_val))
Overfitting (Variance Error)
Overfitting occurs when the model is too complex and fits the training data too closely.
It performs well on training data but poorly on new data.
Example
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
# A degree-10 polynomial has enough capacity to memorize the training set:
# expect a high training score but a noticeably lower validation score
model = make_pipeline(
    PolynomialFeatures(degree=10),
    LinearRegression()
)
model.fit(X_train, y_train)
print("Training score:", model.score(X_train, y_train))
print("Validation score:", model.score(X_val, y_val))
Early Stopping
Early stopping halts training when validation error stops improving, preventing overfitting.
Example
from tensorflow.keras.callbacks import EarlyStopping
# Stop when validation loss has not improved for 5 consecutive epochs,
# and roll back to the weights from the best epoch
early_stop = EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True
)
model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100,
    callbacks=[early_stop]
)
Dropout
Dropout randomly disables neurons during training, forcing the network to learn more robust features.
Example
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
# Each Dropout layer disables 50% of the preceding layer's units on every training step
model = Sequential([
    Dense(128, activation="relu"),
    Dropout(0.5),
    Dense(64, activation="relu"),
    Dropout(0.5),
    Dense(1)
])
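For intuition, dropout amounts to multiplying a layer's activations by a random binary mask during training; Keras also rescales the kept activations so their expected value is unchanged (inverted dropout). A minimal NumPy sketch of that idea, with an illustrative rate and batch shape:
import numpy as np
# Sketch of inverted dropout; the shapes and rate are illustrative assumptions
rng = np.random.default_rng(0)
activations = rng.normal(size=(4, 128))          # batch of 4 examples, 128 units
rate = 0.5
mask = rng.random(activations.shape) >= rate     # keep each unit with probability 1 - rate
dropped = activations * mask / (1 - rate)        # rescale so the expected activation is unchanged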
Local Minima
A local minimum is a point where the loss is minimal in a small region, but not globally optimal.
Gradient-based optimizers can get stuck in these points.
Example
import numpy as np
# A non-convex loss: two minima separated by a local maximum at x = 0
def loss(x):
    return x**4 - 3*x**2 + 2
x = np.linspace(-3, 3, 100)
y = loss(x)
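To show how an optimizer can actually get stuck, the sketch below runs plain gradient descent on a slightly asymmetric variant of the loss above (the extra +x term is an assumption added so that the two minima have different depths); starting on the shallow side, the iterate settles in the local minimum rather than the global one.
# Illustrative only: f(x) = x**4 - 3*x**2 + x has a shallow local minimum near x ≈ 1.13
# and a deeper global minimum near x ≈ -1.30
def grad(x):
    return 4*x**3 - 6*x + 1
x_t = 2.0                      # start on the shallow side of the loss
lr = 0.01
for _ in range(500):
    x_t -= lr * grad(x_t)      # plain gradient descent step
print("Converged to x =", round(x_t, 2))   # ≈ 1.13, the local minimum, not ≈ -1.30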
Momentum
Momentum improves gradient descent by adding a fraction (β, between 0 and 1) of the previous update to the current one, which smooths the update direction, speeds progress along consistently downhill directions, and damps oscillations.
Example
from tensorflow.keras.optimizers import SGD
# momentum=0.9 keeps 90% of the previous update direction in each new update
optimizer = SGD(
    learning_rate=0.01,
    momentum=0.9
)
model.compile(
    optimizer=optimizer,
    loss="mse"
)
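The underlying rule can also be written out directly: a velocity vector keeps a fraction β of its previous value and adds the new gradient step. A minimal NumPy sketch on a toy quadratic loss; the loss, β, and learning rate are illustrative assumptions:
import numpy as np
# Sketch of the momentum update; the quadratic loss and hyperparameters are assumptions
beta, lr = 0.9, 0.01
w = np.zeros(3)                 # parameters
v = np.zeros_like(w)            # velocity: running memory of past updates

def gradient(w):
    return 2 * w - 1            # gradient of a toy quadratic loss with minimum at w = 0.5

for _ in range(200):
    v = beta * v - lr * gradient(w)   # keep a fraction beta of the previous update
    w = w + v                         # move along the accumulated velocity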
Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent updates model parameters using small subsets of the data (mini-batches), making training faster and noisier, but often more effective.
Example
model.compile(
    optimizer="sgd",
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)
# batch_size=32: each parameter update is computed from a mini-batch of 32 examples
model.fit(
    X_train, y_train,
    batch_size=32,
    epochs=50
)
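Written from scratch, mini-batch SGD shuffles the data each epoch and updates the parameters from one small batch at a time. A sketch for linear regression on synthetic data; all names, data, and hyperparameters here are illustrative, not taken from the Keras example above:
import numpy as np
# Illustrative mini-batch SGD for linear regression on made-up data
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(5)
lr, batch_size = 0.1, 32
for epoch in range(50):
    order = rng.permutation(len(X))                      # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        g = 2 * Xb.T @ (Xb @ w - yb) / len(batch)        # MSE gradient on the mini-batch
        w -= lr * g                                      # noisy but cheap update
print("Recovered weights:", np.round(w, 2))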