Linear regression is often introduced in its simplest form: a straight line fitted to data using one independent variable. But in real applications—econometrics, machine learning, and data science—we almost always deal with multiple variables at once. This is where the matrix formulation of Ordinary Least Squares (OLS) becomes essential.
This article explains OLS using matrix notation in a clear and intuitive way, based on standard econometric lecture notes.
1. The Linear Regression Model in Matrix Form
At the core of linear regression is the assumption that the dependent variable can be written as:
[
y = X\beta + \varepsilon
]
Where:
(y) is an (n \times 1) vector of observed outcomes
(X) is an (n \times k) matrix of explanatory variables
(\beta) is a (k \times 1) vector of unknown parameters
(\varepsilon) is an (n \times 1) vector of random errors
Each row of (X) represents one observation, and each column represents a variable, often including a column of ones for the intercept.
This compact representation allows us to handle many variables without changing the structure of the model.
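To make the dimensions concrete, here is a minimal sketch that builds the four objects with NumPy. The sample size, the coefficient values, and the noise scale are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible

n, k = 100, 3                    # n observations, k columns (intercept included)
x1 = rng.normal(size=n)          # first explanatory variable (simulated)
x2 = rng.normal(size=n)          # second explanatory variable (simulated)

# Design matrix: a column of ones for the intercept, then one column per variable.
X = np.column_stack([np.ones(n), x1, x2])

beta = np.array([1.0, 2.0, -0.5])        # hypothetical "true" parameters
eps = rng.normal(scale=0.3, size=n)      # random errors
y = X @ beta + eps                       # the model y = X beta + eps

print(X.shape, beta.shape, eps.shape, y.shape)
```

Adding another variable only adds a column to X and an entry to beta; the line `y = X @ beta + eps` stays the same.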
2. The Goal of OLS
The purpose of Ordinary Least Squares is simple:
Find the values of (\beta) that make the model fit the data as closely as possible.
More precisely, OLS chooses (\hat{\beta}) to minimize the sum of squared residuals:
[
\min_{\beta} (y - X\beta)'(y - X\beta)
]
This expression is the sum of squared differences between the observed values (y) and the fitted values (X\beta).
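The objective is easy to evaluate directly. The sketch below defines a helper, here called `ssr` (a name chosen for this example), and applies it to a tiny made-up data set in which y = 1 + 2x holds exactly.

```python
import numpy as np

def ssr(beta, X, y):
    """Sum of squared residuals (y - X beta)'(y - X beta)."""
    resid = y - X @ beta
    return float(resid @ resid)

# Three observations; first column is the intercept, second is x = 0, 1, 2.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])   # exactly y = 1 + 2x

print(ssr(np.array([1.0, 2.0]), X, y))  # perfect fit: prints 0.0
print(ssr(np.array([0.0, 2.0]), X, y))  # intercept off by one: prints 3.0
```

With the intercept off by one, each of the three residuals equals 1, so the objective is 3. OLS picks the candidate with the smallest such value.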
3. Deriving the OLS Estimator
To minimize the loss function, we differentiate it with respect to (\beta) and set the gradient to zero. This gives a system of (k) linear equations known as the normal equations:
[
X'X\hat{\beta} = X'y
]
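For readers who want the intermediate steps, the normal equations can be obtained by expanding the objective and differentiating. Here (S(\beta)) is shorthand, introduced only for this derivation, for the sum of squared residuals:
[
S(\beta) = (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta
]
[
\frac{\partial S}{\partial \beta} = -2X'y + 2X'X\beta = 0 \quad\Longrightarrow\quad X'X\hat{\beta} = X'y
]
Because (X'X) is positive semidefinite, (S) is convex, so any solution of these equations is a global minimizer.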
Assuming (X'X) is invertible, which holds exactly when (X) has full column rank (no perfect multicollinearity), we obtain the closed-form solution:
[
\hat{\beta} = (X'X)^{-1}X'y
]
This is one of the most important formulas in statistics and econometrics.
It shows that OLS needs no iterative optimization: the estimator has an exact algebraic solution.
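The closed form translates almost line for line into NumPy. The sketch below is one possible implementation on simulated data. The function name `ols` and all numbers are assumptions made for the example.

```python
import numpy as np

def ols(X, y):
    """Solve the normal equations X'X b = X'y for b.

    Uses a linear solve rather than forming (X'X)^{-1} explicitly.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([1.0, 2.0, -0.5])           # hypothetical parameters
y = X @ true_beta + rng.normal(scale=0.1, size=n)

beta_hat = ols(X, y)

# Cross-check against NumPy's general least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))
```

Solving the linear system is generally preferable to computing the inverse and multiplying, and `np.linalg.lstsq` is a more robust choice when the columns of X are nearly collinear.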
4. Geometric Interpretation: Projection
A powerful way to understand OLS is through geometry.
The predicted values:
[
\hat{y} = X\hat{\beta}
]
are the orthogonal projection of (y) onto the column space of (X).
This means (y) is decomposed into two parts, (y = \hat{y} + e):
the explained component (\hat{y})
the residuals (e = y - \hat{y})
A key property emerges:
Residuals are orthogonal to the regressors.
Mathematically:
[
X'e = 0
]
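This orthogonality condition is the normal equations in another form, since (X'e = X'(y - X\hat{\beta}) = 0). It is the first-order condition that characterizes the least-squares minimizer.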
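The projection view can be checked numerically. The sketch below forms the projection matrix, often called the hat matrix and written P here (a symbol not used elsewhere in this article), on simulated data and verifies three properties up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)                    # arbitrary simulated outcomes

# P = X (X'X)^{-1} X' projects any vector onto the column space of X.
P = X @ np.linalg.solve(X.T @ X, X.T)

y_hat = P @ y                             # fitted values
e = y - y_hat                             # residuals

print(np.allclose(P @ P, P))                        # projecting twice changes nothing
print(np.allclose(X.T @ e, 0.0))                    # residuals orthogonal to every column of X
print(np.isclose(y @ y, y_hat @ y_hat + e @ e))     # Pythagorean decomposition of y'y
```

Forming the n-by-n matrix P is only sensible for small n. It is done here to make the geometry explicit, not as a recommended way to compute fitted values.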
5. Key Properties of OLS Estimators
From the matrix formulation, several important properties follow naturally:
1. Residuals sum to zero (if intercept is included)
The model automatically balances over- and under-predictions.
2. Orthogonality
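Residuals are orthogonal to each column of (X). When an intercept is included, this also means they are uncorrelated with every regressor in the sample.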
3. Mean preservation
When an intercept is included, the average predicted value equals the average observed value:
[
\bar{y} = \overline{\hat{y}}
]
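The role of the intercept in the first and third properties can be seen by fitting the same simulated data with and without a column of ones. This is an illustrative sketch; the helper `fit` and the data-generating values are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
x = rng.normal(loc=2.0, size=n)
y = 1.5 + 0.8 * x + rng.normal(size=n)    # simulated data with a nonzero intercept

def fit(X, y):
    """Return fitted values and residuals from an OLS fit of y on X."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    y_hat = X @ b
    return y_hat, y - y_hat

X_with = np.column_stack([np.ones(n), x])   # intercept included
X_without = x.reshape(-1, 1)                # regression through the origin

yhat_w, e_w = fit(X_with, y)
yhat_wo, e_wo = fit(X_without, y)

print(e_w.sum())                     # zero up to rounding error
print(y.mean() - yhat_w.mean())      # zero up to rounding error
print(e_wo.sum())                    # generally not zero without an intercept
```

The residuals still satisfy X'e = 0 in both fits. Only when one column of X is constant does that condition imply that the residuals sum to zero.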
4. Best Linear Unbiased Estimator (BLUE)
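Under the Gauss–Markov conditions (linearity in parameters, full column rank of (X), errors with zero conditional mean, and homoskedastic, uncorrelated errors), OLS is:
Linear in (y)
Unbiased
Minimum variance among all linear unbiased estimators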
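Unbiasedness can be illustrated, though not proved, with a small Monte Carlo experiment. The sketch below holds X fixed, draws many error vectors with constant variance sigma^2, and compares the average estimate with the true parameters. It also compares the sampling covariance with sigma^2 (X'X)^{-1}, the standard result under these assumptions. All numerical settings are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 60, 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # held fixed across replications
beta = np.array([1.0, 2.0])                            # hypothetical true parameters
sigma = 1.0                                            # error standard deviation

A = np.linalg.solve(X.T @ X, X.T)          # (X'X)^{-1} X', shape (k, n)

eps = rng.normal(scale=sigma, size=(reps, n))   # homoskedastic, uncorrelated errors
Y = X @ beta + eps                         # each row is one simulated sample
B = Y @ A.T                                # each row is the OLS estimate for that sample

mean_est = B.mean(axis=0)
cov_est = np.cov(B, rowvar=False)
cov_theory = sigma**2 * np.linalg.inv(X.T @ X)

print(mean_est)      # close to beta
print(cov_est)       # close to cov_theory
print(cov_theory)
```

The simulation says nothing about the "best" part of BLUE on its own. That would require comparing OLS with other linear unbiased estimators.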
6. Why Matrix Form Matters
The matrix formulation is not just notation—it fundamentally changes how we work with regression.
It allows:
Handling hundreds or thousands of variables efficiently
Extending regression to machine learning models
Generalizing to advanced methods like ridge regression and GLS
Connecting statistics with linear algebra and geometry
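As an example of how little changes when generalizing, ridge regression in its simplest textbook form adds a multiple of the identity matrix to (X'X). The sketch below is a simplified version under that assumption. The function name `ridge`, the penalty value, and the data are made up, and this form penalizes every coefficient, including the intercept, which applied work usually avoids.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate (X'X + lam * I)^{-1} X'y; lam = 0 gives OLS."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

rng = np.random.default_rng(5)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([0.5, 1.0, -2.0, 0.0]) + rng.normal(size=n)

b_ols = ridge(X, y, lam=0.0)       # identical to the OLS formula
b_ridge = ridge(X, y, lam=10.0)    # coefficients shrunk toward zero

print(np.linalg.norm(b_ols), np.linalg.norm(b_ridge))
```

The only difference from the OLS code is the `lam * np.eye(k)` term.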
In short, matrix OLS is the bridge between classical statistics and modern data science.