Step 1: Define the Cost Function
We want to minimize the error between predictions and true values.
$J(w) = \frac{1}{2}\|Xw - t\|^2$
✅ Meaning:
- $Xw$: predicted values
- $t$: true target values
- $Xw - t$: error vector
- $\|\cdot\|^2$: squared Euclidean norm, i.e., the sum of squared errors
- $\frac{1}{2}$: a constant factor that makes the derivative cleaner later
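To make the notation concrete, here is a minimal NumPy sketch of this cost on made-up data (the values and the helper name `cost` are illustrative only, not part of the derivation):

```python
import numpy as np

def cost(X, w, t):
    """Half the sum of squared errors: J(w) = 1/2 * ||Xw - t||^2."""
    residual = X @ w - t                  # error vector Xw - t
    return 0.5 * (residual @ residual)    # ||v||^2 = v^T v

# Toy data: 5 examples, 2 features (values chosen arbitrarily)
X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.0],
              [1.0, 3.5],
              [1.0, 4.0]])
t = np.array([1.1, 2.0, 2.4, 3.9, 4.2])
w = np.array([0.5, 1.0])

print(cost(X, w, t))   # a single non-negative number
```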
Step 2: Expand $\|Xw - t\|^2$
Use the rule:
$\|v\|^2 = v^\top v$
Expand:
$\|Xw - t\|^2 = (Xw - t)^\top (Xw - t)$
Distribute:
$= (Xw)^\top (Xw) - (Xw)^\top t - t^\top (Xw) + t^\top t$
Since $(Xw)^\top t = t^\top (Xw)$ (both are scalars, and a scalar equals its own transpose):
$= (Xw)^\top (Xw) - 2t^\top (Xw) + t^\top t$
Step 3: Rewrite in Matrix Terms
Using the standard matrix identities:
- $(Xw)^\top (Xw) = w^\top X^\top X w$
- $t^\top (Xw) = t^\top X w$
Thus:
$J(w) = \frac{1}{2}\left(w^\top X^\top X w - 2t^\top X w + t^\top t\right)$
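As a sanity check, the expanded matrix form can be compared numerically against the original definition. This is just a sketch on randomly generated data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # arbitrary data, purely for the check
t = rng.normal(size=6)
w = rng.normal(size=3)

direct   = 0.5 * np.sum((X @ w - t) ** 2)                    # (1/2)||Xw - t||^2
expanded = 0.5 * (w @ X.T @ X @ w - 2 * t @ X @ w + t @ t)   # matrix form above

print(np.isclose(direct, expanded))   # True (up to floating-point error)
```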
Step 4: Take the Gradient $\nabla_w J(w)$
Differentiate term-by-term:
- The derivative of $w^\top X^\top X w$ with respect to $w$ is $2X^\top X w$ (using the rule for quadratic forms, since $X^\top X$ is symmetric).
- The derivative of $-2t^\top X w$ with respect to $w$ is $-2X^\top t$.
- The derivative of $t^\top t$ (a constant with respect to $w$) is $0$.
The $\frac{1}{2}$ factor outside cancels the 2's coming from the derivatives.
Thus:
$\nabla_w J(w) = X^\top X w - X^\top t$
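A common way to gain confidence in a derived gradient is a finite-difference check. The sketch below compares the analytic gradient from this step against numerical partial derivatives on randomly generated data (sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))   # random data, purely for the check
t = rng.normal(size=8)
w = rng.normal(size=3)

def J(w):
    """Cost from Step 1: (1/2)||Xw - t||^2."""
    return 0.5 * np.sum((X @ w - t) ** 2)

analytic = X.T @ X @ w - X.T @ t   # gradient derived in this step

# Central finite-difference approximation of each partial derivative
eps = 1e-6
numeric = np.array([(J(w + eps * e) - J(w - eps * e)) / (2 * eps)
                    for e in np.eye(len(w))])

print(np.allclose(analytic, numeric, atol=1e-5))   # True
```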
Step 5: Set the Gradient to Zero (Find the Minimum)
Setting the gradient to zero:
$X^\top X w - X^\top t = 0$
Rearranging:
$X^\top X w = X^\top t$
✅ This is called the normal equation.
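One way to read the normal equation is geometric: at the optimum, the residual $Xw - t$ is orthogonal to every column of $X$. The snippet below sketches this check on synthetic data (the values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 3))
t = rng.normal(size=10)

# Solve X^T X w = X^T t (Step 6 covers solving in more detail)
w = np.linalg.solve(X.T @ X, X.T @ t)

residual = X @ w - t
# At the optimum the residual is orthogonal to every column of X,
# which is exactly what X^T X w = X^T t says.
print(X.T @ residual)   # approximately [0, 0, 0]
```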
Step 6: Solve for w
Multiply both sides on the left by $(X^\top X)^{-1}$ (assuming the inverse exists; see the notes below):
$w^* = (X^\top X)^{-1} X^\top t$
✅ This gives the optimal weights.
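Here is a minimal sketch of computing $w^*$ on synthetic data (`true_w` and the noise level are made up). In practice the normal equation is usually solved as a linear system rather than by forming the explicit inverse, which is cheaper and more numerically stable:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
true_w = np.array([2.0, -1.0, 0.5, 3.0])      # hypothetical "true" weights
t = X @ true_w + 0.01 * rng.normal(size=50)   # targets with a little noise

# Closed-form solution: solve the normal equation X^T X w = X^T t directly
w_star = np.linalg.solve(X.T @ X, X.T @ t)

# Cross-check against NumPy's built-in least-squares routine
w_lstsq, *_ = np.linalg.lstsq(X, t, rcond=None)

print(w_star)                          # close to true_w
print(np.allclose(w_star, w_lstsq))    # True
```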
Final Steps Summary
- Write the cost function: $J(w) = \frac{1}{2}\|Xw - t\|^2$
- Expand the square: $w^\top X^\top X w - 2t^\top X w + t^\top t$
- Differentiate term-by-term:
  - $w^\top X^\top X w \to 2X^\top X w$
  - $-2t^\top X w \to -2X^\top t$
  - $t^\top t \to 0$
- Set the gradient to zero: $X^\top X w = X^\top t$
- Solve for $w$: $w^* = (X^\top X)^{-1} X^\top t$
Important Notes
✅ $X^\top X$ must be invertible, which requires $X$ to have full column rank.
✅ If $X^\top X$ is singular (not invertible), you can fix this by adding ridge (L2) regularization, which gives (see the sketch after these notes):
$w^* = (X^\top X + \lambda I)^{-1} X^\top t$
where $\lambda > 0$ is the regularization strength and $I$ is the identity matrix.
✅ Linear regression has this direct closed-form solution.
Other models, such as logistic regression or neural networks, require iterative optimization methods (e.g., gradient descent).
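A small sketch of the ridge-regularized solution, using a deliberately rank-deficient $X$ (a duplicated column) to illustrate the singular case described above; the data and $\lambda$ are made up:

```python
import numpy as np

rng = np.random.default_rng(4)
# Build a rank-deficient X by duplicating a column, so X^T X is singular
# and the plain normal equation breaks down.
a = rng.normal(size=(20, 1))
X = np.hstack([a, a, rng.normal(size=(20, 1))])
t = rng.normal(size=20)

lam = 0.1          # regularization strength (made-up value)
d = X.shape[1]

# Ridge solution: (X^T X + lambda * I)^{-1} X^T t, computed via a linear solve
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ t)
print(w_ridge)
```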
Final Formula
The direct closed-form solution for linear regression is:
$w^* = (X^\top X)^{-1} X^\top t$