Taiju Sanagi: Experiments

Linear Regression Direct Solution

Note
Updated: April 28, 2025

Step 1: Define the Cost Function

We want to minimize the error between predictions and true values.

$$J(w) = \frac{1}{2} \| Xw - t \|^2$$

✅ Meaning:

  • $Xw$: predicted values
  • $t$: true target values
  • $Xw - t$: error vector
  • $\| \cdot \|^2$: sum of squared errors
  • $\frac{1}{2}$: included so the derivative comes out cleaner later
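
As a quick sanity check of this definition, here is a minimal NumPy sketch that evaluates $J(w)$ on made-up toy data (the matrix, targets, and weights below are assumptions chosen only for illustration):

```python
import numpy as np

def cost(X, w, t):
    """J(w) = 0.5 * ||Xw - t||^2: half the sum of squared errors."""
    residual = X @ w - t                # error vector Xw - t
    return 0.5 * (residual @ residual)  # squared Euclidean norm, scaled by 1/2

# Toy data: 4 samples, 2 features (values are illustrative only).
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0],
              [1.0, 7.0]])
t = np.array([3.0, 5.0, 9.0, 13.0])
w = np.array([1.0, 1.5])

print(cost(X, w, t))  # a single nonnegative scalar
```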

Step 2: Expand $\| Xw - t \|^2$

Use the rule:

$$\| v \|^2 = v^\top v$$

Expand:

$$\| Xw - t \|^2 = (Xw - t)^\top (Xw - t)$$

Distribute:

$$= (Xw)^\top (Xw) - (Xw)^\top t - t^\top (Xw) + t^\top t$$

Since $(Xw)^\top t = t^\top (Xw)$ (both are scalars, and a scalar equals its own transpose):

$$= (Xw)^\top (Xw) - 2 t^\top (Xw) + t^\top t$$

Step 3: Rewrite in Matrix Terms

Recognizing standard matrix rules:

  • $(Xw)^\top (Xw) = w^\top X^\top X w$
  • $t^\top (Xw) = t^\top X w$

Thus:

$$J(w) = \frac{1}{2} \left( w^\top X^\top X w - 2 t^\top X w + t^\top t \right)$$
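
A short numerical check (the random toy shapes below are purely an assumption) that this expanded quadratic form really equals the original $\frac{1}{2}\|Xw - t\|^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # arbitrary toy design matrix
t = rng.normal(size=6)        # arbitrary targets
w = rng.normal(size=3)        # arbitrary weights

direct   = 0.5 * np.linalg.norm(X @ w - t) ** 2
expanded = 0.5 * (w @ X.T @ X @ w - 2 * t @ X @ w + t @ t)

print(np.isclose(direct, expanded))  # True: both forms agree
```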

Step 4: Take the Gradient $\nabla_w J(w)$

Differentiate term-by-term:

  • The derivative of $w^\top X^\top X w$ with respect to $w$ is $2 X^\top X w$ (using $\nabla_w\, w^\top A w = 2Aw$, which holds because $A = X^\top X$ is symmetric).
  • The derivative of $-2 t^\top X w$ with respect to $w$ is $-2 X^\top t$.
  • The derivative of $t^\top t$ (a constant in $w$) is $0$.

Because we have a $\frac{1}{2}$ factor outside, it cancels out the 2's from the derivatives.

Thus:

$$\nabla_w J(w) = X^\top X w - X^\top t$$
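
One way to double-check this algebra is to compare the analytic gradient against a central finite-difference approximation on random toy data (the sizes and seed below are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
t = rng.normal(size=8)
w = rng.normal(size=3)

def J(w):
    r = X @ w - t
    return 0.5 * (r @ r)

analytic = X.T @ X @ w - X.T @ t  # the gradient derived above

# Central finite differences as an independent numerical check.
eps = 1e-6
numeric = np.array([(J(w + eps * e) - J(w - eps * e)) / (2 * eps)
                    for e in np.eye(len(w))])

print(np.allclose(analytic, numeric))  # True
```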

Step 5: Set the Gradient to Zero (Find the Minimum)

Setting the gradient to zero:

$$X^\top X w - X^\top t = 0$$

Rearranging:

$$X^\top X w = X^\top t$$

✅ This is called the normal equation.

Step 6: Solve for $w$

Multiply both sides by $(X^\top X)^{-1}$:

$$w^* = (X^\top X)^{-1} X^\top t$$

✅ This gives the optimal weights.
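
A minimal sketch of this solution on synthetic data (the data and noise level are assumptions, not from the derivation). Numerically, it is better to solve the normal equation with `np.linalg.solve` (or use `np.linalg.lstsq`) than to form the inverse explicitly; both recover the same $w^*$ here:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))
true_w = np.array([2.0, -1.0, 0.5])
t = X @ true_w + 0.01 * rng.normal(size=20)  # targets with a little noise

# Closed form via a linear solve of X^T X w = X^T t (avoids an explicit inverse).
w_star = np.linalg.solve(X.T @ X, X.T @ t)

# np.linalg.lstsq solves the same least-squares problem and should agree.
w_lstsq, *_ = np.linalg.lstsq(X, t, rcond=None)

print(np.allclose(w_star, w_lstsq))  # True
print(w_star)                        # close to [2.0, -1.0, 0.5]
```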

Final Steps Summary (plain text)

  1. Write the cost function: $J(w) = \frac{1}{2} \|Xw - t\|^2$
  2. Expand the square: $w^\top X^\top X w - 2 t^\top X w + t^\top t$
  3. Differentiate term-by-term:
    • $w^\top X^\top X w \to 2 X^\top X w$
    • $-2 t^\top X w \to -2 X^\top t$
    • $t^\top t \to 0$
  4. Set the gradient to zero: $X^\top X w = X^\top t$
  5. Solve for $w$: $w^* = (X^\top X)^{-1} X^\top t$

Important Notes

✅ $X^\top X$ must be invertible (i.e., $X$ must have full column rank).
✅ If $X^\top X$ is singular (no inverse), you can fix it by adding Ridge regularization:

$$w^* = (X^\top X + \lambda I)^{-1} X^\top t$$
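
A small sketch of the regularized solution (the toy data and the value of $\lambda$ are assumptions); the added $\lambda I$ keeps the matrix invertible even when the columns of $X$ are perfectly collinear:

```python
import numpy as np

def ridge_solution(X, t, lam=1e-2):
    """w* = (X^T X + lambda * I)^(-1) X^T t, computed via a linear solve."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ t)

# Collinear columns make X^T X singular; the lambda * I term fixes that.
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])   # second column = 2 * first column
t = np.array([1.0, 2.0, 3.0])

print(ridge_solution(X, t))   # finite weights despite the singular X^T X
```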

✅ Linear regression has this direct solution.
Other models, such as logistic regression or neural networks, require iterative optimization methods (e.g., gradient descent).

Final Formula

The direct closed-form solution for linear regression is:

$$w^* = (X^\top X)^{-1} X^\top t$$