
The Least Squares Method

When we say we want the "best" line, what do we actually mean? Think about it this way: for each student in our data, our line makes a prediction, and we can measure how far off that prediction is from the real score.

For example, if:

  • A student studied 3 hours and got 82%
  • Our line predicts 81% for 3 hours of study
  • The error (or difference) is 1%
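
As a quick sanity check, here is a minimal sketch of that example in Python, using a hypothetical intercept and slope (66 and 5, made up only so that the line predicts 81% for 3 hours of study):

```python
# Hypothetical line: f(x) = 66 + 5x (coefficients chosen only so that f(3) = 81).
beta_0, beta_1 = 66.0, 5.0

hours_studied = 3.0
actual_score = 82.0

predicted_score = beta_0 + beta_1 * hours_studied  # 81.0
error = actual_score - predicted_score             # 1.0 -> we predicted 1 point too low
print(predicted_score, error)
```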

The Squared Error

We care about all errors, whether we predicted too high or too low. That's why we square these differences. In math terms, for each student $i$:

$$\text{error}_i = (y_i - f(x_i))^2$$

$$\text{error}_i = (y_i - (\beta_0 + \beta_1 x_i))^2$$

Where:

  • $y_i$ is the actual score
  • $f(x_i) = \beta_0 + \beta_1 x_i$ is our predicted score
  • The square makes all errors positive, so errors from predicting too high and too low can't cancel each other out
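
Here is a small sketch of computing this squared error for every student, assuming the data sits in two plain Python lists and using made-up values for $\beta_0$ and $\beta_1$:

```python
# Hypothetical data and coefficients, purely for illustration.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]        # x_i: hours studied
scores = [70.0, 75.0, 82.0, 86.0, 93.0]  # y_i: actual exam scores
beta_0, beta_1 = 66.0, 5.0               # current guess for the line

squared_errors = []
for x_i, y_i in zip(hours, scores):
    prediction = beta_0 + beta_1 * x_i   # f(x_i)
    squared_errors.append((y_i - prediction) ** 2)

print(squared_errors)  # one squared error per student, all non-negative
```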

The Total Error

To find the best line, we want to minimize the average of all these squared errors (with an extra factor of $\frac{1}{2}$ that makes the later calculus cleaner). We write this as:

$$R = \frac{1}{2n}\sum_{i=1}^{n} \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2$$

Don't let this formula scare you! It just means:

  1. Take each prediction error
  2. Square it
  3. Add up all the squared errors
  4. Take the average and halve it (that's where the $\frac{1}{2n}$ in front comes from)
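
In code, this whole recipe fits in a few lines. The sketch below uses the same kind of hypothetical data as above:

```python
# Hypothetical data, purely for illustration.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]        # x_i: hours studied
scores = [70.0, 75.0, 82.0, 86.0, 93.0]  # y_i: actual exam scores

def total_error(beta_0, beta_1, xs, ys):
    """R: the squared prediction errors, summed, averaged, and halved."""
    squared = [(y - (beta_0 + beta_1 * x)) ** 2 for x, y in zip(xs, ys)]
    return sum(squared) / (2 * len(xs))

print(total_error(66.0, 5.0, hours, scores))  # R for one particular guess of the line
```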

To find the best values for $\beta_0$ and $\beta_1$, we need to find where this error $R$ is smallest.

We usually use Gradient Descent to do that.
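
As a preview, here is a minimal gradient descent sketch for this exact error $R$; the starting values, learning rate, number of steps, and data are all made-up choices for illustration, not fixed parts of the method:

```python
# Hypothetical data, purely for illustration.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]        # x_i: hours studied
scores = [70.0, 75.0, 82.0, 86.0, 93.0]  # y_i: actual exam scores

beta_0, beta_1 = 0.0, 0.0   # start from an arbitrary line
learning_rate = 0.01        # made-up step size
n = len(hours)

for step in range(10_000):
    # Partial derivatives of R = (1/2n) * sum_i (y_i - (beta_0 + beta_1 * x_i))^2
    grad_0 = sum(-(y - (beta_0 + beta_1 * x)) for x, y in zip(hours, scores)) / n
    grad_1 = sum(-(y - (beta_0 + beta_1 * x)) * x for x, y in zip(hours, scores)) / n

    # Move each coefficient a small step "downhill" on R.
    beta_0 -= learning_rate * grad_0
    beta_1 -= learning_rate * grad_1

print(beta_0, beta_1)  # the fitted intercept and slope
```

Each update nudges $\beta_0$ and $\beta_1$ in the direction that decreases $R$, and after enough steps the line settles at the values where $R$ is smallest.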