Aiphabet

Numerical Example

Study Hours vs. Test Scores

Let's explore how gradient descent works in our study hours vs. test scores example.

Let's use the following small dataset:

Study Hours (x) | Test Score (y)
2 | 65
3 | 70
5 | 80
1 | 60
4 | 75

Step 1: Data Preparation and Scaling

First, let's scale our study hours:

  1. Calculate the mean: $\mu = (2 + 3 + 5 + 1 + 4) / 5 = 3$
  2. Calculate the sample standard deviation (dividing by $n - 1 = 4$): $\sigma = \sqrt{10 / 4} \approx 1.58$
  3. Scale each x value: $x_i := (x_i - \mu) / \sigma$

Scaled study hours:

  • (2 - 3) / 1.58 ≈ -0.63
  • (3 - 3) / 1.58 ≈ 0
  • (5 - 3) / 1.58 ≈ 1.27
  • (1 - 3) / 1.58 ≈ -1.27
  • (4 - 3) / 1.58 ≈ 0.63

Our scaled dataset now looks like this:

Scaled Study Hours (x) | Test Score (y)
-0.63 | 65
0 | 70
1.27 | 80
-1.27 | 60
0.63 | 75
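The scaling step can be reproduced in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable names (`hours`, `mu`, `sigma`, `scaled_hours`) are chosen here for illustration.

```python
import statistics

hours = [2, 3, 5, 1, 4]

mu = statistics.mean(hours)       # mean of the study hours
sigma = statistics.stdev(hours)   # sample standard deviation (divides by n - 1)

# Standardize: subtract the mean, divide by the standard deviation
scaled_hours = [(x - mu) / sigma for x in hours]

print(mu, round(sigma, 2))
print([round(x, 2) for x in scaled_hours])
```

Because the code divides by the unrounded standard deviation, the largest scaled values come out as about ±1.26 rather than the ±1.27 obtained when dividing by the rounded 1.58. The difference is only rounding.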

Step 2: Initialize Parameters

Let's start with:

  • $\beta_0 = 0$
  • $\beta_1 = 0$
  • Learning rate $\alpha = 0.01$

Step 3: Gradient Descent

We'll use these formulas to update $\beta_0$ and $\beta_1$:

$$\beta_0 := \beta_0 - \alpha \frac{1}{n} \sum_{i=1}^n ((\beta_0 + \beta_1 x_i) - y_i)$$

$$\beta_1 := \beta_1 - \alpha \frac{1}{n} \sum_{i=1}^n ((\beta_0 + \beta_1 x_i) - y_i) x_i$$
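The two update rules above translate almost directly into code. The sketch below is one possible way to write a single update; the function name `gradient_step` and its argument order are assumptions made for illustration, not part of the original example.

```python
def gradient_step(beta0, beta1, xs, ys, alpha):
    """Apply one gradient descent update to (beta0, beta1)."""
    n = len(xs)
    # Prediction error for every data point, using the current parameters
    errors = [(beta0 + beta1 * x) - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / n                                   # average error
    grad1 = sum(e * x for e, x in zip(errors, xs)) / n        # average error times x
    # Both parameters are updated from the same set of errors
    return beta0 - alpha * grad0, beta1 - alpha * grad1


# Scaled study hours and test scores from the table above
xs = [-0.63, 0.0, 1.27, -1.27, 0.63]
ys = [65, 70, 80, 60, 75]

beta0, beta1 = gradient_step(0.0, 0.0, xs, ys, alpha=0.01)
print(beta0, beta1)
```

Computing all errors before changing either parameter keeps the update simultaneous, which is what the formulas assume.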

Let's do a few iterations:

Iteration 1:

With both parameters at 0, each error is simply $-y_i$, so the two sums are $-\sum y_i = -350$ and $-\sum x_i y_i = -31.7$ (using the rounded scaled hours):

$$\beta_0 = 0 - 0.01 \cdot \frac{1}{5} \cdot (-350) = 0.70$$

$$\beta_1 = 0 - 0.01 \cdot \frac{1}{5} \cdot (-31.7) = 0.0634$$

Iteration 2:

$$\beta_0 = 0.70 - 0.01 \cdot \frac{1}{5} \cdot (-346.5) = 1.393$$

$$\beta_1 = 0.0634 - 0.01 \cdot \frac{1}{5} \cdot (-31.45) \approx 0.1263$$

We would continue this process until convergence.
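To see where repeated updates lead, the loop below applies the same update many times. It is a minimal sketch: the fixed count of 1000 iterations stands in for a real stopping rule, and the variable names are illustrative.

```python
# Scaled study hours and test scores from the table above
xs = [-0.63, 0.0, 1.27, -1.27, 0.63]
ys = [65, 70, 80, 60, 75]

alpha = 0.01               # learning rate
beta0, beta1 = 0.0, 0.0    # initial parameters
n = len(xs)

for _ in range(1000):
    errors = [(beta0 + beta1 * x) - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / n
    grad1 = sum(e * x for e, x in zip(errors, xs)) / n
    # Update both parameters simultaneously from the same errors
    beta0, beta1 = beta0 - alpha * grad0, beta1 - alpha * grad1

print(round(beta0, 1), round(beta1, 1))
```

In practice one would usually stop when the parameters or the cost change by less than a small tolerance instead of fixing the number of iterations in advance.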

Step 4: Convergence

For this example, let's say we've reached convergence after 1000 iterations with:

$$\beta_0 \approx 70$$

$$\beta_1 \approx 7.9$$
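These values can be sanity-checked without running the loop. At convergence the updates no longer change the parameters, so both sums in the update formulas must be zero. Because the scaled hours sum to zero, the two conditions separate. The numbers below use the rounded scaled hours, so the last digit is approximate.

```latex
\sum_{i=1}^n \big((\beta_0 + \beta_1 x_i) - y_i\big) = 0
  \;\Rightarrow\; \beta_0 = \frac{1}{n}\sum_{i=1}^n y_i = \frac{350}{5} = 70

\sum_{i=1}^n \big((\beta_0 + \beta_1 x_i) - y_i\big)\, x_i = 0
  \;\Rightarrow\; \beta_1 = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2}
  \approx \frac{31.7}{4.02} \approx 7.9
```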


Step 5: Interpreting Results

Now we can interpret our results:

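  • $\beta_0 \approx 70$: Because the study hours were scaled, a scaled value of 0 corresponds to the mean study time of 3 hours, so a student who studies 3 hours is expected to score around 70. A student who doesn't study at all (0 hours) has a scaled value of $(0 - 3) / 1.58 \approx -1.90$ and an expected score of about $70 - 7.9 \cdot 1.90 \approx 55$.
  • $\beta_1 \approx 7.9$: For each standard deviation increase in study time (about 1.58 hours in our original scale), we expect the test score to increase by about 7.9 points, which is $7.9 / 1.58 = 5$ points per additional hour.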

To make predictions using our original (unscaled) hours, we would use:

$$\text{Predicted Score} = 70 + 7.9 \cdot \left(\frac{\text{Study Hours} - 3}{1.58}\right)$$

For example, if a student studies for 6 hours:

$$\text{Predicted Score} = 70 + 7.9 \cdot \left(\frac{6 - 3}{1.58}\right) \approx 85$$
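The prediction formula can be wrapped in a small helper so that the scaling is never forgotten. This is a sketch using the rounded values from this example; the constant names and the function name `predict_score` are illustrative.

```python
MU = 3.0       # mean study hours from Step 1
SIGMA = 1.58   # standard deviation from Step 1 (rounded)
BETA0 = 70.0   # intercept after convergence
BETA1 = 7.9    # slope per standard deviation of study time


def predict_score(study_hours):
    """Predict a test score from unscaled study hours."""
    scaled = (study_hours - MU) / SIGMA   # apply the same scaling used in training
    return BETA0 + BETA1 * scaled


print(round(predict_score(6)))  # 85
```

New inputs must be scaled with the mean and standard deviation computed from the training data, not recomputed from the new values.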

This example demonstrates how we can fit a linear regression with gradient descent on a small dataset and interpret the results in terms of the original units.