Study Hours vs. Test Scores
Let's walk through gradient descent step by step on our study hours vs. test scores example, using the following small dataset:
| Study Hours (x) | Test Score (y) |
| --- | --- |
| 2 | 65 |
| 3 | 70 |
| 5 | 80 |
| 1 | 60 |
| 4 | 75 |
Step 1: Data Preparation and Scaling
First, let's scale our study hours:
- Calculate mean: μ=(2+3+5+1+4)/5=3
- Calculate standard deviation (sample form, dividing by n − 1): σ=√(10/4)≈1.58
- Scale each x value: xi:=(xi−μ)/σ
Scaled study hours:
- (2 - 3) / 1.58 ≈ -0.63
- (3 - 3) / 1.58 ≈ 0
- (5 - 3) / 1.58 ≈ 1.27
- (1 - 3) / 1.58 ≈ -1.27
- (4 - 3) / 1.58 ≈ 0.63
Our scaled dataset now looks like this:
| Scaled Study Hours (x) | Test Score (y) |
| --- | --- |
| -0.63 | 65 |
| 0 | 70 |
| 1.27 | 80 |
| -1.27 | 60 |
| 0.63 | 75 |
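The scaling step above can be reproduced with a short Python sketch. It assumes the sample standard deviation (the n − 1 form, which is what gives σ ≈ 1.58 for this data); the names `hours` and `standardize` are illustrative and not part of the original example.

```python
import statistics

def standardize(values):
    """Return (scaled_values, mean, std) using the sample standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # n - 1 denominator, gives sigma ≈ 1.58 here
    return [(v - mu) / sigma for v in values], mu, sigma

hours = [2, 3, 5, 1, 4]
scaled, mu, sigma = standardize(hours)
print(mu, round(sigma, 2))
print([round(s, 2) for s in scaled])
```

With the unrounded σ, the two extreme values come out as ±1.26 rather than ±1.27; the difference comes only from rounding σ to 1.58 before dividing.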
Step 2: Initialize Parameters
Let's start with:
- β0=0
- β1=0
- Learning rate α=0.01
Step 3: Gradient Descent
We'll use these formulas to update β0 and β1:
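$$\beta_0 := \beta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( (\beta_0 + \beta_1 x_i) - y_i \right)$$
$$\beta_1 := \beta_1 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( (\beta_0 + \beta_1 x_i) - y_i \right) x_i$$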
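As a minimal sketch, one update of both parameters could be written as a small Python function. The name `gradient_step` and its argument order are illustrative assumptions, and the scaled hours are recomputed here (unrounded, with the sample standard deviation) so the block is self-contained. Both parameters are updated from the same residuals, so the update is simultaneous.

```python
import statistics

def gradient_step(b0, b1, xs, ys, alpha):
    """Apply one gradient descent update to (b0, b1) for y ≈ b0 + b1 * x."""
    n = len(xs)
    # Residuals use the current parameters, before either one is updated.
    residuals = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
    grad_b0 = sum(residuals) / n
    grad_b1 = sum(r * x for r, x in zip(residuals, xs)) / n
    return b0 - alpha * grad_b0, b1 - alpha * grad_b1

hours = [2, 3, 5, 1, 4]
ys = [65, 70, 80, 60, 75]
mu, sigma = statistics.mean(hours), statistics.stdev(hours)
xs = [(h - mu) / sigma for h in hours]  # scaled study hours

# One step from the starting point b0 = b1 = 0 with learning rate 0.01.
b0, b1 = gradient_step(0.0, 0.0, xs, ys, alpha=0.01)
print(round(b0, 4), round(b1, 4))
```

Calling `gradient_step` repeatedly, feeding each result back in, gives the iterations of the algorithm.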
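Let's do a few iterations (the sums are computed from the unrounded scaled values):
Iteration 1:
$$\beta_0 = 0 - 0.01 \cdot \frac{1}{5} \cdot (-350) = 0.70$$
$$\beta_1 = 0 - 0.01 \cdot \frac{1}{5} \cdot (-31.62) \approx 0.0632$$
Iteration 2:
$$\beta_0 = 0.70 - 0.01 \cdot \frac{1}{5} \cdot (-346.50) = 1.393$$
$$\beta_1 = 0.0632 - 0.01 \cdot \frac{1}{5} \cdot (-31.37) \approx 0.126$$
We would continue this process until convergence.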
Step 4: Convergence
For this example, let's say we've reached convergence after 1000 iterations with:
β0≈70
β1≈7.9
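The convergence values can be checked with a short Python sketch that repeats the update 1000 times. It assumes unrounded scaled hours and the sample standard deviation; `fit` is a hypothetical helper name, not something defined in the original example.

```python
import statistics

def fit(xs, ys, alpha=0.01, iterations=1000):
    """Run batch gradient descent for y ≈ b0 + b1 * x and return (b0, b1)."""
    n = len(xs)
    b0 = b1 = 0.0
    for _ in range(iterations):
        residuals = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad_b0 = sum(residuals) / n
        grad_b1 = sum(r * x for r, x in zip(residuals, xs)) / n
        # Simultaneous update: both gradients were computed from the old values.
        b0, b1 = b0 - alpha * grad_b0, b1 - alpha * grad_b1
    return b0, b1

hours = [2, 3, 5, 1, 4]
scores = [65, 70, 80, 60, 75]
mu, sigma = statistics.mean(hours), statistics.stdev(hours)
scaled = [(h - mu) / sigma for h in hours]

b0, b1 = fit(scaled, scores)
print(round(b0, 1), round(b1, 1))  # prints: 70.0 7.9
```

These values sit close to the closed-form least-squares solution for the scaled data, where β0 equals the mean score (70) and β1 equals Σxy/Σx² ≈ 7.91.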
Step 5: Interpreting Results
Now we can interpret our results:
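- β0≈70: Because the study hours were standardized, β0 is the expected score at the mean study time (scaled x = 0, i.e. 3 hours), not at 0 hours. A student who doesn't study at all corresponds to scaled x = (0 − 3)/1.58 ≈ −1.90 and is expected to score about 70 + 7.9 ⋅ (−1.90) ≈ 55.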
- β1≈7.9: For each standard deviation increase in study time (about 1.58 hours in our original scale), we expect the test score to increase by about 7.9 points.
To make predictions using our original (unscaled) hours, we would use:
$$\text{Predicted Score} = 70 + 7.9 \cdot \frac{\text{Study Hours} - 3}{1.58}$$
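Expanding this expression gives an equivalent form in the original units, which makes the per-hour effect explicit. This is only a rearrangement of the formula above, using the rounded values β0 ≈ 70, β1 ≈ 7.9, μ = 3, and σ ≈ 1.58:

$$
\begin{aligned}
\text{Predicted Score}
  &= 70 + 7.9 \cdot \frac{\text{Study Hours} - 3}{1.58} \\
  &= \left(70 - \frac{7.9 \cdot 3}{1.58}\right) + \frac{7.9}{1.58} \cdot \text{Study Hours} \\
  &\approx 55 + 5 \cdot \text{Study Hours}
\end{aligned}
$$

On the original scale, each additional study hour adds about 5 points, and the intercept of about 55 is the predicted score at 0 hours.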
For example, if a student studies for 6 hours:
$$\text{Predicted Score} = 70 + 7.9 \cdot \frac{6 - 3}{1.58} \approx 85$$
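A prediction helper for raw study hours might look like the following Python sketch. The function name `predict_score` and its default arguments are illustrative; the defaults are simply the rounded values from this example (β0 ≈ 70, β1 ≈ 7.9, μ = 3, σ ≈ 1.58).

```python
def predict_score(hours, b0=70.0, b1=7.9, mu=3.0, sigma=1.58):
    """Predict a test score from unscaled study hours."""
    scaled_hours = (hours - mu) / sigma  # apply the same scaling used in training
    return b0 + b1 * scaled_hours

print(round(predict_score(6)))  # 6 hours of study
print(round(predict_score(0)))  # no study at all
```

The important design point is that new inputs are scaled with the mean and standard deviation of the training data, not with statistics computed from the new inputs.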
This example demonstrates how gradient descent fits a linear regression on scaled data, and how to map the fitted coefficients back to the original units so the results can be interpreted in a meaningful way.