You have a list of songs with their tempo (in bpm) and loudness (in dB). Using only this information, you want to group these songs into 2 distinct genres. How would you use K-means clustering to achieve this?
Songs (Data Points):
Song A: (100 bpm, -10 dB)
Song B: (120 bpm, -5 dB)
Song C: (110 bpm, -8 dB)
Song D: (150 bpm, -12 dB)
Song E: (140, -3 dB)
Goal: Cluster these songs into k = 2 k = 2 k = 2 genres.
1st Iteration
Step 1: Initialization (k=2) with Randomly Generated Points
μ 1 = ( 115 , − 7 ) \mu_1 = (115, -7) μ 1 = ( 115 , − 7 )
μ 2 = ( 130 , − 9 ) \mu_2 = (130, -9) μ 2 = ( 130 , − 9 )
Step 2: Assignment
We calculate the Euclidean distance between each song and each centroid:
Song A
Distance to μ 1 = ( 100 − 115 ) 2 + ( − 10 − ( − 7 ) ) 2 ≈ 15 \mu_1 = \sqrt{(100 - 115)^2 + (-10 - (-7))^2} \approx 15 μ 1 = ( 100 − 115 ) 2 + ( − 10 − ( − 7 ) ) 2 ≈ 15
Distance to μ 2 = ( 100 − 130 ) 2 + ( − 10 − ( − 9 ) ) 2 ≈ 30 \mu_2 = \sqrt{(100 - 130)^2 + (-10 - (-9))^2} \approx 30 μ 2 = ( 100 − 130 ) 2 + ( − 10 − ( − 9 ) ) 2 ≈ 30
A is closer to μ 1 \mu_1 μ 1 .
Song B
Distance to μ 1 = ( 120 − 115 ) 2 + ( − 5 − ( − 7 ) ) 2 ≈ 5 \mu_1 = \sqrt{(120 - 115)^2 + (-5 - (-7))^2} \approx 5 μ 1 = ( 120 − 115 ) 2 + ( − 5 − ( − 7 ) ) 2 ≈ 5
Distance to μ 2 = ( 120 − 130 ) 2 + ( − 5 − ( − 9 ) ) 2 ≈ 11 \mu_2 = \sqrt{(120 - 130)^2 + (-5 - (-9))^2} \approx 11 μ 2 = ( 120 − 130 ) 2 + ( − 5 − ( − 9 ) ) 2 ≈ 11
B is closer to μ 1 \mu_1 μ 1 .
Song C
Distance to μ 1 = ( 110 − 115 ) 2 + ( − 8 − ( − 7 ) ) 2 ≈ 5 \mu_1 = \sqrt{(110 - 115)^2 + (-8 - (-7))^2} \approx 5 μ 1 = ( 110 − 115 ) 2 + ( − 8 − ( − 7 ) ) 2 ≈ 5
Distance to μ 2 = ( 110 − 130 ) 2 + ( − 8 − ( − 9 ) ) 2 ≈ 20 \mu_2 = \sqrt{(110 - 130)^2 + (-8 - (-9))^2} \approx 20 μ 2 = ( 110 − 130 ) 2 + ( − 8 − ( − 9 ) ) 2 ≈ 20
C is closer to μ 1 \mu_1 μ 1 .
Song D
Distance to μ 1 = ( 150 − 115 ) 2 + ( − 12 − ( − 7 ) ) 2 ≈ 35 \mu_1 = \sqrt{(150 - 115)^2 + (-12 - (-7))^2} \approx 35 μ 1 = ( 150 − 115 ) 2 + ( − 12 − ( − 7 ) ) 2 ≈ 35
Distance to μ 2 = ( 150 − 130 ) 2 + ( − 12 − ( − 9 ) ) 2 ≈ 20 \mu_2 = \sqrt{(150 - 130)^2 + (-12 - (-9))^2} \approx 20 μ 2 = ( 150 − 130 ) 2 + ( − 12 − ( − 9 ) ) 2 ≈ 20
D is closer to μ 2 \mu_2 μ 2 .
Song E
Distance to μ 1 = ( 140 − 115 ) 2 + ( − 3 − ( − 7 ) ) 2 ≈ 25 \mu_1 = \sqrt{(140 - 115)^2 + (-3 - (-7))^2} \approx 25 μ 1 = ( 140 − 115 ) 2 + ( − 3 − ( − 7 ) ) 2 ≈ 25
Distance to μ 2 = ( 140 − 130 ) 2 + ( − 3 − ( − 9 ) ) 2 ≈ 11 \mu_2 = \sqrt{(140 - 130)^2 + (-3 - (-9))^2} \approx 11 μ 2 = ( 140 − 130 ) 2 + ( − 3 − ( − 9 ) ) 2 ≈ 11
E is closer to μ 2 \mu_2 μ 2 .
Initial Clusters:
Genre 1 ( μ 1 ) (\mu_1) ( μ 1 ) : A, B, C
Genre 2 ( μ 2 ) (\mu_2) ( μ 2 ) : D, E
Step 3: Update
μ 1 _ n e w = ( 100 + 120 + 110 3 , − 10 + ( − 5 ) + ( − 8 ) 3 ) = ( 110 , − 7.6 ) \mu_{1\_new} = \left( \frac{100+120+110}{3}, \frac{-10 + (-5)+ (-8) }{3} \right) = (110, -7.6) μ 1_ n e w = ( 3 100 + 120 + 110 , 3 − 10 + ( − 5 ) + ( − 8 ) ) = ( 110 , − 7.6 )
μ 2 _ n e w = ( 150 + 140 2 , − 12 + ( − 3 ) 2 ) = ( 145 , − 7.5 ) \mu_{2\_new} = \left( \frac{150 + 140}{2}, \frac{-12 + (-3)}{2} \right) = (145, -7.5) μ 2_ n e w = ( 2 150 + 140 , 2 − 12 + ( − 3 ) ) = ( 145 , − 7.5 )
2nd Iteration
Step 4: Assignment
Now we use the updated centroids:
Song A
Distance to μ 1 _ n e w = ( 100 − 110 ) 2 + ( − 10 − ( − 7.6 ) ) 2 ≈ 10 \mu_{1\_new} = \sqrt{(100 - 110)^2 + (-10 - (-7.6))^2} \approx 10 μ 1_ n e w = ( 100 − 110 ) 2 + ( − 10 − ( − 7.6 ) ) 2 ≈ 10
Distance to μ 2 _ n e w = ( 100 − 145 ) 2 + ( − 10 − ( − 7.5 ) ) 2 ≈ 45 \mu_{2\_new} = \sqrt{(100 - 145)^2 + (-10 - (-7.5))^2} \approx 45 μ 2_ n e w = ( 100 − 145 ) 2 + ( − 10 − ( − 7.5 ) ) 2 ≈ 45
A is closer to μ 1 _ n e w \mu_{1\_new} μ 1_ n e w .
Song B
Distance to μ 1 _ n e w = ( 120 − 110 ) 2 + ( − 5 − ( − 7.6 ) ) 2 ≈ 10 \mu_{1\_new} = \sqrt{(120 - 110)^2 + (-5 - (-7.6))^2} \approx 10 μ 1_ n e w = ( 120 − 110 ) 2 + ( − 5 − ( − 7.6 ) ) 2 ≈ 10
Distance to μ 2 _ n e w = ( 120 − 145 ) 2 + ( − 5 − ( − 7.5 ) ) 2 ≈ 25 \mu_{2\_new} = \sqrt{(120 - 145)^2 + (-5 - (-7.5))^2} \approx 25 μ 2_ n e w = ( 120 − 145 ) 2 + ( − 5 − ( − 7.5 ) ) 2 ≈ 25
B is closer to μ 1 _ n e w \mu_{1\_new} μ 1_ n e w .
Song C
Distance to μ 1 _ n e w = ( 110 − 110 ) 2 + ( − 8 − ( − 7.6 ) ) 2 ≈ 0 \mu_{1\_new} = \sqrt{(110 - 110)^2 + (-8 - (-7.6))^2} \approx 0 μ 1_ n e w = ( 110 − 110 ) 2 + ( − 8 − ( − 7.6 ) ) 2 ≈ 0
Distance to μ 2 _ n e w = ( 110 − 145 ) 2 + ( − 8 − ( − 7.5 ) ) 2 ≈ 35 \mu_{2\_new} = \sqrt{(110 - 145)^2 + (-8 - (-7.5))^2} \approx 35 μ 2_ n e w = ( 110 − 145 ) 2 + ( − 8 − ( − 7.5 ) ) 2 ≈ 35
C is closer to μ 1 _ n e w \mu_{1\_new} μ 1_ n e w .
Song D
Distance to μ 1 _ n e w = ( 150 − 110 ) 2 + ( − 12 − ( − 7.6 ) ) 2 ≈ 40 \mu_{1\_new} = \sqrt{(150 - 110)^2 + (-12 - (-7.6))^2} \approx 40 μ 1_ n e w = ( 150 − 110 ) 2 + ( − 12 − ( − 7.6 ) ) 2 ≈ 40
Distance to μ 2 _ n e w = ( 150 − 145 ) 2 + ( − 12 − ( − 7.5 ) ) 2 ≈ 7 \mu_{2\_new} = \sqrt{(150 - 145)^2 + (-12 - (-7.5))^2} \approx 7 μ 2_ n e w = ( 150 − 145 ) 2 + ( − 12 − ( − 7.5 ) ) 2 ≈ 7
D is closer to μ 2 _ n e w \mu_{2\_new} μ 2_ n e w .
Song E
Distance to μ 1 _ n e w = ( 140 − 110 ) 2 + ( − 3 − ( − 7.6 ) ) 2 ≈ 30 \mu_{1\_new} = \sqrt{(140 - 110)^2 + (-3 - (-7.6))^2} \approx 30 μ 1_ n e w = ( 140 − 110 ) 2 + ( − 3 − ( − 7.6 ) ) 2 ≈ 30
Distance to μ 2 _ n e w = ( 140 − 145 ) 2 + ( − 3 − ( − 7.5 ) ) 2 ≈ 7 \mu_{2\_new} = \sqrt{(140 - 145)^2 + (-3 - (-7.5))^2} \approx 7 μ 2_ n e w = ( 140 − 145 ) 2 + ( − 3 − ( − 7.5 ) ) 2 ≈ 7
E is closer to μ 2 _ n e w \mu_{2\_new} μ 2_ n e w .
Clusters (no change from the previous iteration):
Genre 1 ( μ 1 _ n e w ) (\mu_{1\_new}) ( μ 1_ n e w ) : A, B, C
Genre 2 ( μ 2 _ n e w ) (\mu_{2\_new}) ( μ 2_ n e w ) : D, E
Step 5: Update
Since the cluster assignments didn't change, recalculating the centroids will result in the same values as before.
The algorithm has converged at this point, as the cluster assignments did not change in the 2nd iteration.
Final Clusters (Genres):
Genre 1: A, B, C
Genre 2: D, E