Aiphabet

Song Clustering

You have a list of songs with their tempo (in bpm) and loudness (in dB). Using only this information, you want to group these songs into 2 distinct genres. How would you use K-means clustering to achieve this?

Songs (Data Points):

  • Song A: (100 bpm, -10 dB)
  • Song B: (120 bpm, -5 dB)
  • Song C: (110 bpm, -8 dB)
  • Song D: (150 bpm, -12 dB)
  • Song E: (140, -3 dB)

Goal: Cluster these songs into k=2k = 2 genres.

undefined

1st Iteration

Step 1: Initialization (k=2) with Randomly Generated Points

  • μ1=(115,7)\mu_1 = (115, -7)
  • μ2=(130,9)\mu_2 = (130, -9)

undefined

Step 2: Assignment

We calculate the Euclidean distance between each song and each centroid:

  • Song A

    • Distance to μ1=(100115)2+(10(7))215\mu_1 = \sqrt{(100 - 115)^2 + (-10 - (-7))^2} \approx 15
    • Distance to μ2=(100130)2+(10(9))230\mu_2 = \sqrt{(100 - 130)^2 + (-10 - (-9))^2} \approx 30
    • A is closer to μ1\mu_1.
  • Song B

    • Distance to μ1=(120115)2+(5(7))25\mu_1 = \sqrt{(120 - 115)^2 + (-5 - (-7))^2} \approx 5
    • Distance to μ2=(120130)2+(5(9))211\mu_2 = \sqrt{(120 - 130)^2 + (-5 - (-9))^2} \approx 11
    • B is closer to μ1\mu_1.
  • Song C

    • Distance to μ1=(110115)2+(8(7))25\mu_1 = \sqrt{(110 - 115)^2 + (-8 - (-7))^2} \approx 5
    • Distance to μ2=(110130)2+(8(9))220\mu_2 = \sqrt{(110 - 130)^2 + (-8 - (-9))^2} \approx 20
    • C is closer to μ1\mu_1.
  • Song D

    • Distance to μ1=(150115)2+(12(7))235\mu_1 = \sqrt{(150 - 115)^2 + (-12 - (-7))^2} \approx 35
    • Distance to μ2=(150130)2+(12(9))220\mu_2 = \sqrt{(150 - 130)^2 + (-12 - (-9))^2} \approx 20
    • D is closer to μ2\mu_2.
  • Song E

    • Distance to μ1=(140115)2+(3(7))225\mu_1 = \sqrt{(140 - 115)^2 + (-3 - (-7))^2} \approx 25
    • Distance to μ2=(140130)2+(3(9))211\mu_2 = \sqrt{(140 - 130)^2 + (-3 - (-9))^2} \approx 11
    • E is closer to μ2\mu_2.

Initial Clusters:

  • Genre 1 (μ1)(\mu_1): A, B, C
  • Genre 2 (μ2)(\mu_2): D, E

undefined

Step 3: Update

μ1_new=(100+120+1103,10+(5)+(8)3)=(110,7.6)\mu_{1\_new} = \left( \frac{100+120+110}{3}, \frac{-10 + (-5)+ (-8) }{3} \right) = (110, -7.6)

μ2_new=(150+1402,12+(3)2)=(145,7.5)\mu_{2\_new} = \left( \frac{150 + 140}{2}, \frac{-12 + (-3)}{2} \right) = (145, -7.5)

undefined

2nd Iteration

Step 4: Assignment

Now we use the updated centroids:

  • Song A

    • Distance to μ1_new=(100110)2+(10(7.6))210\mu_{1\_new} = \sqrt{(100 - 110)^2 + (-10 - (-7.6))^2} \approx 10
    • Distance to μ2_new=(100145)2+(10(7.5))245\mu_{2\_new} = \sqrt{(100 - 145)^2 + (-10 - (-7.5))^2} \approx 45
    • A is closer to μ1_new\mu_{1\_new}.
  • Song B

    • Distance to μ1_new=(120110)2+(5(7.6))210\mu_{1\_new} = \sqrt{(120 - 110)^2 + (-5 - (-7.6))^2} \approx 10
    • Distance to μ2_new=(120145)2+(5(7.5))225\mu_{2\_new} = \sqrt{(120 - 145)^2 + (-5 - (-7.5))^2} \approx 25
    • B is closer to μ1_new\mu_{1\_new}.
  • Song C

    • Distance to μ1_new=(110110)2+(8(7.6))20\mu_{1\_new} = \sqrt{(110 - 110)^2 + (-8 - (-7.6))^2} \approx 0
    • Distance to μ2_new=(110145)2+(8(7.5))235\mu_{2\_new} = \sqrt{(110 - 145)^2 + (-8 - (-7.5))^2} \approx 35
    • C is closer to μ1_new\mu_{1\_new}.
  • Song D

    • Distance to μ1_new=(150110)2+(12(7.6))240\mu_{1\_new} = \sqrt{(150 - 110)^2 + (-12 - (-7.6))^2} \approx 40
    • Distance to μ2_new=(150145)2+(12(7.5))27\mu_{2\_new} = \sqrt{(150 - 145)^2 + (-12 - (-7.5))^2} \approx 7
    • D is closer to μ2_new\mu_{2\_new}.
  • Song E

    • Distance to μ1_new=(140110)2+(3(7.6))230\mu_{1\_new} = \sqrt{(140 - 110)^2 + (-3 - (-7.6))^2} \approx 30
    • Distance to μ2_new=(140145)2+(3(7.5))27\mu_{2\_new} = \sqrt{(140 - 145)^2 + (-3 - (-7.5))^2} \approx 7
    • E is closer to μ2_new\mu_{2\_new}.

Clusters (no change from the previous iteration):

  • Genre 1 (μ1_new)(\mu_{1\_new}): A, B, C
  • Genre 2 (μ2_new)(\mu_{2\_new}): D, E

Step 5: Update

Since the cluster assignments didn't change, recalculating the centroids will result in the same values as before.

The algorithm has converged at this point, as the cluster assignments did not change in the 2nd iteration.

Final Clusters (Genres):

  • Genre 1: A, B, C
  • Genre 2: D, E