Last Notes
k-NN models can be very powerful, but they must be set up with careful consideration of how the classifications should be made. Design decisions include:
- The choice of k heavily affects the model. A small k makes predictions sensitive to local trends and to noise in the data, while a large k smooths over local variation and can miss meaningful structure. There is often a Goldilocks zone between these extremes that practitioners aim for when designing k-NN models.
- The number of data points for each label should be roughly equal; otherwise, predictions can be biased toward the label that the majority of the training data belongs to.
- Different types of data pre-processing can be useful when designing a k-NN model. For example, features are often rescaled to a common range using a process called normalization, so that features with large numeric values do not dominate the distance calculation. You can learn more about data normalization here: https://forecastegy.com/posts/is-feature-scaling-required-for-the-knn-algorithm/
- In our examples above, we use Euclidean distance (also called L2 distance) to calculate the distance between unknown points and our training data. In practice, however, there are many distance functions to choose from! You can learn more about them here: https://www.kdnuggets.com/2020/11/most-popular-distance-metrics-knn.html
- In the previous examples, we only looked at cases with two features (measurements) per labeled item. However, k-NN can operate with fewer or many more features as well!
- k-NN works well with limited data; however, as the amount of data grows, predictions can become slower and less reliable. They become slower because every prediction requires a distance calculation to each training point. They can become less reliable when there are many features, because points in high-dimensional space are sparse and distances between them become less informative. You can learn more about these limitations here: https://www.geeksforgeeks.org/k-nearest-neighbors-and-curse-of-dimensionality/
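To make a few of these design decisions concrete, here is a minimal sketch in plain Python. It is not the implementation used in the examples above: the function names (`knn_predict`, `min_max_scale`) and both toy datasets are made up for illustration. It shows how the prediction for the same query can change with k, and how min-max normalization can change which neighbor is nearest when one feature has a much larger numeric range than the other.

```python
import math
from collections import Counter


def euclidean(a, b):
    """L2 distance between two points of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """L1 distance: sum of absolute differences per feature."""
    return sum(abs(x - y) for x, y in zip(a, b))


def min_max_scale(points):
    """Rescale each feature to [0, 1] using the training data's min and max.

    Returns the scaled points and a function that applies the same
    scaling to new points (hypothetical helper for this sketch).
    """
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    # Guard against constant features, which would give a span of 0.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]

    def scale(point):
        return tuple((v - lo) / s for v, lo, s in zip(point, lows, spans))

    return [scale(p) for p in points], scale


def knn_predict(train_points, train_labels, query, k=3, distance=euclidean):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy dataset 1 (made up): one "blue" point sits close to the "red" cluster.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (3.0, 3.0), (3.2, 2.8), (1.9, 1.9)]
labels = ["red", "red", "red", "blue", "blue", "blue"]
query = (1.7, 1.7)

print(knn_predict(points, labels, query, k=1))  # blue: follows the single nearest point
print(knn_predict(points, labels, query, k=3))  # red: the stray blue point is outvoted
print(knn_predict(points, labels, query, k=3, distance=manhattan))  # red

# Toy dataset 2 (made up): the second feature has a far larger range.
raw_points = [(0.1, 500.0), (0.2, 0.0), (0.9, 520.0), (0.8, 1000.0)]
raw_labels = ["a", "a", "b", "b"]
raw_query = (0.15, 515.0)

scaled_points, scale = min_max_scale(raw_points)

print(knn_predict(raw_points, raw_labels, raw_query, k=1))            # b
print(knn_predict(scaled_points, raw_labels, scale(raw_query), k=1))  # a
```

In the second dataset, the unscaled distance is dominated by the second feature, so the nearest neighbor is chosen almost entirely by that feature. After scaling, both features contribute comparably and the nearest neighbor changes. Whether scaling is appropriate depends on what the features mean in your application.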
Congratulations! You've designed your first AI model and applied it to real examples of industrial applications!