An open API service indexing awesome lists of open source software.

https://github.com/geekquad/knn-from-scratch

A basic project to implement the KNN classifier from Scratch.
https://github.com/geekquad/knn-from-scratch

knn knn-classifier python scratch-implementation sklearn

Last synced: about 2 months ago
JSON representation

A basic project to implement the KNN classifier from Scratch.

Awesome Lists containing this project

README

          

# KNN-from-Scratch
KNN which stands for K-Nearest Neighbours is a simple algorithm that is used for **classification** and **regression** problems in Machine Learning. KNN is a **non-parametric** and **lazy learning algorithm**. Non-parametric means there is no assumption for underlying data distribution. In other words, the model structure determined from the dataset. This will be very helpful in practice where most of the real world datasets do not follow mathematical theoretical assumptions. Lazy algorithm means it does not need any training data points for model generation. All training data used in the testing phase. This makes training faster and testing phase slower and costlier. Costly testing phase means time and memory.

In the worst case, KNN needs more time to scan all data points and scanning all data points will require more memory for storing training data.

### Working of KNN:
In KNN, K is the number of nearest neighbors. The number of neighbors is the core deciding factor. K is generally an odd number if the number of classes is 2. When K=1, then the algorithm is known as the nearest neighbor algorithm. This is the simplest case.


### Documentation of KNN:
https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html


### Parameters:


  • n_neighbors (default value = 5)

  • n_neighbors represents the number of neighbors to use for kneighbors queries

    > The determination of the K value varies greatly depending on the case.

  • p: (default=”minkowski”)

  • This is the power parameter for the Minkowski metric.

    > When p = 1, this is equivalent to using manhattan_distance (l1)



    > When p = 2, this is equivalent to using euliddean_distance(l2)






### Evaluation of the model (without parameters tuning):

precision recall f1-score support
0 0.77 0.83 0.80 12
1 0.72 0.54 0.62 24
2 0.48 0.61 0.54 18
avg / total 0.65 0.63 0.63 54

#### Accuracy: 0.62




### Evaluation of the model (after parameters tuning):

precision recall f1-score support
0 0.77 0.83 0.80 12
1 0.71 0.62 0.67 24
2 0.50 0.56 0.53 18
avg / total 0.66 0.65 0.65 54

#### Accuracy: 0.64
As we can see, after parameters tuning, the accuracy of the model **increased.**