https://github.com/khaledashrafh/knn-classifier
This repository contains a Python implementation of a K-Nearest Neighbors (KNN) classifier from scratch. It's applied to the "BankNote_Authentication" dataset, which consists of four features (variance, skew, curtosis, and entropy) and a class attribute indicating whether a banknote is real or forged.
- Host: GitHub
- URL: https://github.com/khaledashrafh/knn-classifier
- Owner: KhaledAshrafH
- License: mit
- Created: 2022-12-21T01:26:24.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-08-24T19:05:05.000Z (almost 2 years ago)
- Last Synced: 2025-02-02T07:28:16.314Z (5 months ago)
- Topics: banknote-authentication, euclidean-distance, k, k-nearest-neighbor-classifier, k-nearest-neighbors, k-nearest-neighbours, knn, knn-algorithm, knn-classification, knn-classifier, machine-learning, machine-learning-algorithms
- Language: Python
- Homepage:
- Size: 63.5 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# K-Nearest Neighbors (KNN) Classifier from Scratch
This repository contains a Python implementation of a K-Nearest Neighbors (KNN) classifier from scratch. The KNN classifier is applied to the "BankNote_Authentication" dataset, which consists of four features (variance, skew, curtosis, and entropy) and a class attribute indicating whether a banknote is real or forged.
## Dataset
The dataset used for training and testing the KNN classifier is provided in the "BankNote_Authentication.csv" file. It is loaded into a pandas DataFrame and then shuffled so that the subsequent split into training and test sets is random.
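The loading and shuffling step can be sketched as follows. This is a minimal illustration, not the repository's code: it uses only the standard library instead of pandas so that it runs without extra dependencies, and the function name `load_and_shuffle`, the header names, and the sample values are assumptions made for the example.

```python
import csv
import io
import random


def load_and_shuffle(csv_file, seed=None):
    """Read (features, label) rows from an open CSV file and shuffle them together."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row (assumed to be present)
    rows = [
        ([float(value) for value in record[:-1]], int(record[-1]))
        for record in reader
        if record  # ignore blank lines
    ]
    # Shuffle features and labels as pairs so they stay aligned.
    random.Random(seed).shuffle(rows)
    features = [row_features for row_features, _ in rows]
    labels = [label for _, label in rows]
    return features, labels


# Tiny in-memory stand-in for BankNote_Authentication.csv (values are made up).
sample = io.StringIO(
    "variance,skew,curtosis,entropy,class\n"
    "3.6,8.7,-2.8,-0.4,0\n"
    "-1.4,3.3,-1.4,-2.0,1\n"
    "0.3,-4.5,4.6,-1.0,0\n"
)
features, labels = load_and_shuffle(sample, seed=0)
```

With pandas, the same effect is usually obtained with `pd.read_csv(...)` followed by `df.sample(frac=1)`, which returns the rows in random order.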
## KNN Classifier Implementation
The KNN classifier is implemented in the `KNN_Classifier` class. The class takes the following inputs during initialization:
- `x_train`: The training data features.
- `y_train`: The training data labels.
- `x_test`: The test data features.
- `k`: The number of nearest neighbors to consider.

The KNN classifier consists of the following methods:
### 1. `euclidean_distance`
This method calculates the Euclidean distance between a training row and a test row. It takes two input vectors and computes the Euclidean distance according to the formula:
```
distance = sqrt(sum((x_train_row[i] - x_test_row[i])^2))
```
### 2. `predict`
This method predicts the class for each test point based on the K nearest neighbors in the training data. For each test point, the Euclidean distance is calculated between the test point and all training points. The K training points with the smallest distances are selected, their class labels are counted, and the class with the most votes is the prediction. If there is a tie in the number of votes for different classes, the tie is broken in favor of the class that comes first in the training data.
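The distance calculation and the voting rule described above can be sketched as plain functions. This is an illustrative sketch rather than the `KNN_Classifier` class itself: the helper `predict_one` and the toy data are assumptions, and the tie-break is one reading of "comes first in the training data" (the tied class whose label appears earliest in `y_train`).

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Square root of the summed squared differences between two rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_one(x_train, y_train, test_row, k):
    """Predict the class of one test row by majority vote among its k nearest neighbors."""
    # sorted() is stable, so training rows at equal distance keep their original order.
    order = sorted(
        range(len(x_train)),
        key=lambda i: euclidean_distance(x_train[i], test_row),
    )
    votes = Counter(y_train[i] for i in order[:k])
    top = max(votes.values())
    tied = {label for label, count in votes.items() if count == top}
    # Assumed tie-break: the tied class that appears first in the training labels.
    for label in y_train:
        if label in tied:
            return label


def predict(x_train, y_train, x_test, k):
    """Predict a class for every row in x_test."""
    return [predict_one(x_train, y_train, row, k) for row in x_test]


# Toy data: two well-separated clusters.
x_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y_train = [0, 0, 1, 1]
x_test = [[0.2, 0.3], [4.8, 5.5]]
predictions = predict(x_train, y_train, x_test, k=3)
```

Sorting every training row for each test point costs O(n log n) per prediction, which is fine for a dataset of this size; larger datasets would normally use a partial sort or a spatial index.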
### 3. `calc_accuracy`
This method calculates the accuracy of the KNN classifier by comparing the predicted labels with the true labels for the test data. The accuracy is computed as the ratio of correctly classified instances to the total number of instances in the test set.
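As a rough sketch of this calculation (the return shape and the error handling are assumptions, not taken from the repository), the function below returns the correct count, the total, and their ratio:

```python
def calc_accuracy(y_true, y_pred):
    """Return (correct, total, accuracy) for predicted versus true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    total = len(y_true)
    return correct, total, correct / total


# Three of the four predictions match the true labels.
correct, total, accuracy = calc_accuracy([0, 1, 1, 0], [0, 1, 0, 0])
```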
## Normalization
Before training and testing the KNN classifier, the feature columns are normalized separately using the mean and standard deviation of the values in the training data. Each feature is transformed using the function:
```
f(v) = (v - mean) / std
```
This normalization ensures that each feature contributes equally to the distance calculation.
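A minimal sketch of this step is shown below. The key point is that the mean and standard deviation are computed from the training rows only and then reused for the test rows. The function names `fit_scaler` and `transform` are hypothetical, and the choice of population standard deviation (`pstdev`) is an assumption; the repository may use the sample standard deviation instead.

```python
from statistics import fmean, pstdev


def fit_scaler(x_train):
    """Compute the per-column mean and standard deviation from the training rows."""
    columns = list(zip(*x_train))
    means = [fmean(column) for column in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [pstdev(column) or 1.0 for column in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply f(v) = (v - mean) / std to every value, column by column."""
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in rows
    ]


x_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
x_test = [[2.0, 20.0]]

means, stds = fit_scaler(x_train)            # statistics come from training data only
scaled_train = transform(x_train, means, stds)
scaled_test = transform(x_test, means, stds)  # test rows reuse the training statistics
```

Reusing the training statistics for the test set keeps information about the test data out of the preprocessing step.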
## Training and Testing
The dataset is split into 70% for training and 30% for testing. The training and test sets are created by dividing the feature and label arrays accordingly.
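One way to express the 70/30 split on already-shuffled data is a simple slice at the 70% mark. This is a sketch under assumptions: the function name, the return order, and rounding the split index down are choices made for the example.

```python
def train_test_split(features, labels, train_ratio=0.7):
    """Split already-shuffled rows into training and test parts at the given ratio."""
    split = int(len(features) * train_ratio)  # index of the first test row
    x_train, x_test = features[:split], features[split:]
    y_train, y_test = labels[:split], labels[split:]
    return x_train, y_train, x_test, y_test


# Ten toy rows give seven training rows and three test rows.
features = [[float(i)] for i in range(10)]
labels = [i % 2 for i in range(10)]
x_train, y_train, x_test, y_test = train_test_split(features, labels)
```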
The KNN classifier is then trained on the training data and tested on the test data for different values of K ranging from 1 to 15. For each value of K, the classifier's accuracy is calculated and stored in a list.
## Experiment
The KNN classifier is evaluated using different values of K ranging from 1 to 15. The accuracy of the classifier is measured for each K value, and the results are summarized in the following table:
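| K | Accuracy |
|---|---|
| 1 | 1.0 |
| 2 | 1.0 |
| 3 | 1.0 |
| 4 | 1.0 |
| 5 | 1.0 |
| 6 | 1.0 |
| 7 | 1.0 |
| 8 | 1.0 |
| 9 | 1.0 |
| 10 | 1.0 |
| 11 | 1.0 |
| 12 | 0.9975728155339806 |
| 13 | 1.0 |
| 14 | 0.9975728155339806 |
| 15 | 0.9975728155339806 |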
## Results
The results of the KNN classifier for different values of K are displayed in the console. The output includes the value of K used for the test set and summary information for each K value, including the number of correctly classified test instances, the total number of instances in the test set, and the accuracy.
An example of the output:
```
K Value: 12
Number of correctly classified instances: 411
Total number of instances: 412
Accuracy: 0.9975728155339806
```
## Conclusion
This code provides an implementation of a KNN classifier from scratch in Python. It demonstrates the steps involved in training and testing a KNN classifier, including data normalization, distance calculation, and prediction. By experimenting with different values of K, the code evaluates the performance of the classifier and reports the accuracy for each K value.
## Contributing
Contributions are welcome! If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.
## Team
- [Khaled Ashraf Hanafy Mahmoud - 20190186](https://github.com/KhaledAshrafH).
- [Noura Ashraf Abdelnaby Mansour - 20190592](https://github.com/NouraAshraff).
- [Samaa Khalifa Elsayed Othman - 20190247](https://github.com/SamaaKhalifa).

## License
This program is licensed under the [MIT License](LICENSE.md).