https://github.com/khaledashrafh/dt-banknote-authenticator
This Python code utilizes the decision tree algorithm from the scikit-learn library to perform banknote authentication. The code aims to analyze the impact of different train-test split ratios and training set sizes on the accuracy and size of the learned decision tree.
banknote-authentication decision-tree decision-tree-classifier dt machine-learning matplotlib-pyplot models numpy pandas plotting sklearn
- Host: GitHub
- URL: https://github.com/khaledashrafh/dt-banknote-authenticator
- Owner: KhaledAshrafH
- License: mit
- Created: 2022-12-21T02:26:45.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-08-24T19:05:25.000Z (about 2 years ago)
- Last Synced: 2024-12-07T11:08:59.859Z (10 months ago)
- Topics: banknote-authentication, decision-tree, decision-tree-classifier, dt, machine-learning, matplotlib-pyplot, models, numpy, pandas, plotting, sklearn
- Language: Python
- Homepage:
- Size: 107 KB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# Banknote Authentication Decision Tree
This Python code utilizes the decision tree algorithm from the scikit-learn library to perform banknote authentication. The code aims to analyze the impact of different train-test split ratios and training set sizes on the accuracy and size of the learned decision tree.
## Dataset
The code uses the "BankNote_Authentication.csv" dataset, which contains four features (variance, skew, curtosis, and entropy) and a class attribute indicating whether a banknote is real or forged.
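As a rough illustration of how such a file could be loaded and split into features and labels, here is a minimal sketch. The column names are assumed from the feature list above and may not match the headers in the real CSV, and the three rows are made up for the example rather than taken from the dataset.

```python
import io

import pandas as pd

# Made-up rows for illustration only; the column names are an assumption
# based on the feature list in this README, not the real file's headers.
sample_csv = io.StringIO(
    "variance,skew,curtosis,entropy,class\n"
    "3.1,8.2,-2.5,-0.4,0\n"
    "-1.7,-6.3,7.9,0.2,1\n"
    "0.4,2.1,1.3,-1.1,0\n"
)

data = pd.read_csv(sample_csv)

# Features are every column except the class attribute; labels are the class column.
X = data.drop(columns="class").to_numpy()
Y = data["class"].to_numpy()

print(X.shape, Y.shape)
```

With the real file, `pd.read_csv("BankNote_Authentication.csv")` would take the place of the in-memory sample.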
## Requirements
The following libraries are imported in the code:
- `sklearn.tree`: Provides the decision tree classifier.
- `pandas`: Used for data manipulation and analysis.
- `sklearn.model_selection.train_test_split`: Splits the data into training and testing sets.
- `numpy`: Handles mathematical operations and array manipulation.
- `matplotlib.pyplot`: Enables data visualization.
## Functions
### `measureAccuracy(y_pred, y_test)`
Calculates the accuracy of the predicted labels (`y_pred`) compared to the actual labels (`y_test`). Returns the accuracy as a floating-point value.
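One plausible way to compute this value is the fraction of positions where the two label arrays agree. The body below is a sketch under that assumption, not the repository's actual implementation.

```python
import numpy as np


def measureAccuracy(y_pred, y_test):
    """Return the fraction of predictions that match the true labels."""
    y_pred = np.asarray(y_pred)
    y_test = np.asarray(y_test)
    if y_pred.shape != y_test.shape:
        raise ValueError("y_pred and y_test must have the same shape")
    # Element-wise comparison gives booleans; their mean is the accuracy.
    return float(np.mean(y_pred == y_test))


# Three of the four predictions match, so the accuracy is 0.75.
print(measureAccuracy([0, 1, 1, 0], [0, 1, 0, 0]))
```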
### `Experiment_Utility(X, Y, splitRatio)`
Performs an experiment with a specific train-test split ratio (`splitRatio`) using the decision tree algorithm. Splits the data into training and testing sets, fits the decision tree model, and predicts the labels for the testing set. Returns the accuracy and the number of nodes in the decision tree.
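A single run of this kind might look like the sketch below. It assumes that `splitRatio` is the training fraction, adds a hypothetical `seed` parameter for reproducibility, and uses synthetic data from `make_classification` in place of the banknote CSV, so the numbers it produces are not comparable to the results in this README.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def Experiment_Utility(X, Y, splitRatio, seed=None):
    """Split, fit a decision tree, and return (test accuracy, node count)."""
    # Assumption: splitRatio is the fraction of rows used for training.
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, train_size=splitRatio, random_state=seed
    )
    clf = DecisionTreeClassifier(random_state=seed)
    clf.fit(X_train, y_train)
    accuracy = clf.score(X_test, y_test)  # mean accuracy on the test set
    tree_size = clf.tree_.node_count      # total nodes, internal plus leaves
    return accuracy, tree_size


# Synthetic stand-in data with four features (not the banknote dataset).
X, Y = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
accuracy, tree_size = Experiment_Utility(X, Y, 0.75, seed=0)
print(accuracy, tree_size)
```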
### `GetStats(array)`
Calculates the mean, maximum, and minimum values of an input array. Returns the statistics as a NumPy array.
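A minimal sketch of such a helper follows. The order of the returned values (mean, maximum, minimum) follows the description above, and the sample input is the list of tree sizes reported for Experiment 1 in this README.

```python
import numpy as np


def GetStats(array):
    """Return [mean, max, min] of the input as a NumPy array."""
    values = np.asarray(array, dtype=float)
    return np.array([values.mean(), values.max(), values.min()])


# Tree sizes from the five Experiment 1 runs: mean 30.6, max 39, min 25.
stats = GetStats([25, 31, 39, 27, 31])
print(stats)
```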
### `Experiment(X, Y, splitRatio)`
Performs multiple experiments with a fixed train-test split ratio (`splitRatio`). Reruns the experiment five times with different random splits of the data. Returns the accuracies and tree sizes for each experiment.
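The repetition could be sketched as a loop that changes only the random seed between runs. The helper `run_once`, the `runs` parameter, and the use of the loop index as the seed are assumptions made for this example, and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def run_once(X, Y, splitRatio, seed):
    """Hypothetical helper: one split/fit/evaluate cycle."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, train_size=splitRatio, random_state=seed
    )
    clf = DecisionTreeClassifier(random_state=seed).fit(X_train, y_train)
    return clf.score(X_test, y_test), clf.tree_.node_count


def Experiment(X, Y, splitRatio, runs=5):
    """Repeat the experiment `runs` times, each with a different random split."""
    accuracies = np.zeros(runs)
    sizes = np.zeros(runs)
    for i in range(runs):
        # Using the loop index as the seed gives a different split per run.
        accuracies[i], sizes[i] = run_once(X, Y, splitRatio, seed=i)
    return accuracies, sizes


# Synthetic stand-in data (not the banknote dataset).
X, Y = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
accuracies, sizes = Experiment(X, Y, 0.75)
print(accuracies, sizes)
```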
### `plotting(y_axis, fileName)`
Plots the y-axis values against the training set size. Saves the plot as an image file with the specified `fileName`.
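A plotting helper along these lines could be written as below. The `train_sizes` and `y_label` parameters are hypothetical additions so that the example is self-contained; the y values are the mean accuracies reported in this README, and the output path is a temporary file chosen for the example.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def plotting(y_axis, fileName, train_sizes=(30, 40, 50, 60, 70), y_label="Value"):
    """Plot y_axis against the training set size and save the figure."""
    fig, ax = plt.subplots()
    ax.plot(train_sizes, y_axis, marker="o")
    ax.set_xlabel("Training set size (%)")
    ax.set_ylabel(y_label)
    fig.savefig(fileName)
    plt.close(fig)  # release the figure once it is written to disk


# Mean accuracies for the 30% to 70% training set sizes, from this README.
mean_accuracy = [0.96774, 0.97282, 0.97376, 0.98069, 0.97961]
out_path = os.path.join(tempfile.gettempdir(), "mean_accuracy.png")
plotting(mean_accuracy, out_path, y_label="Mean accuracy")
```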
### `main()`
The main function reads the dataset, separates the features (X) and the labels (Y), and initializes matrices for accuracy and tree size statistics. It then runs two sets of experiments:
### Experiment 1: Fixed train-test split ratio
- The function runs the experiment with a 75% training ratio, recording the accuracies and tree sizes for each iteration.
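- The tree size (number of nodes) and test accuracy of each of the five runs are shown in the following table:

| Tree Size (nodes) | Accuracy |
| --- | --- |
| 25.0 | 0.9620991253644315 |
| 31.0 | 0.9630709426627794 |
| 39.0 | 0.956268221574344 |
| 27.0 | 0.967930029154519 |
| 31.0 | 0.9689018464528668 |
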
### Experiment 2: Range of train-test split ratios
- The function iterates over a range of training set sizes (30% to 70%) and performs the experiment five times with different random seeds.
- For each training set size, it calculates the mean, maximum, and minimum accuracy and tree size for all iterations.
- The accuracy and tree size statistics for each training set size are shown in the following tables:

### Accuracy for each training set size

| Training Set Size | Mean | Max | Min |
| --- | --- | --- | --- |
| 30% | 0.96774 | 0.97815 | 0.95421 |
| 40% | 0.97282 | 0.97937 | 0.96723 |
| 50% | 0.97376 | 0.98834 | 0.96064 |
| 60% | 0.98069 | 0.98361 | 0.96903 |
| 70% | 0.97961 | 0.99029 | 0.9733 |

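### Tree size (nodes) for each training set size

| Training Set Size | Mean | Max | Min |
| --- | --- | --- | --- |
| 30% | 31.8 | 37.0 | 25.0 |
| 40% | 37.4 | 41.0 | 35.0 |
| 50% | 35.8 | 45.0 | 27.0 |
| 60% | 41.0 | 47.0 | 35.0 |
| 70% | 47.0 | 51.0 | 41.0 |
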
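
The sweep behind Experiment 2 can be sketched as a loop over training fractions that collects mean, maximum, and minimum values per fraction. The names `run_once` and `sweep`, the use of seeds 0 to 4, and the synthetic data are assumptions for this example, so its output will not reproduce the figures reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def run_once(X, Y, train_fraction, seed):
    """Hypothetical helper: one split/fit/evaluate cycle."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, train_size=train_fraction, random_state=seed
    )
    clf = DecisionTreeClassifier(random_state=seed).fit(X_train, y_train)
    return clf.score(X_test, y_test), clf.tree_.node_count


def sweep(X, Y, train_fractions=(0.3, 0.4, 0.5, 0.6, 0.7), runs=5):
    """Return two (n_fractions, 3) arrays holding [mean, max, min] per row."""
    accuracy_stats = np.zeros((len(train_fractions), 3))
    size_stats = np.zeros((len(train_fractions), 3))
    for row, fraction in enumerate(train_fractions):
        # Each row of `results` is (accuracy, node count) for one seed.
        results = np.array([run_once(X, Y, fraction, s) for s in range(runs)])
        acc, size = results[:, 0], results[:, 1]
        accuracy_stats[row] = [acc.mean(), acc.max(), acc.min()]
        size_stats[row] = [size.mean(), size.max(), size.min()]
    return accuracy_stats, size_stats


# Synthetic stand-in data (not the banknote dataset).
X, Y = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
accuracy_stats, size_stats = sweep(X, Y)
print(accuracy_stats)
print(size_stats)
```

Each row of the two returned arrays corresponds to one row of the tables above, in the same Mean, Max, Min column order.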
## Usage
To run the code, follow these steps:
1. Install the required libraries: `scikit-learn`, `pandas`, `numpy`, and `matplotlib`.
2. Download the "BankNote_Authentication.csv" dataset and place it in the same directory as the code file.
3. Run the code. The main function will execute the experiments and generate the accuracy and tree size results.
4. The code will also generate plots showing the accuracy and tree size against the training set size.
## Conclusion
In conclusion, this Python code provides a practical implementation of banknote authentication using a decision tree algorithm. It allows for experimentation with different train-test split ratios and training set sizes, providing insights into how these factors affect the accuracy and size of the decision tree model.
## Contributing
Contributions are welcome! If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.
## Team
- [Khaled Ashraf Hanafy Mahmoud - 20190186](https://github.com/KhaledAshrafH).
- [Noura Ashraf Abdelnaby Mansour - 20190592](https://github.com/NouraAshraff).
- [Samaa Khalifa Elsayed Othman - 20190247](https://github.com/SamaaKhalifa).
## License
This program is licensed under the [MIT License](LICENSE.md).