https://github.com/khaledashrafh/dt-banknote-authenticator

This Python code utilizes the decision tree algorithm from the scikit-learn library to perform banknote authentication. The code aims to analyze the impact of different train-test split ratios and training set sizes on the accuracy and size of the learned decision tree.
banknote-authentication decision-tree decision-tree-classifier dt machine-learning matplotlib-pyplot models numpy pandas plotting sklearn

Host: GitHub
URL: https://github.com/khaledashrafh/dt-banknote-authenticator
Owner: KhaledAshrafH
License: mit
Created: 2022-12-21T02:26:45.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2023-08-24T19:05:25.000Z (about 2 years ago)
Last Synced: 2024-12-07T11:08:59.859Z (11 months ago)
Topics: banknote-authentication, decision-tree, decision-tree-classifier, dt, machine-learning, matplotlib-pyplot, models, numpy, pandas, plotting, sklearn
Language: Python
Homepage:
Size: 107 KB
Stars: 5
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md

# Banknote Authentication Decision Tree

## Dataset

The code uses the "BankNote_Authentication.csv" dataset, which contains four features (variance, skew, curtosis, and entropy) and a class attribute indicating whether a banknote is real or forged.

## Requirements

The following libraries are imported in the code:

- `sklearn.tree`: Provides the decision tree classifier.
- `pandas`: Used for data manipulation and analysis.
- `sklearn.model_selection.train_test_split`: Splits the data into training and testing sets.
- `numpy`: Handles mathematical operations and array manipulation.
- `matplotlib.pyplot`: Enables data visualization.

## Functions

### `measureAccuracy(y_pred, y_test)`

Calculates the accuracy of the predicted labels (`y_pred`) compared to the actual labels (`y_test`). Returns the accuracy as a floating-point value.
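As a rough illustration of what such an accuracy helper might look like, here is a minimal sketch. The name `measure_accuracy` and the shape check are assumptions for this example and are not taken from the repository's code.

```python
import numpy as np


def measure_accuracy(y_pred, y_test):
    """Return the fraction of predicted labels that equal the true labels.

    Hypothetical stand-in for `measureAccuracy`; the original body is not shown.
    """
    y_pred = np.asarray(y_pred)
    y_test = np.asarray(y_test)
    if y_pred.shape != y_test.shape:
        raise ValueError("y_pred and y_test must have the same shape")
    # Element-wise comparison gives booleans; their mean is the accuracy.
    return float(np.mean(y_pred == y_test))


# Three of the four predictions match the true labels.
demo_accuracy = measure_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

Accuracy here is simply the number of correct predictions divided by the number of test samples.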

### `Experiment_Utility(X, Y, splitRatio)`

Performs an experiment with a specific train-test split ratio (`splitRatio`) using the decision tree algorithm. Splits the data into training and testing sets, fits the decision tree model, and predicts the labels for the testing set. Returns the accuracy and the number of nodes in the decision tree.
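A single run of this kind could be sketched as follows. This is an illustration under assumptions, not the repository's implementation: the name `experiment_utility`, the `seed` parameter, the reading of the ratio as the training fraction, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def experiment_utility(X, y, train_ratio, seed=None):
    """Split the data, fit a decision tree, return (accuracy, node count)."""
    # Assumption: the ratio is the fraction of samples used for training.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=train_ratio, random_state=seed
    )
    model = DecisionTreeClassifier(random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = float(np.mean(y_pred == y_test))
    # tree_.node_count is the total number of nodes (internal nodes plus leaves).
    return accuracy, model.tree_.node_count


# Synthetic four-feature data, used only so the sketch runs without the CSV file.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
acc, n_nodes = experiment_utility(X_demo, y_demo, train_ratio=0.75, seed=0)
```

Because scikit-learn builds binary trees, the node count is always odd, which matches the tree sizes reported in the results tables.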

### `GetStats(array)`

Calculates the mean, maximum, and minimum values of an input array. Returns the statistics as a NumPy array.
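A minimal sketch of such a summary helper is shown below. The name `get_stats` is an assumption for this example; the order of the returned values follows the description above (mean, maximum, minimum).

```python
import numpy as np


def get_stats(values):
    """Return [mean, max, min] of the input as a NumPy array."""
    values = np.asarray(values, dtype=float)
    return np.array([values.mean(), values.max(), values.min()])


# The five tree sizes listed under Experiment 1 in this README.
size_stats = get_stats([25, 31, 39, 27, 31])
```

Applied to the five tree sizes from Experiment 1, this gives a mean of 30.6, a maximum of 39, and a minimum of 25.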

### `Experiment(X, Y, splitRatio)`

Performs multiple experiments with a fixed train-test split ratio (`splitRatio`). Reruns the experiment five times with different random splits of the data. Returns the accuracies and tree sizes for each experiment.
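The repeated-run idea can be sketched like this. The function name `experiment`, the `n_runs` and `base_seed` parameters, and the synthetic data are assumptions made for the example; the repository may choose its random splits differently.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def experiment(X, y, train_ratio, n_runs=5, base_seed=0):
    """Repeat the split/fit/score cycle n_runs times with different splits."""
    accuracies = np.zeros(n_runs)
    tree_sizes = np.zeros(n_runs)
    for run in range(n_runs):
        seed = base_seed + run  # a different seed gives a different random split
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, train_size=train_ratio, random_state=seed
        )
        model = DecisionTreeClassifier(random_state=seed).fit(X_train, y_train)
        accuracies[run] = model.score(X_test, y_test)  # mean accuracy on the test set
        tree_sizes[run] = model.tree_.node_count
    return accuracies, tree_sizes


# Synthetic data so the sketch runs on its own.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 4))
y_demo = (X_demo[:, 0] - X_demo[:, 2] > 0).astype(int)
run_accuracies, run_sizes = experiment(X_demo, y_demo, train_ratio=0.75)
```

Varying the seed per run is what makes the five results differ; with a fixed seed every run would produce the same split and the same tree.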

### `plotting(y_axis, fileName)`

Plots the y-axis values against the training set size. Saves the plot as an image file with the specified `fileName`.
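A plotting helper along these lines might look like the sketch below. The name `plot_against_train_size` and the extra `train_sizes` and `y_label` parameters are assumptions; the original function takes only `y_axis` and `fileName`. The sample values are the mean accuracies from the Experiment 2 table in this README.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def plot_against_train_size(train_sizes, y_values, y_label, file_name):
    """Plot y_values against the training set size and save the figure."""
    fig, ax = plt.subplots()
    ax.plot(train_sizes, y_values, marker="o")
    ax.set_xlabel("Training set size (%)")
    ax.set_ylabel(y_label)
    ax.set_title(f"{y_label} vs. training set size")
    fig.savefig(file_name)
    plt.close(fig)  # free the figure once it is written to disk


train_sizes = [30, 40, 50, 60, 70]
mean_accuracy = [0.96774, 0.97282, 0.97376, 0.98069, 0.97961]
out_path = os.path.join(tempfile.mkdtemp(), "accuracy.png")
plot_against_train_size(train_sizes, mean_accuracy, "Mean accuracy", out_path)
```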

### `main()`

The main function reads the dataset, separates the features (X) and the labels (Y), and initializes matrices for accuracy and tree size statistics. It then runs two sets of experiments:

### Experiment 1: Fixed train-test split ratio
- The function runs the experiment with a 75% training ratio, recording the accuracies and tree sizes for each iteration.
- The tree size and accuracy recorded in each iteration are shown in the following table:

| Tree Size (nodes) | Accuracy |
|---|---|
| 25.0 | 0.9620991253644315 |
| 31.0 | 0.9630709426627794 |
| 39.0 | 0.956268221574344 |
| 27.0 | 0.967930029154519 |
| 31.0 | 0.9689018464528668 |

### Experiment 2: Range of train-test split ratios
- The function iterates over a range of training set sizes (30% to 70%) and performs the experiment five times with different random seeds.
- For each training set size, it calculates the mean, maximum, and minimum accuracy and tree size for all iterations.
- The mean, maximum, and minimum accuracy and tree size for each training set size are shown in the following tables:

### Accuracy for each training set size ###

| Training set size | Mean | Max | Min |
|---|---|---|---|
| 30% | 0.96774 | 0.97815 | 0.95421 |
| 40% | 0.97282 | 0.97937 | 0.96723 |
| 50% | 0.97376 | 0.98834 | 0.96064 |
| 60% | 0.98069 | 0.98361 | 0.96903 |
| 70% | 0.97961 | 0.99029 | 0.9733 |

### Tree size for each training set size ###

| Training set size | Mean | Max | Min |
|---|---|---|---|
| 30% | 31.8 | 37.0 | 25.0 |
| 40% | 37.4 | 41.0 | 35.0 |
| 50% | 35.8 | 45.0 | 27.0 |
| 60% | 41.0 | 47.0 | 35.0 |
| 70% | 47.0 | 51.0 | 41.0 |
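The sweep behind Experiment 2 can be sketched roughly as follows. This is an illustration under assumptions rather than the repository's code: the name `sweep_train_sizes`, the seeding scheme, and the synthetic data are hypothetical, and the numbers it produces are unrelated to the results reported in this README.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def sweep_train_sizes(X, y, train_ratios=(0.3, 0.4, 0.5, 0.6, 0.7), n_runs=5):
    """For each training ratio, collect [mean, max, min] of accuracy and tree size."""
    acc_stats = np.zeros((len(train_ratios), 3))
    size_stats = np.zeros((len(train_ratios), 3))
    for i, ratio in enumerate(train_ratios):
        accs, sizes = [], []
        for seed in range(n_runs):
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, train_size=ratio, random_state=seed
            )
            model = DecisionTreeClassifier(random_state=seed).fit(X_train, y_train)
            accs.append(model.score(X_test, y_test))
            sizes.append(model.tree_.node_count)
        # One row per training ratio: mean, max, min.
        acc_stats[i] = [np.mean(accs), np.max(accs), np.min(accs)]
        size_stats[i] = [np.mean(sizes), np.max(sizes), np.min(sizes)]
    return acc_stats, size_stats


# Synthetic data with a little label noise so the trees are not trivial.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(400, 4))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)
acc_stats, size_stats = sweep_train_sizes(X_demo, y_demo)
```

Each row of the two returned matrices corresponds to one row of the accuracy and tree size tables.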

## Usage

To run the code, follow these steps:

1. Install the required libraries: `scikit-learn` (imported as `sklearn`), `pandas`, `numpy`, and `matplotlib`.
2. Download the "BankNote_Authentication.csv" dataset and place it in the same directory as the code file.
3. Run the code. The main function will execute the experiments and generate the accuracy and tree size results.
4. The code will also generate plots showing the accuracy and tree size against the training set size.

## Conclusion

This Python code provides a practical implementation of banknote authentication using a decision tree. It supports experiments with different train-test split ratios and shows how the training set size affects the accuracy and size of the learned tree.

## Contributing

Contributions are welcome! If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.

## Team

- [Khaled Ashraf Hanafy Mahmoud - 20190186](https://github.com/KhaledAshrafH).
- [Noura Ashraf Abdelnaby Mansour - 20190592](https://github.com/NouraAshraff).
- [Samaa Khalifa Elsayed Othman - 20190247](https://github.com/SamaaKhalifa).

## License

This program is licensed under the [MIT License](LICENSE.md).
