- Host: GitHub
- URL: https://github.com/linggarm/pima-indians-diabetes-classification
- Owner: LinggarM
- License: mit
- Created: 2021-06-01T14:05:04.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2023-11-05T14:17:29.000Z (about 2 years ago)
- Last Synced: 2025-01-09T12:43:05.259Z (10 months ago)
- Topics: artificial-neural-networks, decision-tree, diabetes, diabetes-prediction, ensemble, feature-importance, knn, logistic-regression, machine-learning, naive-bayes, python, random-forest, supervised-learning, svm
- Language: Jupyter Notebook
- Homepage:
- Size: 852 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Pima-Indians-Diabetes-Classification
Pima Indians Diabetes Classification using various supervised algorithms with feature importance
## About The Project
* This project focuses on the binary classification task of predicting diabetes in individuals of the Pima Indian tribe based on various health-related features. We employ a range of supervised machine-learning algorithms to build predictive models.
* Algorithms used in this project:
* Support Vector Machine (SVM)
* Decision Tree
* K-Nearest Neighbor (KNN)
* Logistic Regression
* Naive Bayes
  * Artificial Neural Networks (4 hidden layers, with 256 nodes in each layer, and ReLU as the activation function)
* Random Forest
* Ensemble Methods
* There are 2 experiments, which differ only in the preprocessing stage:
  1. Scaling the data with StandardScaler (transforms each feature to match the **standard normal distribution**, with mean 0 and standard deviation 1)
  2. Using the raw, unscaled data
* The experiment with data scaling achieves better performance on most of the algorithms
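The two experiments above can be sketched as follows: the same models are trained on scaled and raw features and their test accuracies compared. Synthetic data stands in for the Pima CSV here so the sketch runs standalone; the model choices mirror a few of the algorithms listed, not the repository's exact code.

```python
# Sketch of the two experiments: identical models on scaled vs. raw features.
# Synthetic data replaces the Pima Indians Diabetes CSV (an assumption).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=768, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Experiment 1: StandardScaler -> each feature gets mean 0, std dev 1
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "LogReg": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    acc_scaled = model.fit(X_train_s, y_train).score(X_test_s, y_test)
    acc_raw = model.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: scaled={acc_scaled:.3f}, raw={acc_raw:.3f}")
```

Distance- and margin-based models such as KNN and SVM are the ones most sensitive to feature scale, which is consistent with scaling helping most algorithms here.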
## Technology Used
* Python
* Pandas
* Matplotlib
* Seaborn
* Scikit-learn
* vecstack
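The repository lists vecstack for model stacking. As a stand-in (not the repository's actual code), the same ensembling idea can be sketched with scikit-learn's built-in `StackingClassifier`, again on synthetic data in place of the Pima CSV:

```python
# Stacking sketch: base models' predictions feed a logistic-regression
# meta-learner. Uses sklearn's StackingClassifier as a stand-in for vecstack.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=768, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

stack = StackingClassifier(
    estimators=[("svm", SVC()), ("rf", RandomForestClassifier(random_state=1))],
    final_estimator=LogisticRegression(),  # meta-learner on base predictions
)
acc = stack.fit(X_train, y_train).score(X_test, y_test)
print(f"stacked accuracy: {acc:.3f}")
```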
## Dataset
This project uses a dataset provided by the University of California, Irvine through its Machine Learning Repository: the [Pima Indians Diabetes Database - UCI Machine Learning](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database).
## Performance
Performance with Data Scaling | Performance without Data Scaling
:-------------------------:|:-------------------------:
*(chart not included)* | *(chart not included)*
## Feature Importance
Feature Importance with Data Scaling | Feature Importance without Data Scaling
:-------------------------:|:-------------------------:
*(chart not included)* | *(chart not included)*
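A feature-importance ranking like the one charted above can be obtained from a fitted Random Forest via its `feature_importances_` attribute. The sketch below uses the standard Pima column names but synthetic data in place of the real CSV, so the printed ranking is illustrative only:

```python
# Extract and rank feature importances from a fitted Random Forest.
# Synthetic data replaces the Pima CSV (an assumption); column names are
# the standard Pima Indians Diabetes Database features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

X, y = make_classification(n_samples=768, n_features=8, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importances are impurity-based and sum to 1; sort descending to rank.
for idx in np.argsort(forest.feature_importances_)[::-1]:
    print(f"{FEATURES[idx]:<25} {forest.feature_importances_[idx]:.3f}")
```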
## Contributors
* [Linggar Maretva Cendani](https://github.com/LinggarM) - [linggarmc@gmail.com](mailto:linggarmc@gmail.com)
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
* [Pima Indians Diabetes Database - UCI Machine Learning](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database)