- Host: GitHub
- URL: https://github.com/linggarm/pima-indians-diabetes-classification
- Owner: LinggarM
- License: mit
- Created: 2021-06-01T14:05:04.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2023-11-05T14:17:29.000Z (about 2 years ago)
- Last Synced: 2025-01-09T12:43:05.259Z (10 months ago)
- Topics: artificial-neural-networks, decision-tree, diabetes, diabetes-prediction, ensemble, feature-importance, knn, logistic-regression, machine-learning, naive-bayes, python, random-forest, supervised-learning, svm
- Language: Jupyter Notebook
- Homepage:
- Size: 852 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Pima-Indians-Diabetes-Classification
Pima Indians Diabetes Classification using various supervised algorithms with feature importance
## About The Project
* This project focuses on the binary classification task of predicting diabetes in individuals of the Pima Indian tribe based on various health-related features. We employ a range of supervised machine-learning algorithms to build predictive models.
* Algorithms used in this project:
* Support Vector Machine (SVM)
* Decision Tree
* K-Nearest Neighbor (KNN)
* Logistic Regression
* Naive Bayes
  * Artificial Neural Networks (4 hidden layers, with 256 nodes in each layer, and ReLU as the activation function)
* Random Forest
* Ensemble Methods
* There are 2 experiments, which differ only in the preprocessing stage:
  1. Scaling the data with StandardScaler (transforms each feature to match the **standard normal distribution**, with mean 0 and standard deviation 1)
  2. Using the raw, unscaled data
* The experiment with data scaling achieves better performance on most of the algorithms
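The two experiments above can be sketched as follows: the same models are trained on scaled and raw features and their test accuracies compared. Synthetic data stands in for the Pima CSV here so the sketch runs standalone; the model choices mirror a few of the algorithms listed, not the repository's exact code.

```python
# Sketch of the two experiments: identical models on scaled vs. raw features.
# Synthetic data replaces the Pima Indians Diabetes CSV (an assumption).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=768, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Experiment 1: StandardScaler -> each feature gets mean 0, std dev 1
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "LogReg": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    acc_scaled = model.fit(X_train_s, y_train).score(X_test_s, y_test)
    acc_raw = model.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: scaled={acc_scaled:.3f}, raw={acc_raw:.3f}")
```

Distance- and margin-based models such as KNN and SVM are the ones most sensitive to feature scale, which is consistent with scaling helping most algorithms here.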
## Technology Used
* Python
* Pandas
* Matplotlib
* Seaborn
* Scikit-learn
* vecstack
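The repository lists vecstack for model stacking. As a stand-in (not the repository's actual code), the same ensembling idea can be sketched with scikit-learn's built-in `StackingClassifier`, again on synthetic data in place of the Pima CSV:

```python
# Stacking sketch: base models' predictions feed a logistic-regression
# meta-learner. Uses sklearn's StackingClassifier as a stand-in for vecstack.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=768, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

stack = StackingClassifier(
    estimators=[("svm", SVC()), ("rf", RandomForestClassifier(random_state=1))],
    final_estimator=LogisticRegression(),  # meta-learner on base predictions
)
acc = stack.fit(X_train, y_train).score(X_test, y_test)
print(f"stacked accuracy: {acc:.3f}")
```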
## Dataset
This project uses a dataset provided by the University of California, Irvine through its Machine Learning Repository: the [Pima Indians Diabetes Database - UCI Machine Learning](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database).
## Performance
Performance with Data Scaling | Performance without Data Scaling
:-------------------------:|:-------------------------:
*(chart not included)* | *(chart not included)*
## Feature Importance
Feature Importance with Data Scaling | Feature Importance without Data Scaling
:-------------------------:|:-------------------------:
*(chart not included)* | *(chart not included)*
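A feature-importance ranking like the one charted above can be obtained from a fitted Random Forest via its `feature_importances_` attribute. The sketch below uses the standard Pima column names but synthetic data in place of the real CSV, so the printed ranking is illustrative only:

```python
# Extract and rank feature importances from a fitted Random Forest.
# Synthetic data replaces the Pima CSV (an assumption); column names are
# the standard Pima Indians Diabetes Database features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

X, y = make_classification(n_samples=768, n_features=8, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importances are impurity-based and sum to 1; sort descending to rank.
for idx in np.argsort(forest.feature_importances_)[::-1]:
    print(f"{FEATURES[idx]:<25} {forest.feature_importances_[idx]:.3f}")
```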
## Contributors
* [Linggar Maretva Cendani](https://github.com/LinggarM) - [linggarmc@gmail.com](mailto:linggarmc@gmail.com)
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
* [Pima Indians Diabetes Database - UCI Machine Learning](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database)