https://github.com/linggarm/pima-indians-diabetes-classification

# Pima-Indians-Diabetes-Classification
Pima Indians Diabetes Classification using various supervised algorithms with feature importance

## About The Project
* This project tackles a binary classification task: predicting whether an individual from the Pima Indian tribe has diabetes, based on various health-related features. Several supervised machine-learning algorithms are trained to build the predictive models.

* Algorithms used in this project:
  * Support Vector Machine (SVM)
  * Decision Tree
  * K-Nearest Neighbor (KNN)
  * Logistic Regression
  * Naive Bayes
  * Artificial Neural Network (4 hidden layers with 256 nodes each, using ReLU as the activation function)
  * Random Forest
  * Ensemble Methods
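As a rough illustration of the list above, the algorithms could be instantiated with scikit-learn as follows. This is a minimal sketch, not the project's actual code: `build_models` is a hypothetical helper, every hyperparameter other than the ANN's four 256-node ReLU layers is a default or placeholder, and scikit-learn's `StackingClassifier` stands in for the vecstack-based ensemble, whose exact configuration is not described here.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=42):
    """Return one unfitted classifier per algorithm family (hypothetical helper)."""
    models = {
        "SVM": SVC(random_state=random_state),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "KNN": KNeighborsClassifier(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Naive Bayes": GaussianNB(),
        # 4 hidden layers with 256 nodes each and ReLU, as stated in the list above.
        "ANN": MLPClassifier(
            hidden_layer_sizes=(256, 256, 256, 256),
            activation="relu",
            random_state=random_state,
        ),
        "Random Forest": RandomForestClassifier(random_state=random_state),
    }
    # Assumption: stacking is used here as a stand-in ensemble; the choice of
    # base models and meta-learner is illustrative only.
    models["Ensemble (stacking)"] = StackingClassifier(
        estimators=[
            ("svm", SVC(random_state=random_state)),
            ("knn", KNeighborsClassifier()),
            ("rf", RandomForestClassifier(random_state=random_state)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    )
    return models


models = build_models()
print(list(models))
```

Keeping the models in a dictionary makes it easy to loop over them and fit each one on the same train/test split.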

* There are 2 experiments, which differ in the preprocessing stage:
  1. Scaling the data with StandardScaler (each feature is standardized to **mean 0 and standard deviation 1**; this rescales the values but does not make their distribution normal)
  2. Without scaling the data

* The experiment with data scaling performs better for most of the algorithms
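The two experiments can be sketched as below. This is only an illustration under stated assumptions: the data is synthetic (generated with `make_classification`, not the diabetes dataset), the 80/20 split and the KNN model are placeholders, and `fit_and_score` is a hypothetical helper. The key point is that the scaler is fitted on the training split only and then applied to the test split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 8 numeric features and a binary target.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5, random_state=0)
X[:, 0] *= 1000  # exaggerate one feature's scale so scaling matters

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Experiment 1: standardize each feature to mean 0 and standard deviation 1.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics


def fit_and_score(X_tr, X_te):
    """Fit a KNN classifier and return its test accuracy (hypothetical helper)."""
    model = KNeighborsClassifier()
    model.fit(X_tr, y_train)
    return accuracy_score(y_test, model.predict(X_te))


acc_scaled = fit_and_score(X_train_scaled, X_test_scaled)  # Experiment 1
acc_raw = fit_and_score(X_train, X_test)                   # Experiment 2

print("per-feature mean after scaling:", np.round(X_train_scaled.mean(axis=0), 6))
print("per-feature std after scaling: ", np.round(X_train_scaled.std(axis=0), 6))
print(f"accuracy with scaling: {acc_scaled:.3f}, without scaling: {acc_raw:.3f}")
```

Distance-based and gradient-based models (KNN, SVM, logistic regression, neural networks) tend to be sensitive to feature scale, while tree-based models are largely unaffected by it.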

## Technology Used
* Python
* Pandas
* Matplotlib
* Seaborn
* Scikit-learn
* vecstack

## Dataset
This project uses the [Pima Indians Diabetes Database - UCI Machine Learning](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database), a diabetes dataset from the University of California, Irvine (UCI) Machine Learning Repository. The link points to the copy hosted on Kaggle.
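A minimal sketch of loading the data with pandas is shown below. It assumes the Kaggle CSV layout with eight feature columns and a binary `Outcome` target; `load_diabetes_csv` is a hypothetical helper, and the two in-memory rows are placeholders in that layout so the snippet runs without the downloaded file. In practice, pass the path of the downloaded CSV instead.

```python
import io

import pandas as pd


def load_diabetes_csv(source):
    """Read the CSV and split it into features X and target y (hypothetical helper)."""
    df = pd.read_csv(source)
    X = df.drop(columns=["Outcome"])  # assumed target column name
    y = df["Outcome"]
    return X, y


# Placeholder rows in the assumed column layout, used only to keep the example runnable.
sample = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
)
X, y = load_diabetes_csv(sample)
print(X.shape)
print(y.tolist())
```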

## Performance
Performance with Data Scaling | Performance without Data Scaling
:-------------------------:|:-------------------------:
![with_scaling_performance](images/with_scaling_performance.png) | ![without_scaling_performance](images/without_scaling_performance.png)

## Feature Importance
Feature Importance with Data Scaling | Feature Importance without Data Scaling
:-------------------------:|:-------------------------:
![with_scaling_feature_importance](images/with_scaling_feature_importance.png) | ![without_scaling_feature_importance](images/without_scaling_feature_importance.png)
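One common way to obtain feature importances like those plotted above is the impurity-based `feature_importances_` attribute of a fitted Random Forest. The sketch below illustrates that approach only; it is an assumption that the project computes importance this way. The data is synthetic and the column names are used purely as labels, so the resulting ranking carries no meaning for the real dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Labels borrowed from the assumed dataset layout; the values below are synthetic.
feature_names = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]

X, y = make_classification(n_samples=768, n_features=8, n_informative=5, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Impurity-based importances are non-negative and sum to 1 across features.
importances = pd.Series(model.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```

Calling `importances.plot.barh()` with Matplotlib installed would give a bar chart of the same values.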

## Contributors
* [Linggar Maretva Cendani](https://github.com/LinggarM) - [linggarmc@gmail.com](mailto:linggarmc@gmail.com)

## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Acknowledgments
* [Pima Indians Diabetes Database - UCI Machine Learning](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database)