https://github.com/salihcanaydogdu/pimaindiansdiabetes_deeplearning
The Deep Learning project for Pima Indian Diabetes dataset. For more details, you can read my report and read.me
https://github.com/salihcanaydogdu/pimaindiansdiabetes_deeplearning
artifical-intelligense balance-data classic-machine-learning deep-learning ensemble-models gru lstm modern-machine-learning
Last synced: 3 months ago
JSON representation
The Deep Learning project for Pima Indian Diabetes dataset. For more details, you can read my report and read.me
- Host: GitHub
- URL: https://github.com/salihcanaydogdu/pimaindiansdiabetes_deeplearning
- Owner: SalihCanAydogdu
- Created: 2024-09-19T04:42:35.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-09-19T07:20:38.000Z (9 months ago)
- Last Synced: 2025-02-01T10:25:42.944Z (5 months ago)
- Topics: artifical-intelligense, balance-data, classic-machine-learning, deep-learning, ensemble-models, gru, lstm, modern-machine-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 11.7 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Detection of Diabetes Using Deep Learning
## Project Description
This project aims to develop a deep learning model for detecting diabetes using clinical and physical data from the Pima Indians Diabetes dataset. By leveraging various deep learning techniques, the goal is to create an effective system for early diagnosis of diabetes.## Dataset
- **Pima Indians Diabetes Dataset**: This dataset contains 768 observations of individuals, including information such as glucose levels, blood pressure, BMI, and whether the individual has diabetes (outcome).## Preprocessing
1. Dataset balancing methods applied to address class imbalance, including SMOTE, Instance Hardness Threshold, and Edited Nearest Neighbors.
2. Data cleaned and normalized for effective model training.## Machine Learning Models
- **Random Forest**
- **K-Nearest Neighbors (KNN)**
- **XGBoost**
- **LightGBM**
- **LSTM (Long Short-Term Memory)**
- **GRU (Gated Recurrent Unit)**## Ensemble Models
- **CNN + LSTM**
- **CNN + GRU**
- **CNN + LSTM + GRU**## Results
- The best performing model was **Random Forest** with the **Instance Threshold** balancing method, achieving an accuracy of **92.66%**.
- Ensemble models provided robust performance but were not significantly better than Random Forest for this dataset.## Conclusion
This project demonstrates that deep learning techniques, particularly Random Forest combined with effective dataset balancing, can achieve high accuracy in diagnosing diabetes. Future work could focus on further refining models and testing on larger datasets.## References
- [Pima Indians Diabetes Dataset](https://www.kaggle.com/uciml/pima-indians-diabetes-database)
- [SMOTE for Imbalanced Classification](https://machinelearningmastery.com/smote-oversampling-for-imbalanced-classification/)
- Additional academic papers and resources on diabetes prediction and dataset balancing methods.