https://github.com/coding-for-it/diabetes-prediction-system
A machine learning-based system to predict diabetes using Logistic Regression, Decision Tree, and Random Forest with up to 85% accuracy. Includes EDA, model evaluation, and feature selection on a Kaggle-sourced dataset.
- Host: GitHub
- URL: https://github.com/coding-for-it/diabetes-prediction-system
- Owner: coding-for-it
- License: MIT
- Created: 2025-06-02T04:48:04.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-13T04:12:21.000Z (12 months ago)
- Last Synced: 2025-06-13T05:20:41.833Z (12 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 1.8 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Diabetes Prediction System
This project implements a machine learning-based **Diabetes Prediction System** using the Pima Indians Diabetes dataset. It compares several classification models that predict whether a patient is likely to have diabetes based on diagnostic measurements.
## Dataset
The dataset includes the following health-related features:
- Pregnancies
- Glucose
- Blood Pressure
- Skin Thickness
- Insulin
- BMI
- Diabetes Pedigree Function
- Age
- Outcome (Target: 1 = diabetic, 0 = non-diabetic)
## Libraries Used
- `pandas`: data manipulation
- `numpy`: numerical operations
- `matplotlib`, `seaborn`: data visualization
- `scikit-learn`: data preprocessing, model training, and evaluation
## Features
- **Comprehensive Exploratory Data Analysis (EDA)**
- **Clean and Preprocessed Data** (handled missing values, duplicates, and scaling)
- **Model Evaluation:** Logistic Regression, Decision Tree, and Random Forest
- **Performance Metrics:** Accuracy, Classification Report, and Confusion Matrix
- **Visualizations:** Distribution, Pairplot, Heatmap of correlations, and Model Evaluation charts
## Project Workflow
### 1. Data Cleaning
- Zeros in certain health-related fields, where a value of zero is not physiologically plausible, are treated as invalid entries and replaced with the column median.
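A minimal sketch of this cleaning step. The README does not say which fields are affected, so the column list below is an assumption, and the tiny DataFrame uses made-up values in place of the real CSV:

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real dataset (values are made up).
df = pd.DataFrame({
    "Glucose": [148, 0, 183, 89],
    "BloodPressure": [72, 66, 0, 66],
    "BMI": [33.6, 26.6, 23.3, 0.0],
    "Outcome": [1, 0, 1, 0],
})

# Assumption: these are the columns where 0 means "not recorded".
zero_invalid_cols = ["Glucose", "BloodPressure", "BMI"]

# Mark zeros as missing, then fill each gap with the median of that
# column's valid (non-missing) values.
df[zero_invalid_cols] = df[zero_invalid_cols].replace(0, np.nan)
df[zero_invalid_cols] = df[zero_invalid_cols].fillna(df[zero_invalid_cols].median())

print(df)
```

Converting zeros to `NaN` first keeps them out of the median calculation, so the invalid entries do not pull the fill value downward.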
### 2. Exploratory Data Analysis (EDA)
- Visualizations such as correlation heatmaps and class distribution charts help reveal relationships between features and indicate which features are most strongly associated with the outcome.
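The numbers behind these charts can be sketched with pandas alone. The DataFrame below is a made-up stand-in for the real data, and ranking features by absolute correlation with `Outcome` is an illustrative choice, not necessarily what the notebook does:

```python
import pandas as pd

# Toy rows standing in for the real dataset (values are made up).
df = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137, 116],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1, 25.6],
    "Age": [50, 31, 32, 21, 33, 30],
    "Outcome": [1, 0, 1, 0, 1, 0],
})

# Class distribution: number of non-diabetic (0) and diabetic (1) rows.
class_counts = df["Outcome"].value_counts().sort_index()

# Pearson correlation matrix; this is the table a heatmap would colour.
corr = df.corr()

# Features ranked by absolute correlation with the target.
target_corr = corr["Outcome"].drop("Outcome").abs().sort_values(ascending=False)

print(class_counts)
print(target_corr)
```

Passing `corr` to `seaborn.heatmap(corr, annot=True)` would draw the heatmap itself. Correlation with the target is only a rough, linear indication of relevance and is not the same thing as a model's feature importance.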
### 3. Feature Scaling
- StandardScaler is used to standardize the feature set (zero mean and unit variance per feature) for improved model performance.
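A short sketch of the scaling step on made-up values. Fitting the scaler on the training split only is an assumption here (the README does not describe the split), but it is the usual way to avoid leaking test statistics into training:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrices (made-up values); the columns could be Glucose and BMI.
X_train = np.array([[148.0, 33.6], [85.0, 26.6], [183.0, 23.3], [89.0, 28.1]])
X_test = np.array([[137.0, 43.1], [116.0, 25.6]])

scaler = StandardScaler()
# Learn each column's mean and standard deviation from the training data...
X_train_scaled = scaler.fit_transform(X_train)
# ...and reuse those same statistics to transform the test data.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0))  # approximately 0 for every column
print(X_train_scaled.std(axis=0))   # approximately 1 for every column
```

Scaling matters most for Logistic Regression; tree-based models such as Decision Tree and Random Forest are largely insensitive to feature scale.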
### 4. Model Training
Three different models are trained:
- Logistic Regression
- Decision Tree Classifier
- Random Forest Classifier
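The training loop for these three models might look like the sketch below. It uses synthetic data from `make_classification` as a stand-in for the eight diagnostic features, and the split ratio and hyperparameters are illustrative assumptions rather than the notebook's actual settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 8 diagnostic features and the binary Outcome.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=42
)
# Stratify so both splits keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hyperparameters are illustrative, not taken from the notebook.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model on the same training split and report test accuracy.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```

Keeping the models in a dictionary makes it easy to train and compare them on an identical split.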
### 5. Model Evaluation
Evaluation is done using:
- Accuracy Score
- Confusion Matrix
- Classification Report
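As an illustration, the three metrics can be computed with `sklearn.metrics` as follows. The labels and predictions are made up for the example and are not results from this project:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Made-up true labels and predictions (1 = diabetic, 0 = non-diabetic).
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Fraction of predictions that match the true label.
acc = accuracy_score(y_true, y_pred)
# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
# Per-class precision, recall, and F1 score as a formatted string.
report = classification_report(
    y_true, y_pred, target_names=["non-diabetic", "diabetic"]
)

print(f"Accuracy: {acc:.2f}")
print(cm)
print(report)
```

For a screening task like this one, the false negatives in the confusion matrix (diabetic patients predicted as non-diabetic) are worth reading alongside overall accuracy, since accuracy alone can hide them.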
### Tech Stack
- **Python 3**
- **Pandas**, **NumPy**
- **Scikit-learn**
- **Matplotlib**, **Seaborn**
- **Jupyter Notebook**
## How to Run
1. Clone the repository:
```bash
git clone https://github.com/coding-for-it/Diabetes-Prediction-System.git
cd Diabetes-Prediction-System