https://github.com/coding-for-it/diabetes-prediction-system

A machine learning-based system to predict diabetes using Logistic Regression, Decision Tree, and Random Forest with up to 85% accuracy. Includes EDA, model evaluation, and feature selection on a Kaggle-sourced dataset.

# Diabetes Prediction System

This project implements a machine learning-based **Diabetes Prediction System** using the PIMA Indian Diabetes dataset. It explores various classification models to predict whether a patient is likely to have diabetes based on diagnostic measurements.

## 📊 Dataset

The dataset includes the following health-related features:
- Pregnancies
- Glucose
- Blood Pressure
- Skin Thickness
- Insulin
- BMI
- Diabetes Pedigree Function
- Age
- Outcome (Target: 1 = diabetic, 0 = non-diabetic)

## 🧰 Libraries Used

- `pandas` – for data manipulation
- `numpy` – for numerical operations
- `matplotlib`, `seaborn` – for data visualization
- `scikit-learn` – for data preprocessing, model training, and evaluation

## 🔹 Features

- ✅ **Comprehensive Exploratory Data Analysis (EDA)**
- ✅ **Clean and Preprocessed Data** (handled missing values, duplicates, and scaling)
- ✅ **Model Evaluation:** Logistic Regression, Decision Tree, and Random Forest
- ✅ **Performance Metrics:** Accuracy, Classification Report, and Confusion Matrix
- ✅ **Visualizations:** Distribution, Pairplot, Heatmap of correlations, and Model Evaluation charts

## Project Workflow

### 1. Data Cleaning
- In certain health-related fields a zero is not a plausible measurement, so zeros there are treated as invalid entries and replaced with median values.
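
A minimal sketch of this cleaning step is shown below. The sample rows are made up for illustration, and the choice of which columns treat zero as missing (`zero_as_missing`) is an assumption, not the project's confirmed list. The median here is computed from the non-zero values only, which is one reasonable reading of the step.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample; the real dataset has more rows and columns.
df = pd.DataFrame({
    "Glucose": [148, 0, 183, 89],
    "BMI": [33.6, 26.6, 0.0, 28.1],
    "Pregnancies": [6, 1, 0, 1],  # zero is a valid value here, so it is left alone
})

# Assumed list of columns where a zero means "not recorded".
zero_as_missing = ["Glucose", "BMI"]

cleaned = df.copy()
for col in zero_as_missing:
    # Mark zeros as missing, then fill them with the median of the valid values.
    valid = cleaned[col].replace(0, np.nan)
    cleaned[col] = valid.fillna(valid.median())

print(cleaned)
```

Columns such as `Pregnancies`, where zero is meaningful, are deliberately excluded from the replacement.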

### 2. Exploratory Data Analysis (EDA)
- Visualizations such as correlation heatmaps and class distribution charts help reveal relationships between features and indicate which features relate most strongly to the outcome.
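
The numbers behind those two charts can be computed with `pandas` alone. The sketch below uses a small made-up frame, not the real dataset, and the variable names are illustrative.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Glucose": [85, 90, 150, 160, 95, 170],
    "BMI": [22.0, 24.5, 31.0, 35.2, 23.1, 33.8],
    "Outcome": [0, 0, 1, 1, 0, 1],
})

# Class distribution: how many rows fall into each outcome class.
class_counts = df["Outcome"].value_counts().sort_index()

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()

# Correlation of each feature with the target, strongest first.
outcome_corr = corr["Outcome"].drop("Outcome").sort_values(ascending=False)

print(class_counts)
print(outcome_corr)
```

The `corr` matrix can be passed to `seaborn.heatmap` to draw the heatmap, and `class_counts` can be plotted as a bar chart.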

### 3. Feature Scaling
- `StandardScaler` standardizes each feature to zero mean and unit variance, which helps scale-sensitive models such as Logistic Regression.
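
As a rough illustration of the scaling step, the sketch below standardizes a made-up feature matrix. Fitting the scaler on the training split only and reusing it for the test split is an assumption about good practice, not a confirmed detail of this project.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature rows (for example glucose and BMI), for illustration only.
X_train = np.array([[85.0, 22.0], [150.0, 31.0], [120.0, 27.5], [95.0, 24.0]])
X_test = np.array([[110.0, 26.0]])

scaler = StandardScaler()
# Learn the per-column mean and standard deviation from the training data.
X_train_scaled = scaler.fit_transform(X_train)
# Apply the same statistics to the test data without refitting.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0))  # close to 0 for every column
print(X_train_scaled.std(axis=0))   # close to 1 for every column
```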

### 4. Model Training
Three different models are trained:
- Logistic Regression
- Decision Tree Classifier
- Random Forest Classifier
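
Training the three models might look like the sketch below. It runs on synthetic data from `make_classification` rather than the real dataset, and the split ratio, `random_state`, and hyperparameters are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with 8 features, matching the number of input features.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model and record its accuracy on the held-out split.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)

for name, acc in test_accuracy.items():
    print(f"{name}: {acc:.3f}")
```

Because the data here is synthetic, the printed accuracies say nothing about the results on the real dataset.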

### 5. Model Evaluation
Evaluation is done using:
- Accuracy Score
- Confusion Matrix
- Classification Report
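
The three metrics listed above can be computed with `sklearn.metrics`. The labels and predictions below are hand-written for illustration and are not output from the project's models.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Made-up ground truth and predictions (1 = diabetic, 0 = non-diabetic).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

# Fraction of predictions that match the true label.
accuracy = accuracy_score(y_true, y_pred)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# Per-class precision, recall, and F1 as a formatted string.
report = classification_report(
    y_true, y_pred, target_names=["non-diabetic", "diabetic"]
)

print(f"Accuracy: {accuracy:.3f}")
print(cm)
print(report)
```

For a screening task like this one, the false negatives (`fn`) in the confusion matrix are often worth checking alongside overall accuracy.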

## Tech Stack

- **Python 3**
- **Pandas**, **NumPy**
- **Scikit-learn**
- **Matplotlib**, **Seaborn**
- **Jupyter Notebook**

## 🧪 How to Run

1. Clone the repository:
```bash
git clone https://github.com/coding-for-it/Diabetes-Prediction-System.git
cd Diabetes-Prediction-System