
https://github.com/djdurga/predictive_analysis_in_diabetes

This project applies Logistic Regression to predict diabetes in patients using the Pima Indians Diabetes Dataset. It covers the full data science lifecycle, from data imputation and exploration to model training, evaluation, and insights.



# 🩺 Predictive Analysis in Diabetes using Logistic Regression

## 📌 Project Overview

This project applies **Logistic Regression** to predict diabetes in patients using the **Pima Indians Diabetes Dataset**. It demonstrates the full data science workflow, from data imputation and EDA to model training, evaluation, and extracting insights.

---

## 🎯 Objective

To build a binary classification model that predicts whether a patient has diabetes (`Outcome: 1`) or not (`Outcome: 0`) using key health indicators such as glucose levels, BMI, insulin levels, and age.

---

## 🧪 Dataset Details

- **Source**: [Kaggle - Pima Indians Diabetes Dataset](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database)
- **Total Records**: 768
- **Target Feature**: `Outcome` (0 = No Diabetes, 1 = Diabetes)
- **Attributes**:
  - Pregnancies
  - Glucose
  - BloodPressure
  - SkinThickness
  - Insulin
  - BMI
  - DiabetesPedigreeFunction
  - Age
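
As a quick sanity check on the schema above, the sketch below counts zero entries per feature column. The column names come from the attribute list; the helper name `zero_counts` and the tiny sample frame are hypothetical and made up for illustration, not taken from the repository.

```python
import pandas as pd

# Column names as listed in the dataset details above.
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"


def zero_counts(df: pd.DataFrame) -> pd.Series:
    """Count zero entries in each feature column (hypothetical helper)."""
    missing = [c for c in FEATURES + [TARGET] if c not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return (df[FEATURES] == 0).sum()


# Made-up rows so the sketch runs without the real file.
sample = pd.DataFrame({
    "Pregnancies": [0, 2, 5],          # a zero here is a legitimate value
    "Glucose": [0, 110, 150],          # a zero here likely means "not recorded"
    "BloodPressure": [70, 0, 80],
    "SkinThickness": [20, 25, 0],
    "Insulin": [0, 0, 120],
    "BMI": [30.1, 0.0, 28.4],
    "DiabetesPedigreeFunction": [0.3, 0.5, 0.7],
    "Age": [25, 40, 55],
    "Outcome": [0, 1, 1],
})

print(zero_counts(sample))
```

With the real data, the same helper could be applied to a frame loaded via `pd.read_csv("diabetes.csv")`, where the file name is an assumption about how the Kaggle download is saved locally.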

---

## 📈 Workflow Summary

1. **📥 Data Import & Exploration**
   - Load dataset using pandas
   - Check structure, shape, and missing or zero values

2. **🧹 Data Cleaning**
   - Impute zero values in `Glucose`, `BloodPressure`, `Insulin`, `BMI`, etc.

3. **📊 Exploratory Data Analysis (EDA)**
   - Summary statistics
   - Correlation matrix
   - Visualizations with Seaborn & Matplotlib

4. **⚙️ Model Building**
   - Logistic Regression with Scikit-learn
   - Train-test split

5. **📉 Evaluation**
   - Accuracy Score
   - Confusion Matrix
   - Precision, Recall, F1-Score
   - ROC curve and AUC
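
The cleaning, modeling, and evaluation steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: median imputation, the 80/20 stratified split, `max_iter=1000`, and the helper names (`impute_zeros`, `train_and_evaluate`, `make_synthetic`) are all choices made here, and the data is synthetic so the block runs without the Kaggle file.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)
from sklearn.model_selection import train_test_split

# Columns the README names explicitly; its "etc." is left unexpanded here.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "Insulin", "BMI"]


def impute_zeros(df, columns=ZERO_AS_MISSING):
    """Treat zeros as missing and fill with the column median (assumed strategy)."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].replace(0, np.nan)
        out[col] = out[col].fillna(out[col].median())
    return out


def train_and_evaluate(df, seed=0):
    """Split, fit logistic regression, and collect the metrics listed above."""
    X = df.drop(columns="Outcome")
    y = df["Outcome"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # probability of Outcome == 1
    return {
        "accuracy": accuracy_score(y_test, pred),
        "confusion_matrix": confusion_matrix(y_test, pred),
        "report": classification_report(y_test, pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, proba),
    }


def make_synthetic(n=200, seed=0):
    """Made-up data with the dataset's column names; NOT the real Pima data."""
    rng = np.random.default_rng(seed)
    glucose = rng.normal(120, 30, n).clip(50, 200)
    df = pd.DataFrame({
        "Pregnancies": rng.integers(0, 10, n),
        "Glucose": glucose,
        "BloodPressure": rng.normal(70, 12, n).clip(40, 120),
        "SkinThickness": rng.normal(25, 8, n).clip(5, 60),
        "Insulin": rng.normal(100, 40, n).clip(15, 300),
        "BMI": rng.normal(32, 6, n).clip(18, 55),
        "DiabetesPedigreeFunction": rng.uniform(0.1, 1.5, n),
        "Age": rng.integers(21, 70, n),
    })
    df["Outcome"] = (glucose + rng.normal(0, 15, n) > 125).astype(int)
    # Zero out some readings to mimic unrecorded values.
    df.loc[rng.random(n) < 0.10, "Insulin"] = 0
    df.loc[rng.random(n) < 0.05, "Glucose"] = 0
    return df


cleaned = impute_zeros(make_synthetic())
results = train_and_evaluate(cleaned)
print(results["confusion_matrix"])
print(results["report"])
print(f"accuracy={results['accuracy']:.3f}  roc_auc={results['roc_auc']:.3f}")
```

For simplicity the medians here are computed on the full frame before splitting; computing them on the training split only would avoid leaking test-set information into the imputation.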

---

## 🛠 Tools & Technologies

- Python
- Pandas, NumPy
- Matplotlib, Seaborn
- Scikit-learn
- Jupyter Notebook

---

## 📂 Repository Structure