https://github.com/djdurga/predictive_analysis_in_diabetes
This project applies Logistic Regression to predict diabetes in patients using the Pima Indians Diabetes Dataset. It covers the full data science lifecycle, from data imputation and exploration to model training, evaluation, and insights.
- Host: GitHub
- URL: https://github.com/djdurga/predictive_analysis_in_diabetes
- Owner: Djdurga
- Created: 2025-06-16T16:41:57.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-16T16:49:27.000Z (about 1 year ago)
- Last Synced: 2025-06-16T17:49:27.640Z (about 1 year ago)
- Topics: matplotlib, numpy, pandas
- Language: Jupyter Notebook
- Homepage:
- Size: 243 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 🩺 Predictive Analysis in Diabetes using Logistic Regression
## Project Overview
This project applies **Logistic Regression** to predict diabetes in patients using the **Pima Indians Diabetes Dataset**. It demonstrates the full data science workflow, from data imputation and EDA to model training, evaluation, and extracting insights.
---
## 🎯 Objective
To build a binary classification model that predicts whether a patient has diabetes (`Outcome: 1`) or not (`Outcome: 0`) using key health indicators such as glucose levels, BMI, insulin levels, and age.
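
For reference, the standard logistic regression model behind this objective can be written as below. Here $x_1, \dots, x_8$ stand for the eight attributes listed under Dataset Details and $\beta_0, \dots, \beta_8$ are coefficients learned from the training data. The 0.5 threshold is an assumption: it matches Scikit-learn's default `predict` behavior, but the README does not state which threshold the notebook uses.

$$
P(\text{Outcome} = 1 \mid x) = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \sum_{j=1}^{8} \beta_j x_j
$$

$$
\widehat{\text{Outcome}} =
\begin{cases}
1 & \text{if } P(\text{Outcome} = 1 \mid x) \ge 0.5 \\
0 & \text{otherwise}
\end{cases}
$$
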
---
## 🧪 Dataset Details
- **Source**: [Kaggle - Pima Indians Diabetes Dataset](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database)
- **Total Records**: 768
- **Target Feature**: `Outcome` (0 = No Diabetes, 1 = Diabetes)
- **Attributes**:
  - Pregnancies
  - Glucose
  - BloodPressure
  - SkinThickness
  - Insulin
  - BMI
  - DiabetesPedigreeFunction
  - Age
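
As a quick sanity check on the columns listed above, the sketch below counts zero entries per attribute. The file name `diabetes.csv` and the helper `summarize_zeros` are assumptions for illustration; the notebook's actual path and code are not shown in this README. The small inline frame holds made-up values (not actual records) that only mirror the column names, so the snippet runs without the file.

```python
import pandas as pd

# Column names as listed in the dataset details.
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"


def summarize_zeros(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of zero entries for each feature column."""
    zeros = (df[FEATURES] == 0).sum()
    return pd.DataFrame({
        "zero_count": zeros,
        "zero_pct": (100 * zeros / len(df)).round(1),
    })


# With the real data (hypothetical path): df = pd.read_csv("diabetes.csv")
# Made-up illustrative rows so the sketch is self-contained.
sample = pd.DataFrame(
    [
        [2, 120, 70, 30, 0, 31.0, 0.45, 40, 1],
        [0, 95, 0, 25, 80, 24.5, 0.30, 25, 0],
        [5, 0, 80, 0, 0, 0.0, 0.60, 52, 1],
        [1, 110, 65, 20, 100, 27.0, 0.20, 22, 0],
    ],
    columns=FEATURES + [TARGET],
)

print(sample.shape)             # rows, columns
print(summarize_zeros(sample))  # zeros per feature
```

A zero in `Pregnancies` is a valid value, whereas a zero in a column such as `Glucose` or `BMI` is not physiologically plausible and is usually treated as a missing measurement.
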
---
## Workflow Summary
1. **📥 Data Import & Exploration**
   - Load the dataset using pandas
   - Check structure, shape, and missing or zero values
2. **🧹 Data Cleaning**
   - Impute zero values in `Glucose`, `BloodPressure`, `Insulin`, `BMI`, etc.
3. **Exploratory Data Analysis (EDA)**
   - Summary statistics
   - Correlation matrix
   - Visualizations with Seaborn & Matplotlib
4. **⚙️ Model Building**
   - Logistic Regression with Scikit-learn
   - Train-test split
5. **Evaluation**
   - Accuracy Score
   - Confusion Matrix
   - Precision, Recall, F1-Score
   - ROC curve and AUC score
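
The cleaning, modeling, and evaluation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names, median imputation, the inclusion of `SkinThickness`, the `StandardScaler` step, the 80/20 stratified split, and the synthetic demo data are all assumptions made so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed set of columns where a zero stands in for a missing measurement.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def impute_zeros(df, columns=ZERO_AS_MISSING):
    """Return a copy with zeros in `columns` replaced by each column's median."""
    out = df.copy()
    out[columns] = out[columns].replace(0, np.nan)
    out[columns] = out[columns].fillna(out[columns].median())
    return out


def train_and_evaluate(df, target="Outcome", test_size=0.2, seed=42):
    """Fit a logistic regression and return the metrics from the workflow."""
    X, y = df.drop(columns=[target]), df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    # Scaling is an added assumption; it helps the solver converge.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # probability of Outcome == 1
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "report": classification_report(y_test, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, y_prob),
    }


# Synthetic stand-in data (NOT the Pima dataset); swap in the real frame to use it.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Pregnancies": rng.integers(0, 10, n),
    "Glucose": rng.normal(120, 30, n).round(),
    "BloodPressure": rng.normal(70, 12, n).round(),
    "SkinThickness": rng.normal(25, 8, n).round(),
    "Insulin": rng.uniform(15, 300, n).round(),
    "BMI": rng.normal(32, 6, n).round(1),
    "DiabetesPedigreeFunction": rng.uniform(0.1, 1.5, n).round(3),
    "Age": rng.integers(21, 70, n),
})
demo["Outcome"] = (demo["Glucose"] + rng.normal(0, 15, n) > 125).astype(int)
# Simulate missing measurements recorded as zero.
demo.loc[demo.sample(frac=0.1, random_state=0).index, "Insulin"] = 0

clean = impute_zeros(demo)
results = train_and_evaluate(clean)
print(f"Accuracy: {results['accuracy']:.3f}  ROC-AUC: {results['roc_auc']:.3f}")
print(results["confusion_matrix"])
print(results["report"])
```

Because the medians here are computed on the full dataset before the split, a small amount of test-set information reaches training; computing them on the training split only would be the stricter choice. To draw the ROC curve itself rather than only report the AUC, Scikit-learn's `RocCurveDisplay` can be used with Matplotlib.
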
---
## 🛠 Tools & Technologies
- Python
- Pandas, NumPy
- Matplotlib, Seaborn
- Scikit-learn
- Jupyter Notebook
---
## Repository Structure