{"id":31649327,"url":"https://github.com/steffin12-git/logistic-regression-daibetics","last_synced_at":"2026-04-28T12:02:29.397Z","repository":{"id":310156363,"uuid":"1038904306","full_name":"Steffin12-git/Logistic-Regression-daibetics","owner":"Steffin12-git","description":"Built an interpretable Logistic Regression model to predict diabetes from clinical features; produced reproducible EDA, model validation, and visual diagnostics.","archived":false,"fork":false,"pushed_at":"2025-08-16T04:16:26.000Z","size":107,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-07T07:44:18.762Z","etag":null,"topics":["insights","matplotlib-pyplot","model-evaluation","pandas","python","seaborn","sklearn"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Steffin12-git.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-16T03:50:11.000Z","updated_at":"2025-08-16T04:18:21.000Z","dependencies_parsed_at":"2025-08-16T06:24:23.151Z","dependency_job_id":"96cee89d-6e4e-44e0-8854-4f8a81efe559","html_url":"https://github.com/Steffin12-git/Logistic-Regression-daibetics","commit_stats":null,"previous_names":["steffin12-git/logistic-regression-daibetics"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Steffin12-git/Logistic-Regression-daibetics","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Steffin12-git%2FLogistic-Regression-daibetics","tags_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Steffin12-git%2FLogistic-Regression-daibetics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Steffin12-git%2FLogistic-Regression-daibetics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Steffin12-git%2FLogistic-Regression-daibetics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Steffin12-git","download_url":"https://codeload.github.com/Steffin12-git/Logistic-Regression-daibetics/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Steffin12-git%2FLogistic-Regression-daibetics/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32379629,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-28T11:25:28.583Z","status":"ssl_error","status_checked_at":"2026-04-28T11:25:05.435Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["insights","matplotlib-pyplot","model-evaluation","pandas","python","seaborn","sklearn"],"created_at":"2025-10-07T07:42:15.126Z","updated_at":"2026-04-28T12:02:29.379Z","avatar_url":"https://github.com/Steffin12-git.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# 🩺 Logistic Regression — 
Diabetes Prediction\n\n**Repository:** *Logistic Regression for Diabetics*  \n**Notebook:** `Logistic regression for diabetics.ipynb`\n\n---\n\n## 🚀 Tech Stack\n\n![Python](https://img.shields.io/badge/Python-3.9-blue?logo=python)\n![Pandas](https://img.shields.io/badge/Pandas-Data%20Analysis-yellow?logo=pandas)\n![Scikit-learn](https://img.shields.io/badge/Scikit--learn-ML-orange?logo=scikitlearn)\n![Matplotlib](https://img.shields.io/badge/Matplotlib-Visualization-green?logo=matplotlib)\n![Seaborn](https://img.shields.io/badge/Seaborn-Visualization-lightblue)\n![Jupyter](https://img.shields.io/badge/Jupyter-Notebook-orange?logo=jupyter)\n\n---\n\n## 📌 Project Summary\n\nThis project builds a clear and interpretable **Logistic Regression** model that predicts the probability of diabetes from clinical features.  \n\nThe notebook walks through an **end-to-end ML workflow**:  \n- Data preprocessing \u0026 feature scaling  \n- Exploratory data analysis (EDA)  \n- Model training \u0026 evaluation  \n- Visual diagnostics (confusion matrix, ROC curve)  \n- Interpretation of coefficients as odds ratios  \n\n✨ The goal is to **balance predictive performance with interpretability**, a crucial requirement for healthcare decision-making.\n\n---\n\n## 📋 Dataset\n\n- **Source:** Pima Indians Diabetes Database (UCI Repository)  \n- **Target Variable:** `Outcome` (1 = diabetic, 0 = non-diabetic)  \n- **Features Used:**  \n  `Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age`\n\n**Preprocessing applied:**  \n- Replaced biologically implausible zero values (e.g., in Glucose, BloodPressure, BMI)  \n- Standardized features with `StandardScaler` (zero mean, unit variance)  \n- Split the dataset 70/30 into train and test sets using `train_test_split(test_size=0.3, random_state=42)`\n\n---\n\n## 🔎 Exploratory Data Analysis (EDA)\n\nKey checks performed:  \n- Class balance of diabetic vs. 
non-diabetic patients  \n- Feature distributions \u0026 correlations  \n- Relationship of top predictors (Glucose, BMI, Age) with the outcome  \n\n*(EDA plots can be extended in future iterations)*\n\n---\n\n## 🧠 Model Training\n\n- **Algorithm:** Logistic Regression (`sklearn.linear_model.LogisticRegression`)  \n- **Handling imbalance:** optionally via `class_weight='balanced'`  \n- **Outputs:** Predicted classes, predicted probabilities, coefficients  \n\n---\n\n## ✅ Model Performance\n\n**Confusion Matrix**  \n![Confusion Matrix](images/confusion%20metrics.png)\n\n**ROC Curve**  \n![ROC Curve](images/Roc%20curve.png)\n\n**Evaluation Metrics (Test Set):**  \n- **Accuracy:** `0.77`  \n- **Precision:** `0.75`  \n- **Recall (Sensitivity):** `0.75`  \n- **F1-score:** `0.82`  \n\n---\n\n## 🔑 Model Interpretation\n\nCoefficients were converted to odds ratios, OR = exp(β), for clinical interpretability. Because the features were standardized, each odds ratio is the multiplicative change in the odds of diabetes per one-standard-deviation increase in that feature, holding the other features constant. Example (symbolic placeholders):\n\n| Feature                  | Coefficient (β) | Odds Ratio (exp(β)) | Interpretation                           |\n|--------------------------|----------------:|--------------------:|------------------------------------------|\n| Glucose                  | `β_glucose`     | `OR_glucose`        | Higher glucose → higher odds of diabetes |\n| BMI                      | `β_bmi`         | `OR_bmi`            | Elevated BMI increases odds              |\n| Age                      | `β_age`         | `OR_age`            | Older patients have higher risk          |\n| DiabetesPedigreeFunction | `β_dpf`         | `OR_dpf`            | Strong family history raises odds        |\n\n---\n\n## 🩺 Clinical \u0026 Business Insights\n\n- **Glucose** and **BMI** are the strongest indicators of diabetes risk.  \n- **Age** and **family history (DPF)** further amplify predicted risk.  \n- The model can serve as a **screening tool** for healthcare professionals, helping them prioritize high-risk patients for testing.  
\n- In a screening setting, **higher recall (sensitivity)** is usually preferred because it minimizes false negatives (undiagnosed diabetics), typically at the cost of more false positives.  \n\n---\n\n## ⚠️ Limitations\n\n- Logistic Regression assumes the log-odds are a linear function of the features, so nonlinear relationships and interactions may be missed.  \n- Missing or imputed values can bias the model.  \n- The dataset is relatively small; external validation is required.  \n- Clinical deployment requires regulatory approval \u0026 real-world testing.  \n\n---\n\n## 📂 Repository Structure\n\n```\n\n📁 Logistic-Regression-diabetics\n│── Logistic regression for diabetics.ipynb   # Jupyter notebook\n│── diabetes.csv                              # Dataset\n│── images/\n│    ├── confusion metrics.png\n│    ├── Roc curve.png\n│── README.md                                 # Project documentation\n\n```\n\n---\n\n## ✨ Recruiter Pitch\n\nDeveloped a **Logistic Regression model** for diabetes prediction with strong interpretability and healthcare relevance.  \n\n- Demonstrated full ML workflow (EDA → modeling → evaluation → insights).  \n- Produced actionable clinical interpretations via coefficients \u0026 odds ratios.  \n- Delivered professional, reproducible documentation \u0026 visual diagnostics.  \n- Tools: **Python, Pandas, Scikit-learn, Matplotlib, Seaborn, Jupyter**  \n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsteffin12-git%2Flogistic-regression-daibetics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsteffin12-git%2Flogistic-regression-daibetics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsteffin12-git%2Flogistic-regression-daibetics/lists"}