https://github.com/wasifsohail5/heart-disease-risk-stratification-and-patient-clustering-analysis
Welcome to a real-world data science and machine learning project focused on improving healthcare outcomes! This repository provides a full workflow for predicting heart disease risk and uncovering patient subgroups using the UCI Heart Disease Dataset.
- Host: GitHub
- URL: https://github.com/wasifsohail5/heart-disease-risk-stratification-and-patient-clustering-analysis
- Owner: WasifSohail5
- Created: 2025-06-30T07:55:40.000Z
- Default Branch: main
- Last Pushed: 2025-07-01T19:16:25.000Z
- Last Synced: 2025-07-01T20:25:24.564Z
- Topics: bagging-ensemble, clustering, elbow-method, gradient-boosting, heart-disease, pca, supervised-learning, unsupervised-learning
- Language: Jupyter Notebook
- Size: 614 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 🫀 Heart Disease Risk Stratification & Patient Clustering
Welcome to a real-world data science and machine learning project focused on improving healthcare outcomes! This repository provides a full workflow for predicting heart disease risk and uncovering patient subgroups using the [UCI Heart Disease Dataset](https://archive.ics.uci.edu/dataset/45/heart+disease).
---
## 🚀 Project Goals
1. **Risk Stratification:**
Build robust machine learning models to classify patients as high or low risk for heart disease.
2. **Patient Clustering:**
Discover clinically meaningful patient clusters to inform personalized treatment strategies.
---
## 📦 Dataset
- **Source:** UCI Machine Learning Repository
- **Features:** Age, sex, blood pressure, cholesterol, ECG results, and more
- **Target:** Heart disease diagnosis, converted to binary: 0 = no disease, 1 = disease (the original codes 1 to 4 are merged into 1)
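As a minimal sketch of the binary conversion, assuming the raw target is the UCI `num` column (coded 0 to 4, where any value above 0 indicates disease), the mapping is a single comparison. The `binarize_target` helper is hypothetical. The commented loading lines assume the `ucimlrepo` package's `fetch_ucirepo` function with dataset id 45 (the id in the dataset URL above) and a target column named `num`; they need network access, so the demo uses a small hand-made series instead.

```python
import pandas as pd


def binarize_target(num: pd.Series) -> pd.Series:
    """Collapse UCI diagnosis codes 0-4 into 0 = no disease, 1 = disease."""
    # Any positive code (1-4) indicates presence of heart disease.
    return (num > 0).astype(int)


# Loading the real data (assumption: ucimlrepo installed, network available):
# from ucimlrepo import fetch_ucirepo
# heart = fetch_ucirepo(id=45)
# X = heart.data.features
# y = binarize_target(heart.data.targets["num"])

demo = pd.Series([0, 1, 2, 0, 4, 3], name="num")
print(binarize_target(demo).tolist())  # [0, 1, 1, 0, 1, 1]
```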
---
## 🛠️ Workflow Overview
### 1. Data Preparation & Exploration
- Handle missing values and outliers
- Feature scaling and normalization
- Visual exploratory data analysis (EDA)
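The preparation steps above could look roughly like this with pandas and scikit-learn. This is a sketch, not the repository's code: it assumes Tukey-fence (IQR) clipping for outliers, median imputation for missing values, and standardization for scaling. The helper `clip_outliers_iqr` and the toy frame are illustrative. In a full workflow these statistics should be fitted on training data only, so that test folds do not leak into them.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def clip_outliers_iqr(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Clip each numeric column to [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    q1 = df.quantile(0.25)  # NaNs are skipped when computing quantiles
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    # axis=1 aligns the per-column bounds (Series indexed by column name).
    return df.clip(lower=q1 - k * iqr, upper=q3 + k * iqr, axis=1)


# Median imputation followed by standardization (zero mean, unit variance).
preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Tiny illustrative frame; column names follow the UCI attribute names.
raw = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, np.nan],          # one missing value
    "trestbps": [145, 130, 130, 120, 120, 140],
    "chol": [233, 250, 204, 236, 354, 564],       # 564 is an extreme value
})

clean = clip_outliers_iqr(raw)                # NaN survives clipping...
X_scaled = preprocess.fit_transform(clean)    # ...and is imputed here
print(clean["chol"].max(), X_scaled.shape)
```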
### 2. Supervised Learning (Classification)
- Algorithms: Decision Tree, Logistic Regression, SVM, Neural Network, Random Forest, Gradient Boosting
- Metrics: Accuracy, Precision, Recall, and F1-Score, estimated with cross-validation
- Model comparison and selection
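A hedged sketch of the model comparison, using scikit-learn's `cross_validate` to collect all four metrics over five stratified folds. Synthetic data from `make_classification` stands in for the heart features, and the hyperparameters are illustrative defaults rather than the project's settings. Scale-sensitive models (logistic regression, SVM, neural network) are wrapped in a pipeline with `StandardScaler`, so scaling is refit inside each fold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart data: 300 patients, 13 features.
X, y = make_classification(n_samples=300, n_features=13, n_informative=6, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Neural Network": make_pipeline(
        StandardScaler(), MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
    ),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

scoring = ["accuracy", "precision", "recall", "f1"]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Average each metric over the 5 held-out folds.
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}

# Rank models by mean F1 (highest first).
for name, r in sorted(results.items(), key=lambda kv: kv[1]["f1"], reverse=True):
    print(f"{name:20s} " + "  ".join(f"{m}={v:.3f}" for m, v in r.items()))
```

Ranking by F1 is one reasonable choice for a screening task, since it balances missed cases (recall) against false alarms (precision).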
### 3. Unsupervised Learning (Clustering)
- Optimal number of clusters selected via the Elbow Method
- K-Means clustering (optimal **k = 4**)
- Visualization with PCA
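The clustering steps can be sketched as follows, again on synthetic stand-in data (`make_blobs` with four centers, so this example is not evidence for the project's choice of k). The elbow curve is the list of K-Means inertias (within-cluster sum of squared distances) for k = 1 to 10; plotting `inertias` against `ks` and looking for the bend is the usual way to read it. In this sketch PCA is used only for the 2-D map, while clustering runs on all features.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled patient features.
X, _ = make_blobs(n_samples=300, n_features=13, centers=4, random_state=42)
X = StandardScaler().fit_transform(X)

# Elbow method: within-cluster sum of squares (inertia_) for k = 1..10.
ks = list(range(1, 11))
inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_ for k in ks]

# Fit the final model; k = 4 mirrors the value reported in this README.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Project onto the first two principal components for a 2-D cluster map.
coords = PCA(n_components=2).fit_transform(X)

for k, inertia in zip(ks, inertias):
    print(f"k={k:2d}  inertia={inertia:10.1f}")
```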
### 4. Insights & Recommendations
- Feature importance analysis
- Cluster profiling: disease prevalence, top risk factors, actionable group insights
- Clinical suggestions for resource allocation and treatment focus
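One way to produce the feature-importance ranking and cluster profiles, sketched under stated assumptions: impurity-based importances from a random forest, K-Means labels on standardized features, and a pandas `groupby` that reports each cluster's size, disease prevalence (the mean of the binary target), and the average values of the top-ranked features. The feature names and data are placeholders, not the UCI columns.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

feature_names = [f"feature_{i}" for i in range(8)]  # placeholders for real columns
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=1)
df = pd.DataFrame(X, columns=feature_names)

# Impurity-based feature importance from a random forest (normalized to sum to 1).
rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(df, y)
importance = pd.Series(rf.feature_importances_, index=feature_names).sort_values(ascending=False)
print(importance.head(5))

# Cluster profiling: attach K-Means labels and the binary disease target.
df["cluster"] = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(
    StandardScaler().fit_transform(X)
)
df["disease"] = y
profile = df.groupby("cluster").agg(
    size=("disease", "size"),         # patients per cluster
    prevalence=("disease", "mean"),   # share of patients with disease
)

# Per-cluster means of the three most important features.
top = importance.index[:3].tolist()
profile = profile.join(df.groupby("cluster")[top].mean())
print(profile.sort_values("prevalence", ascending=False))
```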
---
## 📊 Key Visualizations
- Feature distributions & correlations
- Confusion matrices for each model
- Model performance comparison
- PCA cluster map
- Cluster-wise heart disease prevalence
All visualizations are automatically saved as PNG files for easy reporting and presentation.
---
## 🏁 Getting Started
### 1. Install dependencies
```bash
pip install numpy pandas matplotlib seaborn scikit-learn ucimlrepo
```
### 2. Run the main analysis
```bash
python heart_disease_full_solution.py
```
### 3. Review Results
- Performance metrics and insights will be displayed in the terminal.
- Visualizations will be saved as `.png` files.
- Use the findings to inform medical decisions or academic research.
---
## 🧩 Project Structure
```
heart_disease_full_solution.py
feature_distributions.png
feature_correlations.png
class_distribution.png
model_comparison.png
confusion_matrix_.png
elbow_method.png
cluster_visualization.png
cluster_disease_analysis.png
feature_importance.png
README.md
```
---
## 💡 Why This Project Matters
Heart disease remains a leading cause of mortality worldwide. By combining supervised and unsupervised learning, this project helps:
- Detect high-risk individuals early
- Personalize interventions and follow-ups
- Allocate healthcare resources more effectively
- Foster data-driven clinical decision-making
---
## 👨‍💻 Author
- **Wasif-Sohail55**
[GitHub Profile](https://github.com/Wasif-Sohail55)
---
## 📄 License
For educational and research purposes. Dataset provided by [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php).
---
**Ready to make an impact?**
Clone, run, and explore — contribute your findings or fork for your next healthcare data science adventure!