https://github.com/arif-miad/heart-attack-risk-prediction
This project explores key factors influencing heart attack risk, such as age, cholesterol, blood pressure, and lifestyle habits, using machine learning models.
classification data data-science matplotlib ml pandas-python seaborn visualization
- Host: GitHub
- URL: https://github.com/arif-miad/heart-attack-risk-prediction
- Owner: Arif-miad
- License: apache-2.0
- Created: 2025-01-09T11:11:40.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-01-09T11:15:08.000Z (10 months ago)
- Last Synced: 2025-02-26T22:14:16.887Z (8 months ago)
- Topics: classification, data, data-science, matplotlib, ml, pandas-python, seaborn, visualization
- Language: Jupyter Notebook
- Homepage:
- Size: 2.44 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
---
# Heart Attack Risk Prediction



## Overview
This project focuses on analyzing and predicting heart attack risks using a dataset containing patient demographic, clinical, and lifestyle features. The workflow involves exploratory data analysis (EDA), preprocessing, model training, evaluation, and visualization of results using advanced machine learning techniques.
---
## Dataset Description
The dataset consists of the following columns:
- **Age**: Age of the patient.
- **Sex**: Gender of the patient (M/F).
- **ChestPainType**: Type of chest pain (e.g., ATA, NAP, ASY).
- **RestingBP**: Resting blood pressure (mm Hg).
- **Cholesterol**: Serum cholesterol in mg/dl.
- **FastingBS**: Fasting blood sugar (1 = true, 0 = false).
- **RestingECG**: Resting electrocardiographic results.
- **MaxHR**: Maximum heart rate achieved.
- **ExerciseAngina**: Exercise-induced angina (Y/N).
- **Oldpeak**: ST depression induced by exercise.
- **ST_Slope**: Slope of the peak exercise ST segment.
- **HeartDisease**: Target variable (1 = disease, 0 = no disease).
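
The column list above can be turned into a quick sanity check before any analysis. The following is a minimal sketch, not code from the notebook: `check_schema` is a hypothetical helper, the rows in `toy` are made up, and the `RestingECG` and `ST_Slope` labels are placeholders because the description does not enumerate them.

```python
import pandas as pd

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "Age", "Sex", "ChestPainType", "RestingBP", "Cholesterol", "FastingBS",
    "RestingECG", "MaxHR", "ExerciseAngina", "Oldpeak", "ST_Slope", "HeartDisease",
]

# Value sets that the description states explicitly.
ALLOWED_VALUES = {
    "Sex": {"M", "F"},
    "FastingBS": {0, 1},
    "ExerciseAngina": {"Y", "N"},
    "HeartDisease": {0, 1},
}


def check_schema(df: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    for col, allowed in ALLOWED_VALUES.items():
        if col in df.columns:
            unexpected = set(df[col].dropna().unique()) - allowed
            if unexpected:
                problems.append(
                    f"{col} has unexpected values: {sorted(map(str, unexpected))}"
                )
    return problems


# Made-up rows for illustration only; they are not taken from the real dataset.
toy = pd.DataFrame({
    "Age": [54, 61, 47],
    "Sex": ["M", "F", "M"],
    "ChestPainType": ["ATA", "NAP", "ASY"],
    "RestingBP": [130, 145, 120],
    "Cholesterol": [220, 260, 198],
    "FastingBS": [0, 1, 0],
    "RestingECG": ["Normal", "ST", "Normal"],  # placeholder category labels
    "MaxHR": [160, 132, 171],
    "ExerciseAngina": ["N", "Y", "N"],
    "Oldpeak": [0.0, 1.5, 0.2],
    "ST_Slope": ["Up", "Flat", "Up"],          # placeholder category labels
    "HeartDisease": [0, 1, 0],
})

print(check_schema(toy))
```

To use it on the real data, pass the frame returned by `pd.read_csv` for whichever CSV file the notebook loads.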
---
## Workflow
### 1️⃣ **Exploratory Data Analysis (EDA)**
- **Univariate Analysis**: Histograms, count plots, and KDE plots for individual features.
- **Multivariate Analysis**: Pairwise scatter plots, heatmaps for correlations, and bar plots for feature relationships.
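
As a rough illustration of the multivariate step, the numbers behind a correlation heatmap and a bar plot can be computed with pandas alone. This sketch uses randomly generated stand-in data (not the real dataset), and the toy target is deliberately tied to `Oldpeak` so that the correlations are not all near zero.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; the values are random and carry no clinical meaning.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(30, 80, n),
    "MaxHR": rng.integers(80, 200, n),
    "Oldpeak": rng.uniform(0, 4, n).round(1),
    "ExerciseAngina": rng.choice(["Y", "N"], n),
})
# Toy target loosely driven by Oldpeak plus noise (an assumption for the demo).
df["HeartDisease"] = (df["Oldpeak"] + rng.normal(0, 1, n) > 2).astype(int)

# Pearson correlation matrix of the numeric columns (input for a heatmap).
numeric_cols = ["Age", "MaxHR", "Oldpeak", "HeartDisease"]
corr = df[numeric_cols].corr()

# Share of positive cases per category (input for a bar plot).
disease_rate = df.groupby("ExerciseAngina")["HeartDisease"].mean()

print(corr.round(2))
print(disease_rate.round(2))
```

The `corr` matrix can be passed to `seaborn.heatmap(corr, annot=True)`, and `disease_rate.plot(kind="bar")` gives the corresponding bar plot.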
### 2️⃣ **Data Preprocessing**
- Handled missing values (if any).
- Applied label encoding and one-hot encoding to categorical features.
- Scaled numerical features using MinMaxScaler.
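
One common way to combine encoding and scaling in scikit-learn is a `ColumnTransformer`. The sketch below is an assumption about how the steps could be wired together, not the notebook's actual code: it one-hot encodes every categorical column (the project also mentions label encoding), uses a subset of the columns, and runs on made-up rows.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Made-up rows for illustration only; they are not taken from the real dataset.
toy = pd.DataFrame({
    "Age": [54, 61, 47, 39, 66, 58],
    "Sex": ["M", "F", "M", "F", "M", "F"],
    "ChestPainType": ["ATA", "NAP", "ASY", "ATA", "ASY", "NAP"],
    "RestingBP": [130, 145, 120, 118, 150, 138],
    "Cholesterol": [220, 260, 198, 185, 290, 240],
    "MaxHR": [160, 132, 171, 178, 115, 140],
    "ExerciseAngina": ["N", "Y", "N", "N", "Y", "Y"],
    "Oldpeak": [0.0, 1.5, 0.2, 0.0, 2.3, 1.0],
})

numeric_cols = ["Age", "RestingBP", "Cholesterol", "MaxHR", "Oldpeak"]
categorical_cols = ["Sex", "ChestPainType", "ExerciseAngina"]

preprocessor = ColumnTransformer(
    transformers=[
        # Rescale each numeric column to the [0, 1] range.
        ("num", MinMaxScaler(), numeric_cols),
        # One indicator column per category; unseen categories become all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense NumPy array
)

X = preprocessor.fit_transform(toy)
print(X.shape)
```

In a real pipeline the transformer should be fitted on the training split only and then applied to the test split, so that the scaling ranges do not leak test information.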
### 3️⃣ **Model Training**
Trained and evaluated 15 classification models, including:
- Logistic Regression
- Random Forest
- Gradient Boosting
- Support Vector Machines (SVM)
- XGBoost, AdaBoost, and more.
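
Comparing several classifiers usually comes down to a loop over a dictionary of estimators. The following sketch shows that pattern with five scikit-learn models on synthetic data from `make_classification`; the data, the hyperparameters, and the 75/25 split are assumptions for illustration, and XGBoost is left out to keep the example dependency-free.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed features (11 predictors, binary target).
X, y = make_classification(
    n_samples=300, n_features=11, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

# Print the models from best to worst accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

If `xgboost` is installed, an `xgboost.XGBClassifier()` entry can be added to `models` in the same way.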
### 4️⃣ **Model Evaluation**
- Accuracy, Precision, Recall, F1-Score, and ROC-AUC were calculated for each model.
- Visualized ROC Curves and confusion matrices for performance comparison.
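
The listed metrics map directly onto functions in `sklearn.metrics`. Here is a small worked example on eight hand-written labels and predicted probabilities (made up for illustration); the 0.5 decision threshold is an assumption, and note that ROC-AUC is computed from the probabilities rather than the thresholded predictions.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hand-written labels and predicted probabilities of the positive class.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.10, 0.40, 0.35, 0.80, 0.90, 0.20, 0.60, 0.70])

# Turn probabilities into class predictions with a 0.5 threshold.
y_pred = (y_prob >= 0.5).astype(int)

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "roc_auc": roc_auc_score(y_true, y_prob),  # threshold-independent
}

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print(cm)
```

With three true positives, three true negatives, one false positive, and one false negative, accuracy, precision, recall, and F1 all come out to 0.75, while ROC-AUC is 13/16 = 0.8125 because 13 of the 16 positive/negative pairs are ranked correctly.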
### 5️⃣ **Feature Importance Analysis**
- Identified the most important features using a Random Forest feature importance plot.
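
A fitted `RandomForestClassifier` exposes impurity-based importances through its `feature_importances_` attribute. The sketch below ranks four features on synthetic data; the toy target depends on `Oldpeak` alone, so `Oldpeak` comes out on top by construction and the ranking says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; the values are random and carry no clinical meaning.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "Age": rng.integers(30, 80, n),
    "Cholesterol": rng.integers(150, 350, n),
    "MaxHR": rng.integers(80, 200, n),
    "Oldpeak": rng.uniform(0.0, 4.0, n),
})
# Toy target determined by Oldpeak alone (an assumption for the demo).
y = (X["Oldpeak"] > 2.0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(importances.round(3))
```

Calling `importances.plot(kind="barh")` turns the series into a feature importance plot. Impurity-based importances can favor high-cardinality features, so `sklearn.inspection.permutation_importance` is a useful cross-check.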
---
## Results and Insights
- The **Random Forest Classifier** and **XGBoost** models demonstrated the highest accuracy and ROC-AUC scores.
- Feature importance analysis highlighted key factors influencing heart attack risks, such as **MaxHR**, **Cholesterol**, and **Oldpeak**.
---
## Tools and Libraries
- **Python**: Core programming language.
- **Libraries**:
  - Data Manipulation: `pandas`, `numpy`
  - Visualization: `matplotlib`, `seaborn`
  - Machine Learning: `scikit-learn`, `xgboost`
---
## Visualizations
1. **ROC Curves**: Showcasing model performance.
2. **Feature Importance Plot**: Highlighting the top predictors for heart disease.
3. **Heatmap**: Depicting correlations between features.
---
## Key Features
- Comprehensive workflow for heart disease risk analysis.
- Comparison of multiple machine learning models.
- Clear and interpretable visualizations.
---
## Usage
1. Clone the repository:
```bash
git clone https://github.com/arif-miad/heart-attack-risk-prediction.git
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the notebook to reproduce results.
---
## License
This project is licensed under the [Apache License 2.0](LICENSE).
---