https://github.com/xjqx/bc2406-project

HeartDetect 💘 - An analytical model for early intervention of Heart Disease, implemented in 2 stages

# HeartDetect 💘

![LIME](https://img.shields.io/badge/-Local%20Interpretable%20Model--agnostic%20Explanations%20(LIME)-00ABB3?style=flat-square)
![SHAP](https://img.shields.io/badge/-SHapley%20Additive%20exPlanations%20(SHAP)-00ABB3?style=flat-square)

![Python](http://img.shields.io/badge/-Python-3776AB?style=flat-square&logo=python&logoColor=ffffff)
![NumPy](https://img.shields.io/badge/numpy-%23013243.svg?style=flat-square&logo=numpy&logoColor=white)
![scikit-learn](https://img.shields.io/badge/scikit--learn-F06032.svg?style=flat-square&logo=scikit-learn&logoColor=white)
![Git](https://img.shields.io/badge/-Git-F05032?style=flat-square&logo=git&logoColor=white)

An Analytical Model For Early Intervention Of Heart Disease, implemented in `2 stages`

### Docs
- [Report](https://github.com/xJQx/bc2406-project/blob/main/docs/BC2406_Sem7_Team%206_Report.pdf)
- [Slide Deck](https://github.com/xJQx/bc2406-project/blob/main/docs/BC2406_Sem7_Team%206_Slides.pdf)

### Jupyter notebooks
- [data-cleaning-preprocessing.ipynb](https://github.com/xJQx/bc2406-project/blob/main/data-cleaning-preprocessing.ipynb)
- [exploratory-data-analysis_1.ipynb](https://github.com/xJQx/bc2406-project/blob/main/exploratory-data-analysis_1.ipynb)
- [exploratory-data-analysis_2.ipynb](https://github.com/xJQx/bc2406-project/blob/main/exploratory-data-analysis_2.ipynb)
- [stage1-modelling.ipynb](https://github.com/xJQx/bc2406-project/blob/main/stage1-modelling.ipynb)
- [stage2-modelling.ipynb](https://github.com/xJQx/bc2406-project/blob/main/stage2-modelling.ipynb)


# Executive Summary

This report applies data analytics to a business problem facing the National Heart Centre Singapore (NHCS). With reported cases of cardiovascular disease (CVD) in Singapore on the rise, NHCS handles more than **120,000** outpatient consultations each year. Heart disease can strike suddenly and is severe and expensive to treat. NHCS therefore stands to benefit from shifting its **focus to early prevention rather than post-diagnosis treatment.**

To increase the involvement of individuals and the primary care sector in the prevention of heart disease, our team proposes a **2-stage solution: HeartDetect.**
- The first stage raises individuals' awareness and helps them manage their heart health regularly.
- The second stage enables the primary care sector to predict heart disease risk so that preventive care can be provided in time.


# Getting Started

### 1. Clone a copy of this repository
Open your terminal and run
```
git clone https://github.com/xJQx/bc2406-project.git
```


### 2. Understanding the Jupyter notebook flow
**Data Cleaning and Pre-processing**

a) [data-cleaning-preprocessing.ipynb](https://github.com/xJQx/bc2406-project/blob/main/data-cleaning-preprocessing.ipynb)

**Stage 1:**

b) [exploratory-data-analysis_1.ipynb](https://github.com/xJQx/bc2406-project/blob/main/exploratory-data-analysis_1.ipynb)

c) [stage1-modelling.ipynb](https://github.com/xJQx/bc2406-project/blob/main/stage1-modelling.ipynb)

**Stage 2:**

d) [exploratory-data-analysis_2.ipynb](https://github.com/xJQx/bc2406-project/blob/main/exploratory-data-analysis_2.ipynb)

e) [stage2-modelling.ipynb](https://github.com/xJQx/bc2406-project/blob/main/stage2-modelling.ipynb)
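
The order a) to e) matters because the later notebooks work on the datasets produced by the cleaning notebook. As a minimal sketch, assuming Jupyter with `nbconvert` is installed and the script is run from the repository root, the notebooks could be executed in sequence as follows. The helper names `build_commands` and `run_all` are illustrative and not part of the repository.

```python
import subprocess

# Notebook order as listed in steps a) to e) above.
NOTEBOOKS = [
    "data-cleaning-preprocessing.ipynb",
    "exploratory-data-analysis_1.ipynb",
    "stage1-modelling.ipynb",
    "exploratory-data-analysis_2.ipynb",
    "stage2-modelling.ipynb",
]


def build_commands(notebooks=NOTEBOOKS):
    """Return one `jupyter nbconvert` command per notebook, in order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", nb]
        for nb in notebooks
    ]


def run_all():
    """Execute each notebook in place and stop at the first failure."""
    for cmd in build_commands():
        subprocess.run(cmd, check=True)
```

Calling `run_all()` overwrites each notebook with its executed version, so commit or back up any local changes first.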


### 3. Understanding the various CSV files (datasets)
View the [Data Dictionary](https://github.com/xJQx/bc2406-project/blob/main/docs/BC2406_Sem7_Team%206_datalink_dataDictionary.rtf).

**Datasets created from the `data-cleaning-preprocessing.ipynb` notebook:**

.
├── heart_pki_2020_original.csv # original dataset
│   ├── heart_pki_2020_cleaned.csv # for EDA and visualization
│   ├── heart_pki_2020_correlation.csv # for EDA correlation (IntegerEncoding done)
│   └── heart_pki_2020_encoded.csv # for analytical models (OneHotEncoding done)
│
├── o2Saturation_original.csv # original dataset
└── heart_attack_original.csv # original dataset
    ├── heart_attack_cleaned.csv # for EDA and analytical model (default integer encoding)
    └── heart_attack_cleaned_text.csv # for EDA and visualization (meaningful values)
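
The `_correlation` and `_encoded` files differ in how categorical columns are encoded. The sketch below illustrates the two schemes on a toy column. The column name `smoking` and its values are made up for illustration and are not taken from the datasets, and the notebook may use different tooling (for example scikit-learn encoders) to reach the same result.

```python
import pandas as pd

# Toy data; the column and its values are hypothetical.
df = pd.DataFrame({"smoking": ["no", "yes", "no", "former"]})

# Integer encoding: each category becomes one integer code.
# pandas sorts string categories alphabetically: former=0, no=1, yes=2.
integer_encoded = df["smoking"].astype("category").cat.codes

# One-hot encoding: each category becomes its own 0/1 column.
one_hot_encoded = pd.get_dummies(df["smoking"], prefix="smoking", dtype=int)

print(integer_encoded.tolist())
print(one_hot_encoded.columns.tolist())
```

Integer codes keep a single column per feature, which is compact for a correlation matrix, while one-hot columns avoid implying an order between categories when fitting models.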


### 4. Understanding the models directory

The models directory contains all the trained models from stages 1 and 2. They can be imported and used on any dataset whose feature columns match those the model was trained on.

**An example of importing and using an analytical model:**

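```
# Libraries
import joblib
from sklearn.model_selection import cross_val_score

# Load the model from disk
loaded_random_forest_m3 = joblib.load('models/stage2_random_forest_m3.sav')

# Use the analytical model.
# X_test and y_test must already be defined and must have the same
# feature columns the model was trained on.
# Note: cross_val_score re-fits a clone of the model on each of the 5 folds.
result = cross_val_score(loaded_random_forest_m3, X_test, y_test, cv=5, scoring="roc_auc").mean()
print(result)
```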
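
Because `cross_val_score` clones the estimator and refits it on each fold, the snippet above evaluates the model's configuration rather than the fitted parameters stored in the `.sav` file. To score the saved model as trained, one option is to call `predict_proba` on the loaded object and pass the result to `roc_auc_score`. The sketch below is self-contained, so it first trains and saves a small stand-in model on synthetic data. The file name `demo_random_forest.sav` and the synthetic data are assumptions for illustration only.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and labels.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Train and save a small stand-in model (hypothetical file name).
model_path = os.path.join(tempfile.mkdtemp(), "demo_random_forest.sav")
fitted = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
joblib.dump(fitted, model_path)

# Load the fitted model and score it without refitting.
loaded_model = joblib.load(model_path)
positive_proba = loaded_model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, positive_proba)
print(f"ROC AUC of the loaded model: {auc:.3f}")
```

With a model from the `models` directory, only the last four lines are needed, with `model_path` pointing at the chosen `.sav` file and `X_test`, `y_test` holding data the model has not seen.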


# Contributors