https://github.com/distributive-network/copd-prediction
- Host: GitHub
- URL: https://github.com/distributive-network/copd-prediction
- Owner: Distributive-Network
- Created: 2021-10-27T18:39:08.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-10-27T18:46:43.000Z (over 3 years ago)
- Last Synced: 2025-01-10T19:47:59.943Z (4 months ago)
- Language: Python
- Size: 241 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
# COPD Readmission Modelling
### Setup
Create a new environment and run `pip install -r requirements.txt` to install the required libraries.
### Training
`full_train_pipeline.py` is a self-contained script that trains the XGBoost model on the SQL data. Run `python full_train_pipeline.py` to train a new model. The trained XGBoost models and all normalizers are saved in a new directory, `models`.
After training is complete, an out-of-fold (OOF) AUC score and a confusion matrix are returned. Verify that these are similar to previous runs and within expectations.
Note that `full_train_pipeline.py` can read data either directly from the SQL server or from a CSV file.
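To make the post-training check concrete, here is a minimal sketch of how an AUC score and a confusion matrix can be computed from out-of-fold predictions. It is plain Python and is not taken from `full_train_pipeline.py`. The function names, the 0.5 threshold, and the sample labels and scores are all assumptions for illustration.

```python
from typing import List, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_matrix(y_true: Sequence[int], y_score: Sequence[float],
                     threshold: float = 0.5) -> List[List[int]]:
    """Return [[TN, FP], [FN, TP]] after thresholding the scores."""
    matrix = [[0, 0], [0, 0]]
    for y, s in zip(y_true, y_score):
        matrix[y][int(s >= threshold)] += 1
    return matrix


# Made-up out-of-fold predictions: each score is assumed to come from the
# fold model that did NOT see that row during training.
oof_labels = [0, 0, 1, 1, 0, 1]
oof_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

print("OOF AUC:", round(roc_auc(oof_labels, oof_scores), 3))
print("[[TN, FP], [FN, TP]]:", confusion_matrix(oof_labels, oof_scores))
```

Comparing these two outputs across runs is the kind of check the training step asks for: a large drop in AUC or a shift in the confusion matrix suggests that the data or the pipeline has changed.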
### Inference
`infer_xgb.py` can be used in two ways: returning the output for a single row, or writing a CSV that covers the entire database or input CSV.
Run `python infer_xgb.py` or `python infer_xgb.py --row all` to infer on all the rows provided through SQL/CSV and output a CSV with `patient_id` and confidence.
Run `python infer_xgb.py --row 12`, or another integer, to infer on only the given row and return/print the confidence.
Note that `infer_xgb.py` can read data either directly from the SQL server or from a CSV file.
Inference can only happen after training: `infer_xgb.py` loads models from `models/`, so training must be run first.
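The command-line forms above suggest a `--row` option that accepts either `all` or an integer, plus a boolean `--shap` flag. The following is a hypothetical sketch of such a parser using `argparse`. The real `infer_xgb.py` may parse its arguments differently, and `parse_row` and `build_parser` are names invented here.

```python
import argparse


def parse_row(value: str):
    """Accept the literal 'all' or an integer row index."""
    if value == "all":
        return "all"
    try:
        return int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"--row must be 'all' or an integer, got {value!r}"
        )


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the script's CLI, not the actual parser.
    parser = argparse.ArgumentParser(description="Sketch of an inference CLI")
    parser.add_argument("--row", type=parse_row, default="all",
                        help="'all' (default) or the index of a single row")
    parser.add_argument("--shap", action="store_true",
                        help="also request a SHAP waterfall plot for the row")
    return parser


# Parse explicit argument lists so the sketch runs without a real command line.
for argv in ([], ["--row", "all"], ["--row", "12"], ["--row", "4", "--shap"]):
    args = build_parser().parse_args(argv)
    print(argv, "->", "row =", args.row, "| shap =", args.shap)
```

Defaulting `--row` to `all` mirrors the documented behavior that running the script with no arguments infers on every row.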
### SHAP
`shap.ipynb` presents Shapley values in an easy-to-interpret form. To create a waterfall plot for one patient from inference, run `python infer_xgb.py --row 4 --shap`, substituting any other row integer.
The Jupyter notebook can also be used to create a summary waterfall plot for feature importance as well as individual plots for specific patients.
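For readers unfamiliar with Shapley values, the sketch below computes them exactly by brute force for a made-up three-feature model. This is not how the repository or the `shap` library computes them. The toy model, the feature names `x1` to `x3`, and the baseline values are all assumptions chosen only to show what the bars of a waterfall plot represent.

```python
from itertools import permutations
from typing import Callable, Dict


def shapley_values(model: Callable[[Dict[str, float]], float],
                   instance: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Average each feature's marginal contribution over all feature orderings."""
    features = list(instance)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)            # every feature starts at its baseline value
        previous = model(current)
        for name in order:
            current[name] = instance[name]  # switch this feature to the instance value
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(x: Dict[str, float]) -> float:
    # Made-up score with one interaction term (x1 * x3); not the COPD model.
    return 0.1 + 0.02 * x["x1"] + 0.005 * x["x2"] + 0.01 * x["x1"] * x["x3"]


instance = {"x1": 3.0, "x2": 70.0, "x3": 1.0}
baseline = {"x1": 0.0, "x2": 60.0, "x3": 0.0}

values = shapley_values(toy_model, instance, baseline)
for name, value in values.items():
    print(f"{name}: {value:+.3f}")
print("sum of contributions:", round(sum(values.values()), 3))
print("prediction - baseline:", round(toy_model(instance) - toy_model(baseline), 3))
```

The contributions add up to the difference between the prediction for the instance and the prediction at the baseline, which is why a waterfall plot can stack them from the baseline output up to the final prediction.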