https://github.com/borgwardtlab/longcovid
Prediction of long COVID from proteomic and clinical data
- Host: GitHub
- URL: https://github.com/borgwardtlab/longcovid
- Owner: BorgwardtLab
- License: mit
- Created: 2022-12-09T10:49:30.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2024-02-08T08:48:08.000Z (over 2 years ago)
- Last Synced: 2025-01-22T04:14:00.778Z (over 1 year ago)
- Language: Python
- Size: 38.1 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# LongCOVID
This repository shares code for predicting long COVID with a random forest classifier, and for a univariate association analysis between proteomic features and long COVID labels.
# Requirements
- Python 3.7.4
- scikit-learn 1.1.3
- pandas 1.5.2
- scipy 1.9.3
- numpy 1.23.5
- shap 0.41.0
- statsmodels 0.13.5
- openpyxl 3.0.10
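To compare an existing environment against the versions listed above, a small helper based on `importlib.metadata` (part of the standard library from Python 3.8) can be used. This is a convenience sketch and not part of the repository; it only checks the third-party packages, looked up by their PyPI distribution names.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the Requirements section (PyPI distribution names).
PINNED = {
    "scikit-learn": "1.1.3",
    "pandas": "1.5.2",
    "scipy": "1.9.3",
    "numpy": "1.23.5",
    "shap": "0.41.0",
    "statsmodels": "0.13.5",
    "openpyxl": "3.0.10",
}


def check_versions(pinned):
    """Return {package: (pinned, installed or None, exact_match)} for each package."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:13s} pinned {wanted:8s} installed {installed or '-':8s} {status}")
```

Only exact version matches are reported as `ok`; a newer patch release is flagged as a mismatch, which is deliberately strict.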
# Required input data
The input data required to run these scripts can be obtained from . Place these files in a folder named ***Data***, which should contain:
- Proteomics_Clinical_Data_220902_Acute_plus_healthy_v5.xlsx
- Proteomics_Clinical_Data_220902_6M_timepoint_v4.xlsx
- Proteomics_Clinical_Data_220902_Labels_v2.xlsx
- Table S2 Biological protein cluster compositions.xlsx
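Before running the scripts, it can help to confirm that all four input files are in place. The sketch below uses the folder name and file names listed above; the helper `missing_input_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names as listed in this README; "Data" is the folder name it asks for.
REQUIRED_FILES = [
    "Proteomics_Clinical_Data_220902_Acute_plus_healthy_v5.xlsx",
    "Proteomics_Clinical_Data_220902_6M_timepoint_v4.xlsx",
    "Proteomics_Clinical_Data_220902_Labels_v2.xlsx",
    "Table S2 Biological protein cluster compositions.xlsx",
]


def missing_input_files(data_dir="Data"):
    """Return the required input files that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

For example, calling `missing_input_files()` from the repository root returns an empty list once the ***Data*** folder is complete, and otherwise lists the files that still need to be added.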
# Execution
The data splits we used are provided in the folder ***partitions***. The label dictionaries used by the scripts must first be generated from the label data file listed above.
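The README does not specify the structure of the label dictionaries, so the following is only a sketch under stated assumptions: a dictionary mapping each sample ID to its long COVID label, built from a pandas DataFrame and stored with `pickle`. The column names and the pickle format are hypothetical and should be checked against the label file and the scripts.

```python
import pickle


def build_label_dict(labels, id_col, label_col):
    """Map each sample ID to its long COVID label.

    `labels` is a pandas DataFrame; `id_col` and `label_col` are hypothetical
    column names to be replaced by the actual ones in the label file.
    """
    subset = labels[[id_col, label_col]].dropna()  # skip samples without a label
    if subset[id_col].duplicated().any():
        raise ValueError(f"duplicate IDs in column {id_col!r}")
    return dict(zip(subset[id_col], subset[label_col]))


def save_label_dict(label_dict, path):
    """Pickle the dictionary (the storage format the scripts expect is an assumption)."""
    with open(path, "wb") as handle:
        pickle.dump(label_dict, handle)
```

For example, `build_label_dict(pd.read_excel("Data/Proteomics_Clinical_Data_220902_Labels_v2.xlsx"), "ID", "Label")` would build the mapping, where `"ID"` and `"Label"` are placeholder column names and reading `.xlsx` files relies on openpyxl.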
Run ***prediction_RF.py*** to generate model predictions. Association analyses for individual proteomic features or for clusters of features can be obtained with ***associationAnalysis.py*** and ***associationClusters.py***, respectively.
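The two kinds of analysis can be illustrated with a short, self-contained sketch. It does not reproduce the repository's scripts: the estimator settings, the AUROC metric, the two-sided Mann-Whitney U test and the Benjamini-Hochberg correction are all assumptions chosen for illustration, and `partitions` stands in for the splits in ***partitions***, whose on-disk format is not described here.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score


def cross_validate_rf(X, y, partitions, seed=0):
    """Fit one random forest per (train_idx, test_idx) pair and return test AUROCs."""
    aucs = []
    for train_idx, test_idx in partitions:
        # n_estimators=200 is an illustrative choice, not the authors' setting.
        model = RandomForestClassifier(n_estimators=200, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        # Predicted probability of the positive class (label 1) on the held-out fold.
        scores = model.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], scores))
    return aucs


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def univariate_association(X, y):
    """Two-sided Mann-Whitney U test per feature: cases (y == 1) vs. controls (y == 0)."""
    pvals = [
        mannwhitneyu(X[y == 1, j], X[y == 0, j], alternative="two-sided").pvalue
        for j in range(X.shape[1])
    ]
    return np.array(pvals), benjamini_hochberg(pvals)


if __name__ == "__main__":
    # Synthetic stand-in data: feature 0 carries the signal, the rest is noise.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 20))
    y = (X[:, 0] + 0.5 * rng.normal(size=120) > 0).astype(int)
    folds = np.array_split(rng.permutation(len(y)), 5)
    partitions = [(np.setdiff1d(np.arange(len(y)), f), f) for f in folds]
    print("AUROC per fold:", np.round(cross_validate_rf(X, y, partitions), 3))
    raw_p, adj_p = univariate_association(X, y)
    print("BH-adjusted p-value of feature 0:", adj_p[0])
```

The hand-written correction gives the same result as `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")`, which is available since statsmodels is already a requirement; it is spelled out here only to make the step explicit.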
***combineInterpreations.py*** combines the SHAP analysis results across multiple cross-validation folds.
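The README does not say how the fold-level SHAP results are combined. One common approach, assumed here purely for illustration, is to stack the SHAP values computed on each fold's held-out samples (for example with `shap.TreeExplainer`) and rank features by their mean absolute SHAP value. The helper `combine_fold_shap` is hypothetical and works on plain NumPy arrays, so it does not depend on the shap version.

```python
import numpy as np


def combine_fold_shap(fold_shap_values, feature_names):
    """Pool per-fold SHAP matrices and rank features by mean absolute SHAP value.

    Each element of `fold_shap_values` is assumed to be an
    (n_test_samples, n_features) array for the positive class,
    computed on that fold's held-out samples.
    """
    stacked = np.vstack(fold_shap_values)  # (all held-out samples, n_features)
    if stacked.shape[1] != len(feature_names):
        raise ValueError("number of SHAP columns must match feature_names")
    importance = np.abs(stacked).mean(axis=0)
    order = np.argsort(importance)[::-1]  # most important feature first
    return [(feature_names[i], float(importance[i])) for i in order]
```

Because every sample is held out in exactly one fold, stacking the test-set matrices gives each sample a single SHAP vector, so no sample is counted twice in the pooled importances.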
# Acknowledgements
This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 813533 (K.B.).