A Multimodal Transformer: Fusing Clinical Notes With Structured EHR Data for Interpretable In-Hospital Mortality Prediction
- Host: GitHub
- URL: https://github.com/weimin17/multimodal_transformer
- Owner: weimin17
- License: MIT
- Created: 2022-03-09T20:49:13.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-07-26T03:25:48.000Z (over 2 years ago)
- Last Synced: 2024-11-02T06:11:41.273Z (13 days ago)
- Topics: clinical-notes, clinical-variables, ehr, interpretability, interpretable-ai, mortality, mortality-prediction, multimodal, transformer
- Language: Python
- Homepage:
- Size: 10.9 MB
- Stars: 28
- Watchers: 4
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Multimodal Transformer
The repository for the paper "A Multimodal Transformer: Fusing Clinical Notes With Structured EHR Data for Interpretable In-Hospital Mortality Prediction", submitted to the AMIA'22 Annual Symposium.
# Setup
The code was tested on CUDA 11.4 with a 24 GB GPU. For environment setup, follow the install instructions in the 'Clinical Data Processing' section.
# Clinical Data Processing
## Structured Clinical Variables Processing
Clone https://github.com/YerevaNN/mimic3-benchmarks (Harutyunyan et al.) into the 'Multimodal_Transformer/mimic3-benchmarks' folder. Set up the environment and run all data generation steps to generate the training data without text features. Create a folder 'data-mimic3' under the 'Multimodal_Transformer' folder; all the processed MIMIC-III data will be stored there.
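Once the benchmark steps have run, one quick sanity check is to read a generated example with the benchmark's own reader. A minimal sketch, assuming the in-hospital-mortality task was generated under the 'data-mimic3' layout described above:
~~~~
# Minimal sanity check of the generated structured data; the paths below
# assume the benchmark output was placed under 'data-mimic3' as above.
from mimic3benchmark.readers import InHospitalMortalityReader

reader = InHospitalMortalityReader(
    dataset_dir='data-mimic3/in-hospital-mortality/train',
    listfile='data-mimic3/in-hospital-mortality/train/listfile.csv')

example = reader.read_example(0)
print(example['name'])     # source episode file
print(example['X'].shape)  # (timesteps, clinical variables)
print(example['y'])        # in-hospital mortality label (0 or 1)
~~~~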
## Unstructured Clinical Notes Processing
Clinical notes processing is based on the repository at https://github.com/kaggarwal/ClinicalNotesICU.
### Requirements
Set up the environment for notes processing and model training:
~~~~
pip install -r requirements.txt
~~~~
Then update all paths and configuration in 'mmtransformer/config.py'.
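The exact settings in 'mmtransformer/config.py' depend on the checkout; the variable names in this sketch are illustrative assumptions, not the file's actual contents:
~~~~
# Illustrative only: the real 'mmtransformer/config.py' defines its own names.
# Point these at your local MIMIC-III CSVs and the processed benchmark output.
mimic3_csv_dir = '/path/to/mimic-iii/csvs'           # raw NOTEEVENTS.csv etc.
data_dir = 'data-mimic3'                             # processed benchmark data
checkpoint_dir = 'mmtransformer/models/Checkpoints'  # saved model weights
~~~~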
### Notes Processing
+ Run 'mmtransformer/scripts/extract_notes.py'; the folders 'data-mimic3/root/test_text_fixed/' and 'data-mimic3/root/text_fixed/' will be generated (a rough sketch of this step follows the list).
+ Run 'mmtransformer/scripts/extract_T0.py'.
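Conceptually, note extraction pairs NOTEEVENTS rows with each benchmark episode by subject and admission. A simplified sketch of that matching (the actual 'extract_notes.py' also handles time windows and text cleanup that this omits):
~~~~
# Simplified sketch of note extraction, not the script's actual logic:
# keep the notes for one hospital admission, ordered by chart time.
import pandas as pd

notes = pd.read_csv('NOTEEVENTS.csv',
                    usecols=['SUBJECT_ID', 'HADM_ID', 'CHARTTIME', 'TEXT'])

def notes_for_admission(subject_id, hadm_id):
    sel = notes[(notes.SUBJECT_ID == subject_id) & (notes.HADM_ID == hadm_id)]
    return sel.sort_values('CHARTTIME')['TEXT'].tolist()
~~~~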
# Train and Test
Our trained model can be downloaded from [GoogleDrive](https://drive.google.com/file/d/1Wch0pEgQ8PeWE9p77B6rdNuo9l28CZNv/view?usp=sharing). Unzip the file and put the contents in './Multimodal_Transformer/mmtransformer/models/Checkpoints' and './Multimodal_Transformer/mmtransformer/models/Data' accordingly, or generate the files yourself.
## Test
For the model with only clinical notes (mbert), run
~~~~
python mbert.py --gpu_id 1
~~~~
For the multimodal transformer, run
~~~~
python IHM_mmtransformer.py --mode test --model_type both --model_name BioBert --TSModel Transformer --checkpoint_path Multimodal_Transformer --MaxLen 512 --NumOfNotes 0 --TextModelCheckpoint BioClinicalBERT_FT --freeze_model 1 --number_epoch 5 --batch_size 5 --load_model 1 --gpu_id 1
~~~~
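In-hospital mortality is a binary task, so test runs are usually summarized with AUROC and AUPRC. If you want to score saved predictions yourself, here is a minimal sketch, assuming a CSV with 'y_true' and 'y_prob' columns (an assumed format, not necessarily what the test script writes out):
~~~~
# Minimal scoring sketch; 'predictions.csv' and its column names are
# assumptions for illustration, not the test script's actual output.
import pandas as pd
from sklearn.metrics import roc_auc_score, average_precision_score

df = pd.read_csv('predictions.csv')
print('AUROC:', roc_auc_score(df['y_true'], df['y_prob']))
print('AUPRC:', average_precision_score(df['y_true'], df['y_prob']))
~~~~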
## Train
To train the multimodal transformer, run
~~~~
python IHM_mmtransformer.py --mode train --model_type both --model_name BioBert --TSModel Transformer --checkpoint_path Multimodal_Transformer --MaxLen 512 --NumOfNotes 0 --TextModelCheckpoint BioClinicalBERT_FT --freeze_model 1 --number_epoch 5 --batch_size 5 --load_model 0 --gpu_id 1
~~~~
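The '--freeze_model 1' flag suggests the pretrained text encoder is kept fixed while the fusion layers train. In PyTorch that typically looks like the sketch below; both the flag's effect and the use of the public Bio_ClinicalBERT checkpoint are assumptions here, not quotes of the training code:
~~~~
# Assumed effect of --freeze_model 1: keep the pretrained clinical text
# encoder fixed so only the fusion layers on top of it receive gradients.
from transformers import AutoModel

bert = AutoModel.from_pretrained('emilyalsentzer/Bio_ClinicalBERT')
for p in bert.parameters():
    p.requires_grad = False  # exclude encoder weights from optimization
~~~~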
# Visualization
The outputs of all analyses are in the 'Analysis' folder. For the analysis and visualization of important clinical words in clinical notes:
1. Run 'notes_analysis.py' to get the IG (Integrated Gradients) values with their associated words, stored in 'Analysis/bert_analysis_pred_all2.pkl'.
2. Run 'notes_analysis3.py' to get the word list with frequencies, stored in 'pred_tokenlist_top10_l0_2.txt'. We further filtered the list to remove irrelevant words and tokens; the filtered list is stored in 'filter_pred_tokenlist_top10_l0_2.txt'. This also generates the word cloud 'filter_pred_tokenlist_top10_l0_2.png' (see the sketch below).
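Given a token-frequency file like the one above, the word cloud itself is a few lines with the wordcloud package. A hedged sketch, assuming one 'word count' pair per line (the actual file format may differ):
~~~~
# Sketch: build a word cloud from a token-frequency list, assuming one
# 'word count' pair per line; the real file format may differ.
from wordcloud import WordCloud

freqs = {}
with open('filter_pred_tokenlist_top10_l0_2.txt') as f:
    for line in f:
        word, count = line.split()
        freqs[word] = int(count)

WordCloud(width=800, height=400, background_color='white') \
    .generate_from_frequencies(freqs) \
    .to_file('filter_pred_tokenlist_top10_l0_2.png')
~~~~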
# Credits
The code is based on the repositories by Khadanga et al. (https://github.com/kaggarwal/ClinicalNotesICU) and by Deznabi et al. (https://github.com/Information-Fusion-Lab-Umass/ClinicalNotes_TimeSeries) for the experimental setup. The MIMIC-III clinical variables pre-processing is cloned from the repository by Harutyunyan et al. (https://github.com/YerevaNN/mimic3-benchmarks).