{"id":20756185,"url":"https://github.com/mahmoodlab/pathomicfusion","last_synced_at":"2025-04-06T16:14:42.094Z","repository":{"id":36670385,"uuid":"227980102","full_name":"mahmoodlab/PathomicFusion","owner":"mahmoodlab","description":"Fusing Histology and Genomics via Deep Learning - IEEE TMI","archived":false,"fork":false,"pushed_at":"2022-08-22T03:55:00.000Z","size":44410,"stargazers_count":300,"open_issues_count":9,"forks_count":84,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-03-30T15:09:11.955Z","etag":null,"topics":["computational-pathogenomics","fusion","genomics","histopathology","mahmoodlab","multimodal","multimodal-network","pathology","pathomic","transcriptomics"],"latest_commit_sha":null,"homepage":"http://www.mahmoodlab.org","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mahmoodlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-12-14T06:56:30.000Z","updated_at":"2025-03-27T06:01:28.000Z","dependencies_parsed_at":"2022-08-31T07:11:13.002Z","dependency_job_id":null,"html_url":"https://github.com/mahmoodlab/PathomicFusion","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mahmoodlab%2FPathomicFusion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mahmoodlab%2FPathomicFusion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mahmoodlab%2FPathomicFusion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mahmoodlab%2FPathomicFusion/manifest
s","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mahmoodlab","download_url":"https://codeload.github.com/mahmoodlab/PathomicFusion/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247509237,"owners_count":20950232,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computational-pathogenomics","fusion","genomics","histopathology","mahmoodlab","multimodal","multimodal-network","pathology","pathomic","transcriptomics"],"created_at":"2024-11-17T09:29:30.770Z","updated_at":"2025-04-06T16:14:42.071Z","avatar_url":"https://github.com/mahmoodlab.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Diagnosis and Prognosis\n\n\n\u003cdetails\u003e\n\u003csummary\u003e\n  \u003cb\u003ePathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis\u003c/b\u003e, IEEE Transactions on Medical Imaging, 2020.\n  \u003ca href=\"https://ieeexplore.ieee.org/document/9186053\" target=\"blank\"\u003e[HTML]\u003c/a\u003e\n  \u003ca href=\"https://arxiv.org/abs/1912.08937\" target=\"blank\"\u003e[arXiv]\u003c/a\u003e\n  \u003ca href=\"https://www.youtube.com/watch?v=TrjGEUVX5YE\" target=\"blank\"\u003e[Talk]\u003c/a\u003e\n  \u003cbr\u003e\u003cem\u003eRichard J Chen, Ming Y Lu, Jingwen Wang, Drew FK Williamson, Scott J Rodig, Neal I Lindeman, Faisal 
Mahmood\u003c/em\u003e\u003c/br\u003e\n\u003c/summary\u003e\n\n```bash\n@article{chen2020pathomic,\n  title={Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis},\n  author={Chen, Richard J and Lu, Ming Y and Wang, Jingwen and Williamson, Drew FK and Rodig, Scott J and Lindeman, Neal I and Mahmood, Faisal},\n  journal={IEEE Transactions on Medical Imaging},\n  year={2020},\n  publisher={IEEE}\n}\n```\n\u003c/details\u003e\n\n**Summary:** We propose a simple and scalable method for integrating histology images and -omic data using attention gating and tensor fusion. Histopathology images can be processed using CNNs or GCNs for parameter efficiency, or a combination of the two. The setup is adaptable for integrating multiple -omic modalities with histopathology and can be used for improved diagnostic, prognostic and therapeutic response determinations. \n\n\u003cimg src=\"https://github.com/mahmoodlab/PathomicFusion/blob/master/main_fig.jpg\" width=\"1024\"/\u003e\n\n## Community / Follow-Up Work :)\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd\u003eGitHub Repositories / Projects\u003c/td\u003e\n\u003ctd\u003e\n\u003ca href=\"https://github.com/Liruiqing-ustc/HFBSurv\" target=\"_blank\"\u003e★\u003c/a\u003e\n\u003ca href=\"https://github.com/mahmoodlab/PORPOISE\" target=\"_blank\"\u003e★\u003c/a\u003e\n\u003ca href=\"https://github.com/TencentAILabHealthcare/MLA-GNN\" target=\"_blank\"\u003e★\u003c/a\u003e\n\u003ca href=\"https://github.com/zcwang0702/HGPN\" target=\"_blank\"\u003e★\u003c/a\u003e\n\u003ca href=\"https://github.com/isfj/GPDBN\" target=\"_blank\"\u003e★\u003c/a\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n  \n## Updates\n* 05/26/2021: Updated Google Drive with all models and processed data for TCGA-GBMLGG and TCGA-KIRC, available at the [following link](https://drive.google.com/drive/u/1/folders/1swiMrz84V3iuzk8x99vGIBd5FCVncOlf). 
The data made available for TCGA-GBMLGG are the **same ROIs** used by [Mobadersany et al.](https://github.com/PathologyDataScience/SCNN)\n\n## Setup\n\n### Prerequisites\n- Linux (Tested on Ubuntu 18.04)\n- NVIDIA GPU (Tested on Nvidia GeForce RTX 2080 Tis on local workstations, and Nvidia V100s using Google Cloud)\n- CUDA + cuDNN (Tested on CUDA 10.1 and cuDNN 7.5. CPU mode and CUDA without cuDNN may work with minimal modification, but this is untested.)\n- torch\u003e=1.1.0\n- torch_geometric=1.3.0\n\n## Code Base Structure\nThe code base structure is explained below: \n- **train_cv.py**: Cross-validation script for training unimodal and multimodal networks. This script will save evaluation metrics and predictions on the train + test split for each epoch on every split in **checkpoints**.\n- **test_cv.py**: Script for testing unimodal and multimodal networks on only the test split.\n- **train_test.py**: Contains the definitions for \"train\" and \"test\". \n- **networks.py**: Contains PyTorch model definitions for all unimodal and multimodal networks.\n- **fusion.py**: Contains PyTorch model definitions for fusion.\n- **data_loaders.py**: Contains the PyTorch DatasetLoader definition for loading multimodal data.\n- **options.py**: Contains all the options for the argparser.\n- **make_splits.py**: Script for generating a pickle file that saves + aligns the path for multimodal data for cross-validation.\n- **run_cox_baselines.py**: Script for running Cox baselines.\n- **utils.py**: Contains definitions for collating, survival loss functions, data preprocessing, evaluation, figure plotting, etc.\n\nThe directory structure for your multimodal dataset should look similar to the following:\n```bash\n./\n├── data\n      └── PROJECT\n            ├── INPUT A (e.g. Image)\n                ├── image_001.png\n                ├── image_002.png\n                ├── ...\n            ├── INPUT B (e.g. 
Graph)\n                ├── image_001.pkl\n                ├── image_002.pkl\n                ├── ...\n            └── INPUT C (e.g. Genomic)\n                └── genomic_data.csv\n└── checkpoints\n        └── PROJECT\n            ├── TASK X (e.g. Survival Analysis)\n                ├── path\n                    ├── ...\n                ├── ...\n            └── TASK Y (e.g. Grade Classification)\n                ├── path\n                    ├── ...\n                ├── ...\n```\n\nDepending on which modalities you are interested in combining, you must: (1) write your own function for aligning multimodal data in **make_splits.py**, (2) create your DatasetLoader in **data_loaders.py**, and (3) modify **options.py** for your data and task. Models will be saved to the **checkpoints** directory, with each model for each task saved in its own directory. At the moment, the only supervised learning tasks implemented are survival outcome prediction and grade classification.\n\n## Training and Evaluation\nHere are example commands for training unimodal + multimodal networks.\n\n### Survival Model for Input A\nThe example below trains a survival model for mode A and saves the model checkpoints + predictions at the end of each split. In this example, we would create a folder called \"CNN_A\" in \"./checkpoints/example/\" for all the models in cross-validation. It assumes that \"A\" is defined as a mode in **data_loaders.py** for handling modality-specific data-preprocessing steps (random crop + flip + jittering for images), and that there is a network defined for input A in **networks.py**. 
\"surv\" is already defined as a task for training networks for survival analysis in **options.py, networks.py, train_test.py, train_cv.py**.\n\n```\npython train_cv.py --exp_name surv --dataroot ./data/example/ --checkpoints_dir ./checkpoints/example/ --task surv --mode A --model_name CNN_A --niter 0 --niter_decay 50 --batch_size 64 --reg_type none --init_type max --lr 0.002 --weight_decay 4e-4 --gpu_ids 0\n```\nTo obtain test predictions on only the test splits in your cross-validation, you can replace \"train_cv\" with \"test_cv\".\n```\npython test_cv.py --exp_name surv --dataroot ./data/example/ --checkpoints_dir ./checkpoints/example/ --task surv --mode A --model_name CNN_A --niter 0 --niter_decay 50 --batch_size 64 --reg_type none --init_type max --lr 0.002 --weight_decay 4e-4 --gpu_ids 0\n```\n\n### Grade Classification Model for Input A + B\nThe example below trains a grade classification model for fusing modes A and B. Similar to the previous example, we would create a folder called \"Fusion_AB\" in \"./checkpoints/example/\" for all the models in cross-validation. It assumes that \"AB\" is defined as a mode in **data_loaders.py** for handling multiple inputs A and B at the same time. \"grad\" is already defined as a task for training networks for grade classification in **options.py, networks.py, train_test.py, train_cv.py**.\n```\npython train_cv.py --exp_name surv --dataroot ./data/example/ --checkpoints_dir ./checkpoints/example/ --task grad --mode AB --model_name Fusion_AB --niter 0 --niter_decay 50 --batch_size 64 --reg_type none --init_type max --lr 0.002 --weight_decay 4e-4 --gpu_ids 0\n```\n\n## Reproducibility\nTo reproduce the results in our paper, and for exact data preprocessing, implementation, and experimental details, please follow the instructions here: [./data/TCGA_GBMLGG/](https://github.com/mahmoodlab/PathomicFusion/tree/master/data/TCGA_GBMLGG). 
Processed data and trained models can be downloaded [here](https://drive.google.com/drive/folders/1swiMrz84V3iuzk8x99vGIBd5FCVncOlf?usp=sharing).\n\n## Issues\n- Please open new threads or report issues directly (for urgent blockers) to richardchen@g.harvard.edu.\n- Immediate response to minor issues may not be available.\n\n## Licenses, Usages, and Acknowledgements\n- This project is licensed under the GNU GPLv3 License - see the [LICENSE.md](LICENSE.md) file for details. A provisional patent on this work has been filed by the Brigham and Women's Hospital.\n- This code is inspired by [SALMON](https://github.com/huangzhii/SALMON) and [SCNN](https://github.com/CancerDataScience/SCNN). Code base structure was inspired by [pytorch-CycleGAN-and-pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix).\n- Subsidized computing resources for this project were provided by Nvidia and Google Cloud. \n- If you find our work useful in your research, please consider citing our paper at:\n\n```bash\n@article{chen2020pathomic,\n  title={Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis},\n  author={Chen, Richard J and Lu, Ming Y and Wang, Jingwen and Williamson, Drew FK and Rodig, Scott J and Lindeman, Neal I and Mahmood, Faisal},\n  journal={IEEE Transactions on Medical Imaging},\n  year={2020},\n  publisher={IEEE}\n}\n```\n\n© [Mahmood Lab](http://www.mahmoodlab.org) - This code is made available under the GPLv3 License and is available for non-commercial academic purposes. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmahmoodlab%2Fpathomicfusion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmahmoodlab%2Fpathomicfusion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmahmoodlab%2Fpathomicfusion/lists"}