# TITAN-preview

## Multimodal Whole Slide Foundation Model for Pathology

[Preprint](https://arxiv.org/abs/2411.19666) | [Download Model](https://huggingface.co/MahmoodLab/TITAN) | [Blog](https://www.linkedin.com/pulse/building-vision-language-guided-multimodal-whole-slide-tong-ding-0hawe/?trackingId=j4u4BgoBSuad2GDEICorkw%3D%3D) | [Cite](#reference)

**Abstract:** The field of computational pathology has been transformed with recent advances in foundation models that encode histopathology regions of interest (ROIs) into versatile and transferable feature representations via self-supervised learning (SSL). However, translating these advancements to address complex clinical challenges at the patient and slide level remains constrained by limited clinical data in disease-specific cohorts, especially for rare clinical conditions.
We propose **TITAN**, a multimodal whole slide foundation model pretrained using 335,645 WSIs via visual self-supervised learning and vision-language alignment with corresponding pathology reports and 423,122 synthetic captions generated from a multimodal generative AI copilot for pathology. Without any finetuning or requiring clinical labels, TITAN can extract general-purpose slide representations and generate pathology reports that generalize to resource-limited clinical scenarios such as rare disease retrieval and cancer prognosis. We evaluate TITAN on diverse clinical tasks and find that TITAN outperforms both ROI and slide foundation models across machine learning settings such as linear probing, few-shot and zero-shot classification, rare cancer retrieval and cross-modal retrieval, and pathology report generation.

*Figure: TITAN workflow.*

## What is TITAN?
**TITAN** (**T**ransformer-based pathology **I**mage and **T**ext **A**lignment **N**etwork) is a multimodal whole-slide foundation model pre-trained using visual self-supervised learning and vision-language alignment. It leverages 335,645 whole-slide images (WSIs) from a diverse set of internally collected neoplastic, infectious, and inflammatory cases at Mass General Brigham. Additionally, TITAN utilizes over 182,000 pathology reports and more than 423,000 synthetic captions generated by [PathChat](https://www.nature.com/articles/s41586-024-07618-3), our pathology co-pilot. TITAN's slide embeddings achieve state-of-the-art performance on diverse downstream tasks, including linear probing, few-shot and zero-shot classification, rare cancer retrieval, cross-modal retrieval, and pathology report generation.
- _**Why use TITAN?**_: Compared with other slide foundation models that rely on either vision-only pretraining or vision-language alignment alone, TITAN combines both strategies to ensure its slide representations carry rich, comprehensive morphological semantics.
TITAN was also not pretrained on large public histology slide collections such as TCGA, PAIP, CPTAC, and PANDA, which are routinely used for benchmark development in computational pathology. We therefore make TITAN available to the research community for building and evaluating pathology AI models with minimal risk of data contamination on public benchmarks or private histopathology slide collections.

## Updates
- **12/04/2024**: CONCHv1.5 feature extraction is integrated into [CLAM](https://github.com/mahmoodlab/CLAM).
- **12/02/2024**: TITAN preprint and model weights (TITAN-preview and CONCHv1.5) are now live. TCGA-OT splits are available in `./datasets`.

## Installation

First clone the repo and cd into the directory:

```bash
git clone https://github.com/mahmoodlab/TITAN.git
cd TITAN
```

Then create a conda env and install the dependencies:

```bash
conda create -n titan python=3.9 -y
conda activate titan
pip install --upgrade pip
pip install -e .
```

### 1. Getting access

Request access to the model weights (CONCHv1.5 and TITAN-preview for patch and slide feature extraction, respectively) from the Hugging Face model page [here](https://huggingface.co/MahmoodLab/TITAN).

### 2. Downloading weights + Creating model

Following authentication (via `huggingface_hub`), both TITAN-preview (slide and language encoders) and CONCH v1.5 (patch encoder) can be downloaded automatically from the Hugging Face model hub as follows. The model includes the functionality to extract slide embeddings from patch embeddings and to perform zero-shot classification. More details can be found in our demo notebooks.

```python
from huggingface_hub import login
from transformers import AutoModel

login() # login with your User Access Token, found at https://huggingface.co/settings/tokens

titan = AutoModel.from_pretrained('MahmoodLab/TITAN', trust_remote_code=True)
conch, eval_transform = titan.return_conch()
```
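
As a quick sanity check, the returned patch encoder can be applied to a single image tile. The snippet below is a minimal sketch that assumes `conch` behaves like a standard `torch.nn.Module` mapping a transformed image to a patch embedding; see the demo notebooks for the exact usage.

```python
import torch
from PIL import Image

# placeholder tile; in practice load a real H&E patch (e.g., 512 x 512 pixels)
img = Image.new("RGB", (512, 512), color="white")

x = eval_transform(img).unsqueeze(0)  # add a batch dimension: (1, 3, H, W)
with torch.inference_mode():
    patch_embedding = conch(x)  # assumed to return the patch-level feature
print(patch_embedding.shape)
```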

### 3. Running Inference

You can directly use TITAN-preview for slide-level feature extraction. TITAN builds a feature grid from CONCH v1.5 patch features using the patch coordinates and the distance between adjacent patches. Since patch coordinates are always saved at the slide's level 0 magnification, TITAN takes `patch_size_lv0`, the distance between two adjacent patches at level 0 magnification: 1024 if the slide is 40x, or 512 if the slide is 20x. This information is saved in our demo TCGA features.
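
For illustration, this rule can be written as a small helper (a hypothetical convenience function, not part of the TITAN API):

```python
def infer_patch_size_lv0(objective_power: int) -> int:
    """Distance between adjacent patches at level 0, following the rule above."""
    if objective_power == 40:
        return 1024
    if objective_power == 20:
        return 512
    raise ValueError(f"unsupported objective power: {objective_power}")
```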

**Patch feature extraction**: [CLAM](https://github.com/mahmoodlab/CLAM) can also be used for patch feature extraction with CONCHv1.5. When using `extract_features_fp.py`, set `--model_name` to `'conch_v1_5'`.

**Slide feature extraction**: Slide-level features can then be extracted as follows:

```python
import h5py
import torch
from huggingface_hub import hf_hub_download

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = titan.to(device)  # TITAN-preview model loaded above

# load TCGA sample data
demo_h5_path = hf_hub_download(
    "MahmoodLab/TITAN",
    filename="TCGA_demo_features/TCGA-PC-A5DK-01Z-00-DX1.C2D3BC09-411F-46CF-811B-FDBA7C2A295B.h5",
)
with h5py.File(demo_h5_path, 'r') as file:
    features = torch.from_numpy(file['features'][:])
    coords = torch.from_numpy(file['coords'][:])
    patch_size_lv0 = file['coords'].attrs['patch_size_level0']

# extract slide embedding
with torch.autocast('cuda', torch.float16), torch.inference_mode():
    features = features.to(device)
    coords = coords.to(device)
    slide_embedding = model.encode_slide_from_patch_features(features, coords, patch_size_lv0)
```
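
To process a whole cohort, the same call can be wrapped in a loop over per-slide `.h5` feature files. The sketch below assumes each file follows the same layout as the demo file above (keys `features`, `coords`, and the `patch_size_level0` attribute); the directory path is hypothetical.

```python
import glob

import h5py
import torch

slide_embeddings = {}
for h5_path in glob.glob("path/to/conch_v1_5_features/*.h5"):  # hypothetical directory
    with h5py.File(h5_path, "r") as f:
        feats = torch.from_numpy(f["features"][:]).to(device)
        coords = torch.from_numpy(f["coords"][:]).to(device)
        patch_size_lv0 = f["coords"].attrs["patch_size_level0"]
    with torch.autocast("cuda", torch.float16), torch.inference_mode():
        slide_embeddings[h5_path] = model.encode_slide_from_patch_features(
            feats, coords, patch_size_lv0
        ).cpu()
```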

## Demo of specific use cases

We provide a set of demo notebooks to showcase the capabilities of TITAN. The notebooks include:
- **Slide embedding extraction** from patch embeddings in `notebooks/inference_demo.ipynb`.
- **Zero-shot classification** on a single slide and on the TCGA-OT dataset in `notebooks/zeroshot_demo.ipynb`.
- **Linear Probing** evaluation of the slide embeddings on the TCGA-OT dataset in `notebooks/linear_probe_demo.ipynb` (a minimal sketch of the idea is shown below).
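
For readers who prefer a script over the notebook, linear probing reduces to fitting a simple classifier on frozen slide embeddings. The snippet below is a minimal sketch with placeholder data, not the exact protocol from the paper or `notebooks/linear_probe_demo.ipynb`; substitute TITAN slide embeddings and the TCGA-OT splits in practice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

# placeholder data (768 is a placeholder embedding dimension);
# replace with TITAN slide embeddings and labels from the TCGA-OT splits
rng = np.random.default_rng(0)
train_embs, train_labels = rng.normal(size=(200, 768)), rng.integers(0, 4, 200)
test_embs, test_labels = rng.normal(size=(50, 768)), rng.integers(0, 4, 50)

clf = LogisticRegression(max_iter=10000)
clf.fit(train_embs, train_labels)
preds = clf.predict(test_embs)
print("balanced accuracy:", balanced_accuracy_score(test_labels, preds))
```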

## Comparisons & Additional Benchmarks

We provide benchmark numbers on a set of representative tasks; a comprehensive set of benchmarks is in the [paper](https://arxiv.org/abs/2411.19666). The results are for the TITAN-preview model and will be updated with newer iterations of TITAN. For **morphological classification**, results are reported using *linear probing*. For **slide retrieval**, results are reported using *Accuracy @K* (at least one of the top-K retrieved slides shares the query's diagnostic label) and *MVAccuracy @K* (the majority vote of the top-K retrieved slides matches the query's diagnostic label).
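
For reference, both retrieval metrics follow directly from these definitions. A minimal sketch in plain Python, given each query's label and the labels of its top-K retrieved slides:

```python
from collections import Counter

def retrieval_metrics(query_labels, retrieved_labels, k=3):
    """query_labels: N labels; retrieved_labels: N lists of top-k retrieved labels."""
    n = len(query_labels)
    # Acc @K: at least one of the top-K retrieved slides shares the query's label
    acc_at_k = sum(q in r[:k] for q, r in zip(query_labels, retrieved_labels)) / n
    # MVAcc @K: the majority vote of the top-K retrieved labels equals the query's label
    mv_at_k = sum(Counter(r[:k]).most_common(1)[0][0] == q
                  for q, r in zip(query_labels, retrieved_labels)) / n
    return acc_at_k, mv_at_k

# toy example: two queries with their top-3 retrieved labels
print(retrieval_metrics(["LUAD", "BRCA"], [["LUAD", "GBM", "LUAD"], ["GBM", "GBM", "BRCA"]]))
# -> (1.0, 0.5)
```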

### Release of TCGA slide features

We released all TCGA TITAN-preview slide features, which can be loaded as follows:

```python
import pickle
from huggingface_hub import hf_hub_download

slide_feature_path = hf_hub_download(
    "MahmoodLab/TITAN",
    filename="TCGA_TITAN_features.pkl",
)
with open(slide_feature_path, 'rb') as file:
    data = pickle.load(file)
```
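
The exact layout of the pickle is documented on the Hugging Face model page; a quick way to inspect what was loaded, assuming a dictionary-like structure:

```python
print(type(data))
if isinstance(data, dict):
    for key, value in list(data.items())[:5]:
        print(key, getattr(value, "shape", type(value)))
```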

### Dataset descriptions
- **TCGA-UT-8K** is an ROI dataset (8,192 x 8,192 pixels) curated in consultation with the original TCGA-UT authors; it will be released in the coming weeks.

- **TCGA-OT** is a slide-level 46-class classification task, organized according to the OncoTree classification system such that every class is represented by at least 50 samples. It consists of 11,186 formalin-fixed paraffin-embedded (FFPE) WSIs from TCGA and is the largest publicly available pan-cancer slide-level classification task. The splits are released in `./datasets`.

### Morphological classification
| Task | | TITAN [[1]](https://arxiv.org/abs/2411.19666) | PRISM [[2]](https://arxiv.org/abs/2405.10254) | Prov-GigaPath [[3]](https://www.nature.com/articles/s41586-024-07441-w) | CHIEF [[4]](https://www.nature.com/articles/s41586-024-07894-z) |
|:---------------|:--------------|---------------------------:|-------------------------:|-----------------:|------------------:|
| *Patch encoder* | | CONCHv1.5 | Virchow | Prov-GigaPath | CTransPath |
| TCGA-UT-8K<br>(32 classes, Public) | Bal. acc. | **0.832** | 0.774 | 0.700 | 0.625 |
| TCGA-OT<br>(46 classes, Public) | Bal. acc. | **0.704** | 0.643 | 0.543 | 0.528 |
| OT-108<br>(108 classes, Internal) | Bal. acc. | **0.587** | 0.508 | 0.437 | 0.413 |
| EBRAINS<br>(30 classes, Public) | Bal. acc. | **0.735** | 0.674 | 0.680 | 0.598 |
| Renal allograft AMR<br>(2 classes, Internal) | AUROC | **0.915** | 0.820 | 0.836 | 0.813 |

### Slide retrieval
| Task | | TITAN [[1]](https://arxiv.org/abs/2411.19666) | PRISM [[2]](https://arxiv.org/abs/2405.10254) | Prov-GigaPath [[3]](https://www.nature.com/articles/s41586-024-07441-w) | CHIEF [[4]](https://www.nature.com/articles/s41586-024-07894-z) |
|:---------------|:--------------|---------------------------:|-------------------------:|-----------------:|------------------:|
| *Patch encoder* | | CONCHv1.5 | Virchow | Prov-GigaPath | CTransPath |
| TCGA-UT-8K<br>(32 classes, Public) | Acc. @3<br>MVacc. @3 | **0.912**<br>**0.875** | 0.854<br>0.788 | 0.728<br>0.645 | 0.690<br>0.609 |
| TCGA-OT<br>(46 classes, Public) | Acc. @3<br>MVacc. @3 | **0.880**<br>**0.807** | 0.836<br>0.755 | 0.666<br>0.572 | 0.669<br>0.602 |
| OT-108<br>(108 classes, Internal) | Acc. @3<br>MVacc. @3 | **0.707**<br>**0.621** | 0.636<br>0.547 | 0.450<br>0.414 | 0.442<br>0.400 |
| EBRAINS<br>(30 classes, Public) | Acc. @3<br>MVacc. @3 | **0.865**<br>**0.809** | 0.811<br>0.751 | 0.806<br>0.733 | 0.713<br>0.631 |
| Renal allograft AMR<br>(2 classes, Internal) | Acc. @3<br>MVacc. @3 | **0.919**<br>**0.785** | 0.887<br>0.666 | 0.857<br>0.630 | 0.848<br>0.646 |

## What next?
TITAN-preview offers just a glimpse of the envisioned final TITAN model, as the model can be readily scaled. More WSIs are being digitized, and synthetic caption generation with the multimodal copilot is essentially unlimited, all of which can be incorporated into TITAN's pretraining pipeline. Stay tuned for more updates!

## License and Terms of use
ⓒ Mahmood Lab. This model and associated code are released under the [CC-BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en) license and may only be used for non-commercial, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of the TITAN model and its derivatives, which include models trained on outputs from the TITAN model or datasets created from the TITAN model, is prohibited and requires prior approval. Downloading the model requires prior registration on Hugging Face and agreeing to the terms of use. By downloading this model, you agree not to distribute, publish or reproduce a copy of the model. If another user within your organization wishes to use the TITAN model, they must register as an individual user and agree to comply with the terms of use. Users may not attempt to re-identify the deidentified data used to develop the underlying model. If you are a commercial entity, please contact the corresponding author or Mass General Brigham Innovation Office.

## Acknowledgements
The project was built on top of amazing repositories such as [ViT](https://github.com/google-research/big_vision), [iBOT](https://github.com/bytedance/ibot/tree/main), [OpenClip](https://github.com/mlfoundations/open_clip), [LGSSL](https://github.com/mbanani/lgssl), and [Timm](https://github.com/huggingface/pytorch-image-models/) (ViT model implementation). We thank the authors and developers for their contributions.

## Reference
If you find our work useful in your research or if you use parts of this code please consider citing our [paper](https://arxiv.org/abs/2411.19666):

Ding, T.\*, Wagner, S.J.\*, Song, A.H.\*, Chen, R.J.\*, et al. Multimodal Whole Slide Foundation Model for Pathology, _arXiv_, 2024.

```
@misc{ding2024titan,
      title={Multimodal Whole Slide Foundation Model for Pathology},
      author={Tong Ding and Sophia J. Wagner and Andrew H. Song and Richard J. Chen and Ming Y. Lu and Andrew Zhang and Anurag J. Vaidya and Guillaume Jaume and Muhammad Shaban and Ahrong Kim and Drew F. K. Williamson and Bowen Chen and Cristina Almagro-Perez and Paul Doucet and Sharifa Sahai and Chengkuan Chen and Daisuke Komura and Akihiro Kawabe and Shumpei Ishikawa and Georg Gerber and Tingying Peng and Long Phi Le and Faisal Mahmood},
      year={2024},
      eprint={2411.19666},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2411.19666},
}
```