Libra: Leveraging Temporal Images for Biomedical Radiology Analysis
- Host: GitHub
- URL: https://github.com/x-izhang/libra
- Owner: X-iZhang
- License: apache-2.0
- Created: 2024-11-28T15:07:07.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2024-12-26T22:55:27.000Z (7 days ago)
- Last Synced: 2024-12-26T23:26:15.448Z (7 days ago)
- Topics: ai4science, chest-xrays, llama3, medical-image-analysis, multimodal-large-language-models, radiology-report-generation, vision-language-model
- Language: Python
- Homepage: https://arxiv.org/abs/2411.19378
- Size: 2.86 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
# Libra: Leveraging Temporal Images for Biomedical Radiology Analysis

[![arXiv](https://img.shields.io/badge/Arxiv-2411.19378-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2411.19378)
[![hf_space](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue)](https://huggingface.co/X-iZhang/libra-v1.0-7b)
[![License](https://img.shields.io/badge/License-Apache%202.0-yellow.svg?)](https://github.com/X-iZhang/Libra/blob/main/LICENSE)
[![Views](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FX-iZhang%2FLibra&count_bg=%2300C0FF&title_bg=%23004080&icon=&icon_color=%23FFFFFF&title=Views)](https://hits.seeyoufarm.com)

## More Than Radiology: Codespace Features for MLLM Workflows You'll Love!
> * **LLaVA-Type & LLaMA_3 Support**: Deploy and train advanced models effortlessly.
> * **Resume Training**: Resume training from checkpoints at any stage, whether for pre-training or fine-tuning.
> * **Validation Dataset**: Track model performance in real-time on `validation datasets` during training.
> * **Custom Metrics**: Go beyond `eval_loss` with metrics such as `BLEU`, `ROUGE-L`, and `RadGraph-F1`, or define your own criteria on the validation dataset (see the sketch after this list).
> * **Smart Saving**: Automatically save the best model based on validation loss or custom evaluation scores.
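As a rough illustration of the custom-metrics idea, the sketch below scores generated reports against references with the Hugging Face `evaluate` library. This is not Libra's exact training hook (and it omits `RadGraph-F1`); it is only a minimal example of the kind of criterion you can plug in.

```Python
# Minimal sketch of a custom text-generation metric (not Libra's exact API).
# Assumes the `evaluate` package is installed; RadGraph-F1 is not shown here.
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

def report_metrics(predictions, references):
    """Score generated reports against reference reports with BLEU and ROUGE-L."""
    return {
        "bleu": bleu.compute(predictions=predictions,
                             references=[[r] for r in references])["bleu"],
        "rougeL": rouge.compute(predictions=predictions,
                                references=references)["rougeL"],
    }

# Toy example:
print(report_metrics(
    ["No acute cardiopulmonary process."],
    ["No acute cardiopulmonary abnormality."],
))
```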
## Contents

- [Install](#install)
- [Libra Weights](#libra-weights)
- [Quick Start](#quick-start)
- [Dataset](#dataset)
- [Train](#train)
- [Evaluation](#evaluation)

## Install
We strongly recommend that you create an environment from scratch as follows:
1. Clone this repository and navigate to the Libra folder
```bash
git clone https://github.com/X-iZhang/Libra.git
cd Libra
```

2. Install Package
```Shell
conda create -n libra python=3.10 -y
conda activate libra
pip install --upgrade pip # enable PEP 660 support
pip install -e .
```

3. Install additional packages for training and evaluation
```Shell
pip install -e ".[train,eval]"
pip install flash-attn --no-build-isolation
```

Upgrade to the latest code base:
```Shell
git pull
pip install -e .
```

## Libra Weights
| Version | Base LLM | Vision Encoder| Checkpoint |
| ------- | ------- | ------- | ------- |
| Libra v1.0 | Meditron-7B | RAD-DINO | [X-iZhang/libra-v1.0-7b](https://huggingface.co/X-iZhang/libra-v1.0-7b) |
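If you want the checkpoint cached locally before running anything, a minimal (optional) sketch using `huggingface_hub` is shown below; the CLI example that follows simply passes the repo id directly as `--model-path`.

```Python
# Optional: pre-download the Libra v1.0 checkpoint from the Hugging Face Hub.
# A minimal sketch using huggingface_hub (not required for the CLI below,
# which accepts the repo id directly as --model-path).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="X-iZhang/libra-v1.0-7b")
print(f"Checkpoint downloaded to: {local_dir}")
```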
## Quick Start

### CLI Inference
We support running inference using the CLI. To use our model, run:
```Shell
python -m libra.serve.cli \
--model-path X-iZhang/libra-v1.0-7b \
--image-file "./path/to/current_image.jpg" "./path/to/previous_image.jpg"
# If there is no previous image, only one path is needed.
```

### Script Inference
You can use the `libra_eval` function in `libra/eval/run_libra.py` to easily launch a model trained by yourself or by us, on a local machine or in Google Colab, after installing this repository.

```Python
from libra.eval import libra_eval

# Define the model path, which can be a pre-trained model or your own fine-tuned model.
model_path = "X-iZhang/libra-v1.0-7b" # Or your own model# Define the paths to the images. The second image is optional for temporal comparisons.
image_files = [
"./path/to/current/image.jpg",
"./path/to/previous/image.jpg" # Optional: Only include if a reference image is available
]

# Define the prompt to guide the model's response. Add clinical instructions if needed.
prompt = (
"Provide a detailed description of the findings in the radiology image. "
"Following clinical context: ..."
)

# Specify the conversational mode, matching the PROMPT_VERSION used during training.
conv_mode = "libra_v1"# Call the libra_eval function.
libra_eval(
model_path=model_path,
image_file=image_files,
query=prompt,
temperature=0.9,
top_p=0.8,
conv_mode=conv_mode,
max_new_tokens=512
)
```

Alternatively, you can use beam search to generate the output:
```Python
libra_eval(
model_path=model_path,
image_file=image_files,
query=prompt,
num_beams=5,
length_penalty=2,
num_return_sequences=2,
conv_mode=conv_mode,
max_new_tokens=512
)
```

Additionally, you can directly use LoRA weights for inference:
```Python
libra_eval(
model_path="./path/to/lora_weights", # path to LoRA weights
model_base="./path/to/base_model", # path to base Libra model
image_file=image_files,
query=prompt,
num_beams=5,
length_penalty=2,
num_return_sequences=2,
conv_mode=conv_mode,
max_new_tokens=512
)
```

## Dataset
### Prepare Data
All the data we use comes from [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.0.0/) and its two variants, and we strictly follow the official `train/valid/test` split.
- Image Data
All images used for **Libra** come from the [MIMIC-CXR-JPG](https://physionet.org/content/mimic-cxr-jpg/2.0.0/) dataset in `.jpg` format. `DICOM` format is also supported and can be found in the original [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.0.0/) dataset.
After download, the images are automatically organized into the following structure in `./path/to/playground/data` (a small path-lookup sketch follows the tree below):
```
./data/physionet.org/files/mimic-cxr-jpg/2.0.0
└── files
    ├── p10
    │   └── p10000032
    │       └── s50414267
    │           ├── image1.jpg
    │           └── image2.jpg
    ├── p11
    ├── p12
    ├── ...
    └── p19
```
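For reference, here is a small sketch (an illustration, not part of Libra's code) of how an image path can be resolved under this layout, assuming the standard MIMIC-CXR-JPG naming where the top-level group folder is `p` plus the first two digits of the subject ID:

```Python
# Sketch: resolve a MIMIC-CXR-JPG image path from subject/study/DICOM IDs.
# Assumption: standard layout files/pXX/p<subject_id>/s<study_id>/<dicom_id>.jpg
from pathlib import Path

DATA_ROOT = Path("./data/physionet.org/files/mimic-cxr-jpg/2.0.0/files")

def image_path(subject_id: str, study_id: str, dicom_id: str) -> Path:
    return (DATA_ROOT / f"p{subject_id[:2]}" / f"p{subject_id}"
            / f"s{study_id}" / f"{dicom_id}.jpg")

# Example with the IDs from the tree above (dicom_id is a placeholder):
print(image_path("10000032", "50414267", "your-dicom-id"))
```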
- Annotation Data

All annotations used for **Libra** come from [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.0.0/) and its two variants. This includes radiology reports and related visual question answering data.
Please download the following datasets from the official website: `mimic-cxr-reports.zip` from [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.0.0/), [MIMIC-Diff-VQA](https://physionet.org/content/medical-diff-vqa/1.0.0/), and [MIMIC-Ext-*MIMIC-CXR-VQA*](https://physionet.org/content/mimic-ext-mimic-cxr-vqa/1.0.0/).
### Preprocess Data
- Radiology Report Sections
For free-text radiology reports, we extract the `Findings`, `Impression`, `Indication`, `History`, `Comparison`, and `Technique` sections using the official [mimic-cxr](https://github.com/MIT-LCP/mimic-cxr/tree/master/txt) repository (a rough extraction sketch follows this list).
- Visual Question Answering for Chest X-ray
In [Medical-Diff-VQA](https://physionet.org/content/medical-diff-vqa/1.0.0/), the main image is used as the current image, and the reference image is used as the prior image. In [MIMIC-Ext-MIMIC-CXR-VQA](https://physionet.org/content/mimic-ext-mimic-cxr-vqa/1.0.0/), all cases use a dummy prior image.
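The official repository linked above contains the exact section-splitting logic; the snippet below is only a rough, self-contained sketch of the idea, with a hypothetical `extract_section` helper, for readers who want to see what pulling a section out of a free-text report looks like.

```Python
# Rough sketch (not the official mimic-cxr tooling): pull a named section
# such as FINDINGS or IMPRESSION out of a free-text report with a regex.
import re

def extract_section(report_text: str, section: str) -> str:
    # Capture everything after "SECTION:" up to the next ALL-CAPS header or end of text.
    pattern = rf"{section}:\s*(.*?)(?=\n\s*[A-Z][A-Z ]+:|\Z)"
    match = re.search(pattern, report_text, flags=re.DOTALL)
    return match.group(1).strip() if match else ""

report = "INDICATION: Cough.\n\nFINDINGS: Lungs are clear.\n\nIMPRESSION: No acute process."
print(extract_section(report, "FINDINGS"))    # -> Lungs are clear.
print(extract_section(report, "IMPRESSION"))  # -> No acute process.
```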
### Data Download
| Alignment data files | Split | Size |
| ----- | ----- | -----: |
| [libra_alignment_train.json](https://drive.google.com/file/d/1AIT1b3eRXgJFp3FJmHci3haTunK1NTMA/view?usp=drive_link)| train | 780 MiB |
| [libra_alignment_valid.json](https://drive.google.com/file/d/1nvbUoDmw7j4HgXwZWiiACIhvZ6BvR2LX/view?usp=sharing)| valid | 79 MiB |

| Fine-Tuning data files | Split | Size |
| ----- | ----- | -----: |
| [libra_findings_section_train.json](https://drive.google.com/file/d/1rJ3G4uiHlzK_P6ZBUbAi-cDaWV-o6fcz/view?usp=sharing)| train | 159 MiB |
| [libra_findings_section_valid.json](https://drive.google.com/file/d/1IYwQS23veOU5SXWGYiTyq9VHUwkVESfD/view?usp=sharing)| valid | 79 MiB |

| Evaluation data files | Split | Size |
| --- | --- | ---: |
| [libra_findings_section_eval.jsonl](https://drive.google.com/file/d/1fy_WX616L8SgyAonadJ2fUIEaX0yrGrQ/view?usp=sharing)| eval | 2 MiB |

Meanwhile, here are some bonus evaluation data files.
| Evaluation data files | Split | Size |
| --- | --- | ---: |
| [libra_impressions_section_eval.jsonl](https://drive.google.com/file/d/16msRfk7XxCmq7ZPG82lKvsnnjqsRPv__/view?usp=sharing)| eval | 1 MiB |
| [libra_MIMIC-Ext-MIMIC-CXR-VQA_eval.jsonl](https://drive.google.com/file/d/1krPMwGGY6HP4sonNKlnkhLOoZrdjfVMW/view?usp=sharing)| eval | 4 MiB |
| [libra_MIMIC-Diff-VQA_eval.jsonl](https://drive.google.com/file/d/1tP_CxPMM9PiKTq1mLYRHICcyJ36Q13mC/view?usp=sharing)| eval | 20 MiB |
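After downloading, a quick sanity check along these lines can confirm a file loaded correctly. This is a generic sketch that assumes the `.json` files are lists of records; the exact field names vary per file, so only the keys of the first record are printed.

```Python
# Generic sanity check for a downloaded annotation file (assumes a JSON list of records).
import json

path = "libra_findings_section_valid.json"  # any of the .json files above
with open(path) as f:
    data = json.load(f)

print(f"{path}: {len(data)} records")
print("first record keys:", list(data[0].keys()) if data else "n/a")
```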
If you are interested in training Libra on your own task/datasets, please refer to [`Finetune_Custom_Data.md`].

## Train
## Evaluation
## Project Status

The code is currently being organised and will be available soon. **Please check back later for updates!**
We are actively preparing the repository to ensure a seamless experience for contributors and users. Stay tuned for the initial release and future enhancements.
![architecture](./assets/libra_architecture.png)
## Acknowledgements
We sincerely thank the following projects for their contributions to **Libra**:
* [LLaVA](https://github.com/haotian-liu/LLaVA): A Large Language and Vision Assistant, laying the groundwork for multimodal understanding.
* [FastChat](https://github.com/lm-sys/FastChat): An Open Platform for Training, Serving, and Evaluating Large Language Model based Chatbots.
* [LLaMA](https://github.com/facebookresearch/llama): Open and efficient foundation language models that inspired our core language processing capabilities.
* [MEDITRON](https://github.com/epfLLM/meditron): Open and efficient medical Large language models.
* [RAD-DINO](https://huggingface.co/microsoft/rad-dino): An open and efficient biomedical image encoder, enabling robust radiological analysis.

## Citation
If you find our paper and code useful in your research and applications, please cite using this BibTeX:
```BibTeX
@misc{zhang2024libraleveragingtemporalimages,
title={Libra: Leveraging Temporal Images for Biomedical Radiology Analysis},
author={Xi Zhang and Zaiqiao Meng and Jake Lever and Edmond S. L. Ho},
year={2024},
eprint={2411.19378},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.19378},
}
```