
# PMC-VQA
The official code for [**PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering**](https://arxiv.org/pdf/2305.10415.pdf)


[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pmc-vqa-visual-instruction-tuning-for-medical/medical-visual-question-answering-on-pmc-vqa)](https://paperswithcode.com/sota/medical-visual-question-answering-on-pmc-vqa?p=pmc-vqa-visual-instruction-tuning-for-medical)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pmc-vqa-visual-instruction-tuning-for-medical/medical-visual-question-answering-on-vqa-rad)](https://paperswithcode.com/sota/medical-visual-question-answering-on-vqa-rad?p=pmc-vqa-visual-instruction-tuning-for-medical)

We propose a generative model for medical visual understanding that aligns visual information from a pre-trained vision encoder with a large language model. We also establish a scalable pipeline to construct PMC-VQA, a large-scale medical visual question-answering dataset containing 227k VQA pairs over 149k images, covering various modalities and diseases.
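As a minimal conceptual sketch of this kind of alignment (not the released MedVInT code; the module name, dimensions, and prefix length below are illustrative assumptions), visual features from a frozen encoder can be projected into the LLM embedding space and prepended as prefix tokens before generative decoding:

```python
import torch
import torch.nn as nn

class VisualPrefixModel(nn.Module):
    """Toy illustration: project frozen vision-encoder features into the
    LLM embedding space and prepend them as prefix tokens. All names and
    dimensions are placeholders, not the authors' implementation."""

    def __init__(self, vision_dim=768, llm_dim=4096, num_prefix_tokens=32):
        super().__init__()
        # Maps visual patch features to the LLM's token-embedding dimension.
        self.projector = nn.Linear(vision_dim, llm_dim)
        self.num_prefix_tokens = num_prefix_tokens

    def forward(self, visual_feats, text_embeds):
        # visual_feats: (batch, num_patches, vision_dim) from a frozen encoder
        # text_embeds:  (batch, seq_len, llm_dim) from the LLM embedding layer
        prefix = self.projector(visual_feats[:, : self.num_prefix_tokens])
        # Concatenate the visual prefix with the text embeddings.
        return torch.cat([prefix, text_embeds], dim=1)

# Shape check with random tensors.
model = VisualPrefixModel()
fused = model(torch.randn(2, 196, 768), torch.randn(2, 20, 4096))
print(fused.shape)  # torch.Size([2, 52, 4096])
```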

The dataset is available at [Huggingface](https://huggingface.co/datasets/xmcmic/PMC-VQA/).

The model checkpoints are available at [MedVInT-TE](https://huggingface.co/xmcmic/MedVInT-TE/) and [MedVInT-TD](https://huggingface.co/xmcmic/MedVInT-TD/).
**Note: an incorrect checkpoint of MedVInT-TD was previously uploaded.
We fixed the issue and updated the checkpoint on July 31,
so the correct and improved version of the model is now available.**

- [PMC-VQA](#pmc-vqa)
- [Usage](#usage)
- [1. Create Environment](#1-create-environment)
- [2. Prepare Dataset](#2-prepare-dataset)
- [3. Model Checkpoints](#3-model-checkpoints)
- [Acknowledgement](#acknowledgement)
- [Contribution](#contribution)
- [Cite](#cite)

## Usage

### 1. Create Environment

Please refer to https://github.com/chaoyi-wu/PMC-LLaMA for environment setup.
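After following that setup, a quick sanity check might look like the sketch below (the packages `torch` and `transformers` are assumptions based on the linked repository; exact versions are not pinned here):

```python
# Quick sanity check after installing the environment from PMC-LLaMA.
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```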

### 2. Prepare Dataset

Download the dataset from [Huggingface](https://huggingface.co/datasets/xmcmic/PMC-VQA/) and save it into `./PMC-VQA`.
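One way to fetch the files is with `huggingface_hub` (a hedged sketch; any download method that places the data under `./PMC-VQA` works):

```python
# Download the PMC-VQA dataset repository into ./PMC-VQA.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="xmcmic/PMC-VQA",
    repo_type="dataset",
    local_dir="./PMC-VQA",
)
```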

### 3. Model Checkpoints

Download the pre-trained [MedVInT-TE](https://huggingface.co/xmcmic/MedVInT-TE/) and save it into the `./src/MedVInT_TE/Results` directory.

Download the pre-trained [MedVInT-TD](https://huggingface.co/xmcmic/MedVInT-TD/) and save it into the `./src/MedVInT_TD/Results` directory.
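For example, both checkpoints can be fetched with `huggingface_hub` (a sketch under the assumption that the files simply need to land in the directories above):

```python
# Download both checkpoint repositories into the expected Results directories.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="xmcmic/MedVInT-TE", local_dir="./src/MedVInT_TE/Results")
snapshot_download(repo_id="xmcmic/MedVInT-TD", local_dir="./src/MedVInT_TD/Results")
```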

See [MedVInT_TE](./src/MedVInT_TE/README.md) and [MedVInT_TD](./src/MedVInT_TD/README.md) for the details of training **MedVInT_TE** and **MedVInT_TD**.

## Acknowledgement

CLIP -- https://github.com/openai/CLIP

PMC-CLIP -- https://github.com/WeixiongLin/PMC-CLIP

PMC-LLaMA -- https://github.com/chaoyi-wu/PMC-LLaMA

LLaMA: Open and Efficient Foundation Language Models -- https://arxiv.org/abs/2302.13971

We thank the authors for their open-sourced code and encourage users to cite their works when applicable.

## Contribution

Please raise an issue if you need help; any contributions are welcome.

## Citation

If you use this code or our pre-trained weights in your research, please cite our [paper](https://arxiv.org/abs/2305.10415):

```
@article{zhang2023pmcvqa,
  title={PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering},
  author={Xiaoman Zhang and Chaoyi Wu and Ziheng Zhao and Weixiong Lin and Ya Zhang and Yanfeng Wang and Weidi Xie},
  journal={arXiv preprint arXiv:2305.10415},
  year={2023},
}
```