Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/huanglizi/visionunite
This repository is the official implementation of the paper "VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge"
- Host: GitHub
- URL: https://github.com/huanglizi/visionunite
- Owner: HUANGLIZI
- License: mit
- Created: 2024-05-07T18:36:53.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-11-13T18:16:58.000Z (about 2 months ago)
- Last Synced: 2024-11-13T19:25:06.388Z (about 2 months ago)
- Language: Python
- Homepage:
- Size: 3.11 MB
- Stars: 25
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# VisionUnite
This repository is the official implementation of the paper "VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge" ([arXiv](https://arxiv.org/abs/2408.02865)). The dataset we use for fine-tuning is the [MMFundus](https://github.com/HUANGLIZI/MMFundus) dataset.

![image](https://github.com/HUANGLIZI/VisionUnite/blob/main/VisionUnite_Manuscript.jpg)
- **(a)** Previous vision models could only diagnose specific diseases as positive or negative, lacking the ability to provide clinical explanations or interact with patients. Our proposed VisionUnite changes this approach: it can predict a wide range of diseases and allows real-time conversations with patients, incorporating their feedback. Additionally, VisionUnite offers clear clinical explanations in its output, making it more understandable and useful.
- **(b)** The label distribution of the proposed MMFundus dataset, which includes eight main categories excluding the "Others" class.
- **(c)** VisionUnite is built with a transformer-based vision encoder and a specialized vision adapter designed for classifying six different signs: Vascular, Macular, FBC (Fundus Boundary Color), OCD (Optical Cup Disc), FHE (Fundus Hemorrhages Examination), and Other. It includes a vision projector to align visual embeddings with text tokens.
- **(d)** The illustration of image-text contrastive learning (CLIP Loss).
- **(e)** The illustration of classification supervised learning (CLS Loss).
- **(f)** The illustration of text-generation supervised learning (LLM Loss).
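Panels (d)-(f) correspond to the three training objectives. As a rough illustration, here is a minimal PyTorch-style sketch of how such a combined objective could look; the function names, tensor shapes, and the equal weighting of the three terms are assumptions for illustration, not the repository's actual implementation.

```python
# Illustrative sketch only: names, shapes, and equal weighting are assumptions,
# not the repository's actual training code.
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric image-text contrastive loss (panel d)."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def cls_loss(sign_logits, sign_labels):
    """Multi-label supervision over the six signs (panel e):
    Vascular, Macular, FBC, OCD, FHE, Other."""
    return F.binary_cross_entropy_with_logits(sign_logits, sign_labels.float())

def llm_loss(token_logits, token_targets, pad_id=0):
    """Next-token cross-entropy for the generated clinical text (panel f)."""
    vocab = token_logits.size(-1)
    return F.cross_entropy(token_logits.reshape(-1, vocab),
                           token_targets.reshape(-1), ignore_index=pad_id)

def total_loss(img_emb, txt_emb, sign_logits, sign_labels, token_logits, token_targets):
    """Combined objective: CLIP Loss + CLS Loss + LLM Loss (equal weights assumed)."""
    return (clip_loss(img_emb, txt_emb)
            + cls_loss(sign_logits, sign_labels)
            + llm_loss(token_logits, token_targets))
```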
## Requirements

Python == 3.8 is required. Install the dependencies from `requirements.txt` using:
```bash
pip install -r requirements.txt
```

## Usage
### 1. Training
You can train your own model by running:
```bash
bash ./exps/train.sh
```

### 2. Evaluation
#### 2.1 Test the Model
Prepare the test data and run the following command:
```bash
python demo.py
```

#### 2.2 Pre-trained Models
The pre-trained model VisionUnite V1 can be downloaded at this [link](https://uillinoisedu-my.sharepoint.com/:u:/g/personal/zl111_illinois_edu/Edr7x0BKfQZJmv5nQA50VZEBbKvyVuiQw3MKoGx4Y93DMg?e=eVFIWn). If you use the pre-trained model provided by us, please cite VisionUnite.
To obtain further pre-trained models for the MMFundus dataset, you can contact [email protected]. We only handle **real-name emails**, and **your email suffix must match your affiliation**. The email should contain the following information:
```text
Name/Homepage/Google Scholar: (Tell us who you are.)
Primary Affiliation: (The name of your institution or university, etc.)
Job Title: (E.g., Professor, Associate Professor, Ph.D., etc.)
Affiliation Email: (The password will be sent to this email; we only reply to emails whose address ends in "edu".)
How to use: (Only for academic research, not for commercial use or secondary development.)
```

Our code is adapted from [LLaMA-Adapter](https://github.com/OpenGVLab/LLaMA-Adapter) and [InternVL](https://github.com/OpenGVLab/InternVL). We thank these authors for their valuable work.
## Citation
```bibtex
@article{li2024visionunite,
title={VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge},
author={Li, Zihan and Song, Diping and Yang, Zefeng and Wang, Deming and Li, Fei and Zhang, Xiulan and Kinahan, Paul E and Qiao, Yu},
journal={arXiv preprint arXiv:2408.02865},
year={2024}
}
```