https://github.com/zycheiheihei/Transferable-Visual-Prompting
[CVPR 2024 Highlight] Official implementation of "Exploring the Transferability of Visual Prompting for Multimodal Large Language Models", accepted to CVPR 2024.
- Host: GitHub
- URL: https://github.com/zycheiheihei/Transferable-Visual-Prompting
- Owner: zycheiheihei
- License: mit
- Created: 2023-11-24T03:20:10.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-12-20T03:51:22.000Z (23 days ago)
- Last Synced: 2024-12-20T04:33:59.587Z (23 days ago)
- Language: Python
- Homepage:
- Size: 843 KB
- Stars: 33
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-LVLM-Attack (GitHub)
README
# Transferable Visual Prompting for Multimodal Large Language Models
### Installation
1. Create and activate a virtual environment for the project, then install the dependencies.
```
cd Transferable_VP_MLLM
conda create -n transvp python=3.11
conda activate transvp
pip install -r requirements.txt
```
2. Prepare the model weights
Put the model weights under `./model_weights`; a sketch of one possible layout follows the list below.
* MiniGPT-4: Follow [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4) and prepare the `MiniGPT-4-Vicuna-V0-7B`
* InstructBLIP: Follow [LAVIS](https://github.com/salesforce/LAVIS) and prepare the `InstructBLIP-Vicuna-7b-v1.1`
* BLIP2: Follow [LAVIS](https://github.com/salesforce/LAVIS) and prepare the `BLIP2-FlanT5-xl`
* VPGTrans: Follow MiniGPT-4 and prepare `Vicuna-v0-7B` as LLM
* BLIVA: Follow [BLIVA](https://github.com/mlpc-ucsd/BLIVA#prepare-weight) and prepare `BLIVA-Vicuna-7B`
* VisualGLM-6B: No special operation needed.
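For reference, here is one possible layout of `./model_weights` after following the links above (the directory names are illustrative, not prescribed by the repo; use whatever paths the model configs expect):
```
model_weights/
├── minigpt4-vicuna-v0-7b/        # MiniGPT-4-Vicuna-V0-7B
├── instructblip-vicuna-7b-v1.1/  # InstructBLIP-Vicuna-7b-v1.1
├── blip2-flant5-xl/              # BLIP2-FlanT5-xl
├── vicuna-v0-7b/                 # LLM weights for VPGTrans
└── bliva-vicuna-7b/              # BLIVA-Vicuna-7B
```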
### To Reproduce Results
1. Train on CIFAR-10
```
python transfer_cls.py --dataset cifar10 --model_name minigpt-4 --target_models instructblip blip2 --learning_rate 10 --fca 0.005 --tse 0.001 --epochs 1
```
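The `--fca` and `--tse` flags presumably weight the paper's two transferability strategies, Feature Consistency Alignment (FCA) and Task Semantics Enrichment (TSE). Below is a minimal Python sketch of how such a combined objective might be assembled; the function and tensor names are hypothetical, not the repo's actual API:
```python
import torch.nn.functional as F

def tvp_objective(task_logits, labels, feats_prompted, feats_clean,
                  text_sim_logits, fca=0.005, tse=0.001):
    """Hypothetical combined loss: the task loss plus two transferability
    regularizers, with weights matching the example flags above."""
    task_loss = F.cross_entropy(task_logits, labels)
    # FCA: keep features of prompted images close to those of clean images
    fca_loss = F.mse_loss(feats_prompted, feats_clean)
    # TSE: align prompted image features with the label-text embeddings
    tse_loss = F.cross_entropy(text_sim_logits, labels)
    return task_loss + fca * fca_loss + tse * tse_loss
```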
2. Inference with a model
Specify the path to a checkpoint to evaluate on the dataset with a trained prompt. A reproducible checkpoint is provided at `save/checkpoint_best.pth`.
```
python transfer_cls.py --dataset cifar10 --model_name minigpt-4 --evaluate --checkpoint $PATH_TO_PROMPT
```
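For example, to evaluate with the provided checkpoint:
```
python transfer_cls.py --dataset cifar10 --model_name minigpt-4 --evaluate --checkpoint save/checkpoint_best.pth
```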
### BibTeX
If you find this work helpful, please cite it using the BibTeX entry below.
```
@InProceedings{Zhang_2024_CVPR,
    author    = {Zhang, Yichi and Dong, Yinpeng and Zhang, Siyuan and Min, Tianzan and Su, Hang and Zhu, Jun},
    title     = {Exploring the Transferability of Visual Prompting for Multimodal Large Language Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {26562-26572}
}
```