Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
[WACV 2024] Instruct Me More! Random Prompting for Visual In-Context Learning
- Host: GitHub
- URL: https://github.com/Jackieam/InMeMo
- Owner: Jackieam
- Created: 2023-08-29T07:59:20.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-03-28T02:32:24.000Z (8 months ago)
- Last Synced: 2024-08-01T13:28:22.851Z (3 months ago)
- Language: Python
- Homepage:
- Size: 49.9 MB
- Stars: 12
- Watchers: 2
- Forks: 2
- Open Issues: 1
- Metadata Files:
- Readme: README.md
README
[![Static Badge](https://img.shields.io/badge/WACV-2024-blue)](https://wacv2024.thecvf.com/)
[![Static Badge](https://img.shields.io/badge/InMeMo-ArXiv-b31b1b)](https://arxiv.org/abs/2311.03648)
[![Static Badge](https://img.shields.io/badge/InMeMo-PDF-pink)](https://arxiv.org/pdf/2311.03648.pdf)
[![Static Badge](https://img.shields.io/badge/PyTorch-1.12.1-orange)](https://pytorch.org/get-started/previous-versions/#linux-and-windows-10)
[![Static Badge](https://img.shields.io/badge/cudatoolkit-11.6-1f5e96)](https://developer.nvidia.com/cuda-11-6-0-download-archive)
[![Static Badge](https://img.shields.io/badge/Python-3.8-blue)](https://www.python.org/downloads/release/python-380/)

# Instruct Me More! Random Prompting for Visual In-Context Learning (InMeMo)
![InMeMo](Figure/inmemo.png)
## Environment Setup
```
conda create -n inmemo python=3.8 -y
conda activate inmemo
```
PyTorch must be version 1.8.0 or later and compatible with the CUDA version supported by your GPU. For an NVIDIA GeForce RTX 4090, the installation command is:
```
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -r requirements.txt
```
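To confirm the environment is set up correctly, a quick generic sanity check (not part of the InMeMo scripts) is to print the installed PyTorch version, its CUDA build, and whether the GPU is visible:
```
# Generic sanity check (not an InMeMo script): PyTorch version, CUDA build, and GPU visibility.
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```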
## Preparation
### Dataset
Download the Pascal-5i dataset from [Volumetric-Aggregation-Transformer](https://github.com/Seokju-Cho/Volumetric-Aggregation-Transformer), put it under the ```InMeMo/``` path, and rename it to ```pascal-5i```.
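As a rough sketch of the expected placement (the source path below is a placeholder; adjust it to wherever the download actually unpacks):
```
# Placeholder source path -- adjust to wherever the Pascal-5i download was unpacked.
mv /path/to/downloaded/pascal-5i ./InMeMo/pascal-5i
# The commands below are run from InMeMo/ and reference the dataset via --base_dir ./pascal-5i
```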
### Pre-trained weights for Large-scale Vision Model
Please follow the instructions in [Visual Prompting](https://github.com/amirbar/visual_prompting) to prepare the model and download the ```CVF 1000 epochs``` pre-trained checkpoint.
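The placement below is only a hypothetical example; the checkpoint filename and the directory the scripts expect are not specified here, so check the checkpoint argument defaults in the training scripts:
```
# Hypothetical filename and directory -- verify against the checkpoint path expected by the scripts.
mkdir -p weights
mv /path/to/downloaded_cvf_1000_epochs_checkpoint.pth weights/
```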
## Prompt Retriever
- [Foreground Segmentation Prompt Retriever](./Segmentation.md)
- [Single Object Detection Prompt Retriever](./Detection.md)
## Training
### For foreground segmentation:
```
# Change the fold for training each split.
python train_vp_segmentation.py --mode spimg_spmask --output_dir output_samples --fold 3 --device cuda:0 --base_dir ./pascal-5i --batch-size 32 --lr 40 --epoch 100 --scheduler cosinewarm --optimizer Adam --arr a1 --vp-model pad --p-eps 1
```
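Because `--fold` selects the Pascal-5i split, one convenient (unofficial) way to train all four splits is a simple shell loop over folds 0-3, reusing the flags above:
```
# Unofficial convenience loop: train one model per Pascal-5i split (folds 0-3).
for FOLD in 0 1 2 3; do
  python train_vp_segmentation.py --mode spimg_spmask --output_dir output_samples --fold $FOLD \
    --device cuda:0 --base_dir ./pascal-5i --batch-size 32 --lr 40 --epoch 100 \
    --scheduler cosinewarm --optimizer Adam --arr a1 --vp-model pad --p-eps 1
done
```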
### For single object detection:
```
python train_vp_detection.py --mode spimg_spmask --output_dir output_samples --device cuda:0 --base_dir ./pascal-5i --batch-size 32 --lr 40 --epoch 100 --scheduler cosinewarm --optimizer Adam --arr a1 --vp-model pad --p-eps 1
```

## Inference
### For foreground segmentation
#### With prompt enhancer
```
# Change the fold for testing each split.
python val_vp_segmentation.py --mode spimg_spmask --batch-size 16 --fold 3 --arr a1 --vp-model pad --output_dir visual_examples --save_model_path MODEL_SAVE_PATH
```
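As with training, the four splits can be evaluated in one (unofficial) loop; `MODEL_SAVE_PATH` is a placeholder for the checkpoint trained on the corresponding fold:
```
# Unofficial convenience loop: evaluate each split with the checkpoint trained on that fold.
for FOLD in 0 1 2 3; do
  python val_vp_segmentation.py --mode spimg_spmask --batch-size 16 --fold $FOLD --arr a1 \
    --vp-model pad --output_dir visual_examples --save_model_path MODEL_SAVE_PATH  # fold-specific checkpoint
done
```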
#### Without prompt enhancer
```
python val_vp_segmentation.py --mode no_vp --batch-size 16 --fold 3 --arr a1 --output_dir visual_examples
```
### For single object detection
#### With prompt enhancer
```
python val_vp_detection.py --mode spimg_spmask --batch-size 16 --arr a1 --vp-model pad --output_dir visual_examples --save_model_path MODEL_SAVE_PATH
```
#### Without prompt enhancer
```
python val_vp_detection.py --mode no_vp --batch-size 16 --arr a1 --vp-model pad --output_dir visual_examples
```

## Performance
![Performance](Figure/performance.png)
## Visual Examples
![Visual_result](Figure/visual_examples.png)
## Citation
If you find this work useful, please consider citing us as:
```
@inproceedings{zhang2024instruct,
title={Instruct Me More! Random Prompting for Visual In-Context Learning},
author={Zhang, Jiahao and Wang, Bowen and Li, Liangzhi and Nakashima, Yuta and Nagahara, Hajime},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={2597--2606},
year={2024}
}
```
## Acknowledgments
Part of the code is borrowed from [Visual Prompting](https://github.com/amirbar/visual_prompting), [visual_prompt_retrieval](https://github.com/ZhangYuanhan-AI/visual_prompt_retrieval), [timm](https://github.com/huggingface/pytorch-image-models), and [ILM-VP](https://github.com/OPTML-Group/ILM-VP).