Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
[WACV 2024] Instruct Me More! Random Prompting for Visual In-Context Learning
- Host: GitHub
- URL: https://github.com/Jackieam/InMeMo
- Owner: Jackieam
- Created: 2023-08-29T07:59:20.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-03-28T02:32:24.000Z (8 months ago)
- Last Synced: 2024-08-01T13:28:22.851Z (3 months ago)
- Language: Python
- Homepage:
- Size: 49.9 MB
- Stars: 12
- Watchers: 2
- Forks: 2
- Open Issues: 1
- Metadata Files:
- Readme: README.md
README
[![Static Badge](https://img.shields.io/badge/WACV-2024-blue)](https://wacv2024.thecvf.com/)
[![Static Badge](https://img.shields.io/badge/InMeMo-ArXiv-b31b1b)](https://arxiv.org/abs/2311.03648)
[![Static Badge](https://img.shields.io/badge/InMeMo-PDF-pink)](https://arxiv.org/pdf/2311.03648.pdf)
[![Static Badge](https://img.shields.io/badge/PyTorch-1.12.1-orange)](https://pytorch.org/get-started/previous-versions/#linux-and-windows-10)
[![Static Badge](https://img.shields.io/badge/cudatoolkit-11.6-1f5e96)](https://developer.nvidia.com/cuda-11-6-0-download-archive)
[![Static Badge](https://img.shields.io/badge/Python-3.8-blue)](https://www.python.org/downloads/release/python-380/)

# Instruct Me More! Random Prompting for Visual In-Context Learning (InMeMo)
![InMeMo](Figure/inmemo.png)
## Environment Setup
```
conda create -n inmemo python=3.8 -y
conda activate inmemo
```
PyTorch must be version 1.8.0 or later and compatible with the CUDA version supported by your GPU. For an NVIDIA GeForce RTX 4090, the installation command is:
```
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -r requirements.txt
```
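To confirm the environment is set up correctly, a quick generic sanity check (not part of the InMeMo scripts) is to print the installed PyTorch version, its CUDA build, and whether the GPU is visible:
```
# Generic sanity check (not an InMeMo script): PyTorch version, CUDA build, and GPU visibility.
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```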
## Preparation
### Dataset
Download the Pascal-5i dataset from [Volumetric-Aggregation-Transformer](https://github.com/Seokju-Cho/Volumetric-Aggregation-Transformer), put it under the ```InMeMo/``` path, and rename it to ```pascal-5i```.
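As a rough sketch of the expected placement (the source path below is a placeholder; adjust it to wherever the download actually unpacks):
```
# Placeholder source path -- adjust to wherever the Pascal-5i download was unpacked.
mv /path/to/downloaded/pascal-5i ./InMeMo/pascal-5i
# The commands below are run from InMeMo/ and reference the dataset via --base_dir ./pascal-5i
```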
### Pre-trained weights for Large-scale Vision Model
Please follow the instructions in [Visual Prompting](https://github.com/amirbar/visual_prompting) to prepare the model and download the ```CVF 1000 epochs``` pre-trained checkpoint.
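The placement below is only a hypothetical example; the checkpoint filename and the directory the scripts expect are not specified here, so check the checkpoint argument defaults in the training scripts:
```
# Hypothetical filename and directory -- verify against the checkpoint path expected by the scripts.
mkdir -p weights
mv /path/to/downloaded_cvf_1000_epochs_checkpoint.pth weights/
```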
## Prompt Retriever
- [Foreground Segmentation Prompt Retriever](./Segmentation.md)
- [Single Object Detection Prompt Retriever](./Detection.md)
## Training
### For foreground segmentation:
```
# Change the fold for training each split.
python train_vp_segmentation.py --mode spimg_spmask --output_dir output_samples --fold 3 --device cuda:0 --base_dir ./pascal-5i --batch-size 32 --lr 40 --epoch 100 --scheduler cosinewarm --optimizer Adam --arr a1 --vp-model pad --p-eps 1
```
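Because `--fold` selects the Pascal-5i split, one convenient (unofficial) way to train all four splits is a simple shell loop over folds 0-3, reusing the flags above:
```
# Unofficial convenience loop: train one model per Pascal-5i split (folds 0-3).
for FOLD in 0 1 2 3; do
  python train_vp_segmentation.py --mode spimg_spmask --output_dir output_samples --fold $FOLD \
    --device cuda:0 --base_dir ./pascal-5i --batch-size 32 --lr 40 --epoch 100 \
    --scheduler cosinewarm --optimizer Adam --arr a1 --vp-model pad --p-eps 1
done
```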
### For single object detection:
```
python train_vp_detection.py --mode spimg_spmask --output_dir output_samples --device cuda:0 --base_dir ./pascal-5i --batch-size 32 --lr 40 --epoch 100 --scheduler cosinewarm --optimizer Adam --arr a1 --vp-model pad --p-eps 1
```

## Inference
### For foreground segmentation
#### With prompt enhancer
```
# Change the fold for testing each split.
python val_vp_segmentation.py --mode spimg_spmask --batch-size 16 --fold 3 --arr a1 --vp-model pad --output_dir visual_examples --save_model_path MODEL_SAVE_PATH
```
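As with training, the four splits can be evaluated in one (unofficial) loop; `MODEL_SAVE_PATH` is a placeholder for the checkpoint trained on the corresponding fold:
```
# Unofficial convenience loop: evaluate each split with the checkpoint trained on that fold.
for FOLD in 0 1 2 3; do
  python val_vp_segmentation.py --mode spimg_spmask --batch-size 16 --fold $FOLD --arr a1 \
    --vp-model pad --output_dir visual_examples --save_model_path MODEL_SAVE_PATH  # fold-specific checkpoint
done
```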
#### Without prompt enhancer
```
python val_vp_segmentation.py --mode no_vp --batch-size 16 --fold 3 --arr a1 --output_dir visual_examples
```
### For single object detection
#### With prompt enhancer
```
python val_vp_detection.py --mode spimg_spmask --batch-size 16 --arr a1 --vp-model pad --output_dir visual_examples --save_model_path MODEL_SAVE_PATH
```
#### Without prompt enhancer
```
python val_vp_detection.py --mode no_vp --batch-size 16 --arr a1 --vp-model pad --output_dir visual_examples
```

## Performance
![Performance](Figure/performance.png)
## Visual Examples
![Visual_result](Figure/visual_examples.png)
## Citation
If you find this work useful, please consider citing us as:
```
@inproceedings{zhang2024instruct,
title={Instruct Me More! Random Prompting for Visual In-Context Learning},
author={Zhang, Jiahao and Wang, Bowen and Li, Liangzhi and Nakashima, Yuta and Nagahara, Hajime},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={2597--2606},
year={2024}
}
```
## Acknowledgments
Part of the code is borrowed from [Visual Prompting](https://github.com/amirbar/visual_prompting), [visual_prompt_retrieval](https://github.com/ZhangYuanhan-AI/visual_prompt_retrieval), [timm](https://github.com/huggingface/pytorch-image-models), and [ILM-VP](https://github.com/OPTML-Group/ILM-VP).