https://github.com/amazon-science/prompt-pretraining
Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"
https://github.com/amazon-science/prompt-pretraining
Last synced: 11 months ago
JSON representation
Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"
- Host: GitHub
- URL: https://github.com/amazon-science/prompt-pretraining
- Owner: amazon-science
- License: apache-2.0
- Created: 2023-03-22T03:31:46.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-05-03T20:49:08.000Z (about 2 years ago)
- Last Synced: 2025-06-30T19:50:37.970Z (11 months ago)
- Language: Python
- Homepage:
- Size: 15.6 MB
- Stars: 258
- Watchers: 5
- Forks: 9
- Open Issues: 7
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
# Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
"Scaling up prompt learning on ImageNet-21K achieves SOTA on 21 downstream datasets."
> [**Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition**](https://arxiv.org/abs/2304.04704)
> [Shuhuai Ren](https://renshuhuai-andy.github.io/), [Aston Zhang](https://www.astonzhang.com/), [Yi Zhu](https://bryanyzhu.github.io/), [Shuai Zhang](https://shuaizhang.tech/), [Shuai Zheng](https://szhengac.github.io/), [Mu Li](http://www.cs.cmu.edu/~muli/), [Alex Smola](https://alex.smola.org/), [Xu Sun](https://xusun.org/index.htm)
[](https://arxiv.org/abs/2304.04704)
[](
https://colab.research.google.com/drive/1OEFw1GfKXogx8mdFS2pClLjPK3aPEZyY?usp=sharing)
[](https://paperswithcode.com/sota/prompt-engineering-on-imagenet-21k?p=prompt-pre-training-with-twenty-thousand)
[](https://paperswithcode.com/sota/prompt-engineering-on-imagenet-a?p=prompt-pre-training-with-twenty-thousand)
[](https://paperswithcode.com/sota/prompt-engineering-on-imagenet-r?p=prompt-pre-training-with-twenty-thousand)
[](https://paperswithcode.com/sota/prompt-engineering-on-imagenet-s?p=prompt-pre-training-with-twenty-thousand)
[](https://paperswithcode.com/sota/open-vocabulary-semantic-segmentation-on-coco?p=prompt-pre-training-with-twenty-thousand)
[](https://paperswithcode.com/sota/open-vocabulary-semantic-segmentation-on-5?p=prompt-pre-training-with-twenty-thousand) [](https://paperswithcode.com/sota/open-vocabulary-object-detection-on-lvis-v1-0?p=prompt-pre-training-with-twenty-thousand)
# :rocket: News
* **(Jul 11, 2023)**
* Inference demo for object detection in Jupyter.
* **(May 31, 2023)**
* Inference demo for image classification in Google Colab.
* **(Mar 22, 2023)**
* Codes for prompt pretraining (POMP) on ImageNet-21K, cross-dataset and cross-task evaluation.
* Checkpoints of pre-trained POMP prompts, segmentation backbones, and detection backbones.
## Highlights

## Main Contributions
1) We introduce a prompt pre-training method POMP, which fisrt enables prompt learning on large-scale datasets like ImageNet-21K with over twenty-thousand classes.
2) POMP is memory and computation efficient. Compared with previous methods like CoOp, it achieves comparable accuracy on ImageNet-1K with only 19\% GPU memory and 50\% training time.
3) POMP achieves new SOTAs on various open-vocabulary visual recognition datasets and tasks.
## Installation
For installation and other package requirements, please follow the instructions detailed in [INSTALL.md](docs/INSTALL.md).
## Data preparation
Please follow the instructions at [DATASETS.md](docs/DATASETS.md) to prepare all datasets.
## Pre-trained Models
Please follow the instructions at [MODELS.md](docs/MODELS.md) to prepare all pre-trained models.
## Training and Evaluation
Please refer to the [RUN.md](docs/RUN.md) for detailed instructions on training, evaluating and reproducing the results.
## Contact
If you have any questions, please feel free to create an issue on this repository.
## Citation
If you find this code useful for your research, please consider citing:
```
@article{ren2023pomp,
title={Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition},
author={Ren, Shuhuai and Zhang, Aston and Zhu, Yi and Zhang, Shuai and Zheng, Shuai and Li, Mu and Smola, Alex and Sun, Xu},
journal={arXiv preprint arXiv:2304.04704},
year={2023}
}
```
## Acknowledgements
Our code is based on [CoOp](https://github.com/KaiyangZhou/CoOp), [MaPLe](https://github.com/muzairkhattak/multimodal-prompt-learning), [Dassl](https://github.com/KaiyangZhou/Dassl.pytorch), [Detic](https://github.com/facebookresearch/Detic) and [ZSSeg](https://github.com/MendelXu/zsseg.baseline) repositories. We thank the authors for releasing their code.