VisualGPT, CVPR 2022 Proceeding, GPT as a decoder for vision-language models
https://github.com/vision-cair/visualgpt
- Host: GitHub
- URL: https://github.com/vision-cair/visualgpt
- Owner: Vision-CAIR
- License: MIT
- Created: 2021-02-15T08:45:53.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2023-05-16T06:13:12.000Z (over 1 year ago)
- Last Synced: 2024-12-24T08:07:57.000Z (10 days ago)
- Topics: data-efficient-image-caption, image-caption, visualgpt
- Language: Python
- Homepage:
- Size: 6.18 MB
- Stars: 322
- Watchers: 14
- Forks: 50
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# VisualGPT
Our paper: [VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning](https://arxiv.org/abs/2102.10407)
## Main Architecture of Our VisualGPT
![image](images/final_architecture.jpg)

## Download the GPT-2 pretrained weights
```
curl --output gpt2-pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin
```
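The legacy S3 URL above may no longer be served. As a fallback, here is a minimal sketch that fetches the same GPT-2 weights from the Hugging Face Hub instead (this is an assumption on our part, not part of the original repo; it assumes the Hub file name `pytorch_model.bin` and that `huggingface_hub` is installed):

```python
# Fallback sketch: download GPT-2 weights from the Hugging Face Hub instead of
# the legacy S3 bucket. Assumes `pip install huggingface_hub`.
import shutil
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="gpt2", filename="pytorch_model.bin")
shutil.copy(path, "gpt2-pytorch_model.bin")  # file name the training scripts expect
```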
## Environment setup
Clone the repository and create the `visualgpt` conda environment:
```
conda env create -f environment.yml
conda activate visualgpt
```

Then download the spaCy data (note: on spaCy 3.x the `en` shortcut has been removed; use `en_core_web_sm` instead):
```
python -m spacy download en
```
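A quick, hypothetical sanity check (not part of the original repo) that the English pipeline loads:

```python
# Hypothetical sanity check that the English spaCy pipeline is installed.
# Use "en" on spaCy 2.x, "en_core_web_sm" on spaCy 3.x.
import spacy

nlp = spacy.load("en_core_web_sm")
print([token.text for token in nlp("A man riding a horse.")])
```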
## Data preparation
We provide the COCO dataset for download. Please download the annotations file [annotations.zip](https://drive.google.com/file/d/1i8mqKFKhqvBr8kEp3DbIh9-9UNAfKGmE/view?usp=sharing) and extract it, along with [coco_detections.hdf5](https://drive.google.com/open?id=1MV6dSnqViQfyvgyHrmAT_lLpFbkzp3mx), in which the data is stored as key-value pairs: each key is an image id and each value is a tensor of shape (N, 2048), where N is the number of detections.
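For illustration, a minimal sketch of inspecting the feature file with `h5py` (the key layout shown is an assumption based on the description above, not confirmed by the repo):

```python
# Illustrative sketch (assumed key layout): inspect coco_detections.hdf5,
# where each entry maps an image id to an (N, 2048) detection-feature tensor.
import h5py
import numpy as np

with h5py.File("coco_detections.hdf5", "r") as f:
    key = next(iter(f.keys()))    # e.g. one image id
    feats = np.asarray(f[key])    # shape (N, 2048)
    print(key, feats.shape)
```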
## Code structure
Create the log folder with `mkdir logs` and start the training.
## Train the model
```
python train_visualGPT.py --batch_size 50 --head 12 --tau 0.2 --features_path coco_detections.hdf5 --annotation_folder annotations --lr 1e-4 --gpt_model_type gpt --random_seed 42 --log_file logs/log --exp_name experiment_log --decoder_layer 12 --optimizer_type adamw --gradient_accumulation_steps 2 --train_percentage 0.001 --split_train_data
```
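A side note on two of the flags: `--batch_size 50` with `--gradient_accumulation_steps 2` gives an effective batch size of 100, and `--train_percentage 0.001` presumably selects the paper's data-efficient setting of training on 0.1% of the captioning data. Below is a self-contained, hypothetical sketch of the gradient-accumulation pattern (not the repo's actual training loop):

```python
# Hypothetical sketch of gradient accumulation (not the repo's actual loop):
# gradients from 2 mini-batches of 50 are accumulated before each optimizer
# step, giving an effective batch size of 100.
import torch
from torch import nn

model = nn.Linear(2048, 1)  # stand-in for the real captioning model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loader = [(torch.randn(50, 2048), torch.randn(50, 1)) for _ in range(4)]

accumulation_steps = 2  # matches --gradient_accumulation_steps 2
optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accumulation_steps).backward()  # average over accumulated batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```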
## Acknowledgement
This code used resources from [Meshed Memory Transformer](https://github.com/aimagelab/meshed-memory-transformer) and [Transformers](https://github.com/huggingface/transformers).

Please cite our paper using the following BibTeX:
```
@InProceedings{Chen_2022_CVPR,
author = {Chen, Jun and Guo, Han and Yi, Kai and Li, Boyang and Elhoseiny, Mohamed},
title = {VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {18030-18040}
}
```