# Introduction
This repository is for **X-Linear Attention Networks for Image Captioning** (CVPR 2020). The original paper can be found [here](https://arxiv.org/pdf/2003.14080.pdf). Please cite with the following BibTeX:
```
@inproceedings{xlinear2020cvpr,
title={X-Linear Attention Networks for Image Captioning},
author={Pan, Yingwei and Yao, Ting and Li, Yehao and Mei, Tao},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2020}
}
```
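
For orientation, here is a loose, simplified PyTorch sketch of the bilinear attention idea behind X-Linear attention: embedded queries and keys interact multiplicatively, and the joint representation drives both a spatial distribution over regions and a channel-wise gate. This is an illustration only, not the repository's actual module; all names and shapes are assumptions.
```
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedXLinearAttention(nn.Module):
    """Loose sketch of bilinear (X-Linear) attention. Illustrative only;
    the repository's actual modules differ."""

    def __init__(self, dim):
        super().__init__()
        self.q_embed = nn.Linear(dim, dim)
        self.k_embed = nn.Linear(dim, dim)
        self.spatial = nn.Linear(dim, 1)    # per-region attention logit
        self.channel = nn.Linear(dim, dim)  # per-channel gate

    def forward(self, query, keys, values):
        # query: (B, D); keys, values: (B, N, D)
        # Bilinear joint representation via elementwise product.
        joint = torch.relu(self.q_embed(query)).unsqueeze(1) \
              * torch.relu(self.k_embed(keys))                  # (B, N, D)
        # Spatial attention over the N regions.
        alpha = F.softmax(self.spatial(joint), dim=1)           # (B, N, 1)
        attended = (alpha * values).sum(dim=1)                  # (B, D)
        # Channel-wise gate from the pooled joint representation.
        beta = torch.sigmoid(self.channel(joint.mean(dim=1)))   # (B, D)
        return beta * attended                                  # (B, D)
```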
## Requirements
* Python 3
* CUDA 10
* numpy
* tqdm
* easydict
* [PyTorch](http://pytorch.org/) (>1.0)
* [torchvision](http://pytorch.org/)
* [coco-caption](https://github.com/ruotianluo/coco-caption)
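
A hypothetical sanity check (not part of the repository) to confirm the environment is set up, using only the dependencies listed above:
```
# Hypothetical environment check: confirm the core dependencies import
# and that PyTorch was built with CUDA support.
import numpy
import tqdm
import easydict
import torch
import torchvision

print('torch', torch.__version__, '| cuda available:', torch.cuda.is_available())
```
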
## Data preparation
1. Download the [bottom-up features](https://github.com/peteanderson80/bottom-up-attention) and convert them to `.npz` files (see the loading sketch after this list):
```
python2 tools/create_feats.py --infeats bottom_up_tsv --outfolder ./mscoco/feature/up_down_10_100
```

2. Download the [annotations](https://drive.google.com/open?id=1i5YJRSZtpov0nOtRyfM0OS1n0tPCGiCS) into the mscoco folder. For more details on data preparation, see [self-critical.pytorch](https://github.com/ruotianluo/self-critical.pytorch).
3. Download [coco-caption](https://github.com/ruotianluo/coco-caption) and set `__C.INFERENCE.COCO_PATH` in `lib/config.py` to point at it (see the config sketch after this list).
4. The pretrained models and results can be downloaded [here](https://drive.google.com/open?id=1a7aINHtpQbIw5JbAc4yvC7I1V-tQSdzb).
5. The pretrained SENet-154 model can be downloaded [here](https://drive.google.com/file/d/1CrWJcdKLPmFYVdVNcQLviwKGtAREjarR/view?usp=sharing).
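
As referenced in step 1, a quick way to sanity-check a converted feature file; the image id in the file name and the `feat` key are assumptions based on common bottom-up-feature converters:
```
import numpy as np

# Hypothetical check: the image id and the 'feat' key are assumptions;
# bottom-up features are typically 10-100 regions of 2048-d vectors.
with np.load('./mscoco/feature/up_down_10_100/391895.npz') as data:
    feat = data['feat']
print(feat.shape)  # e.g. (36, 2048)
```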
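
As referenced in step 3, the change amounts to pointing the config at your local coco-caption checkout. A minimal sketch of the relevant lines, assuming `lib/config.py` follows the usual easydict pattern (the exact layout may differ):
```
from easydict import EasyDict as edict

__C = edict()
__C.INFERENCE = edict()
# Point this at your local coco-caption checkout (hypothetical path):
__C.INFERENCE.COCO_PATH = '/path/to/coco-caption'
```
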
## Training
### Train X-LAN model
```
bash experiments/xlan/train.sh
```

### Train X-LAN model using self-critical training
Copy the pretrained X-LAN model into `experiments/xlan_rl/snapshot` and run:
```
bash experiments/xlan_rl/train.sh
```

### Train X-LAN transformer model
```
bash experiments/xtransformer/train.sh
```

### Train X-LAN transformer model using self-critical training
Copy the pretrained X-LAN transformer model into `experiments/xtransformer_rl/snapshot` and run:
```
bash experiments/xtransformer_rl/train.sh
```

## Evaluation
```
CUDA_VISIBLE_DEVICES=0 python3 main_test.py --folder experiments/model_folder --resume model_epoch
```

Here `experiments/model_folder` and `model_epoch` are placeholders for the experiment directory (e.g. `experiments/xlan`) and the epoch of the checkpoint to evaluate.

## Acknowledgements
Thanks to [self-critical.pytorch](https://github.com/ruotianluo/self-critical.pytorch) and the awesome PyTorch team for their contributions.