Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/aimagelab/meshed-memory-transformer
Meshed-Memory Transformer for Image Captioning. CVPR 2020
https://github.com/aimagelab/meshed-memory-transformer
caption-generation captioning-images cvpr2020 image-captioning pytorch transformer visual-semantic
Last synced: 2 days ago
JSON representation
Meshed-Memory Transformer for Image Captioning. CVPR 2020
- Host: GitHub
- URL: https://github.com/aimagelab/meshed-memory-transformer
- Owner: aimagelab
- License: bsd-3-clause
- Created: 2019-12-12T18:02:41.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2022-12-21T15:22:39.000Z (about 2 years ago)
- Last Synced: 2025-01-18T15:13:27.066Z (9 days ago)
- Topics: caption-generation, captioning-images, cvpr2020, image-captioning, pytorch, transformer, visual-semantic
- Language: Python
- Homepage:
- Size: 7.07 MB
- Stars: 522
- Watchers: 13
- Forks: 136
- Open Issues: 49
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# M²: Meshed-Memory Transformer
This repository contains the reference code for the paper _[Meshed-Memory Transformer for Image Captioning](https://arxiv.org/abs/1912.08226)_ (CVPR 2020).Please cite with the following BibTeX:
```
@inproceedings{cornia2020m2,
title={{Meshed-Memory Transformer for Image Captioning}},
author={Cornia, Marcella and Stefanini, Matteo and Baraldi, Lorenzo and Cucchiara, Rita},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2020}
}
```
## Environment setup
Clone the repository and create the `m2release` conda environment using the `environment.yml` file:
```
conda env create -f environment.yml
conda activate m2release
```Then download spacy data by executing the following command:
```
python -m spacy download en
```Note: Python 3.6 is required to run our code.
## Data preparation
To run the code, annotations and detection features for the COCO dataset are needed. Please download the annotations file [annotations.zip](https://ailb-web.ing.unimore.it/publicfiles/drive/meshed-memory-transformer/annotations.zip) and extract it.Detection features are computed with the code provided by [1]. To reproduce our result, please download the COCO features file [coco_detections.hdf5](https://ailb-web.ing.unimore.it/publicfiles/drive/show-control-and-tell/coco_detections.hdf5) (~53.5 GB), in which detections of each image are stored under the `_features` key. `` is the id of each COCO image, without leading zeros (e.g. the `` for `COCO_val2014_000000037209.jpg` is `37209`), and each value should be a `(N, 2048)` tensor, where `N` is the number of detections.
## Evaluation
To reproduce the results reported in our paper, download the pretrained model file [meshed_memory_transformer.pth](https://ailb-web.ing.unimore.it/publicfiles/drive/meshed-memory-transformer/meshed_memory_transformer.pth) and place it in the code folder.Run `python test.py` using the following arguments:
| Argument | Possible values |
|------|------|
| `--batch_size` | Batch size (default: 10) |
| `--workers` | Number of workers (default: 0) |
| `--features_path` | Path to detection features file |
| `--annotation_folder` | Path to folder with COCO annotations |#### Expected output
Under `output_logs/`, you may also find the expected output of the evaluation code.## Training procedure
Run `python train.py` using the following arguments:| Argument | Possible values |
|------|------|
| `--exp_name` | Experiment name|
| `--batch_size` | Batch size (default: 10) |
| `--workers` | Number of workers (default: 0) |
| `--m` | Number of memory vectors (default: 40) |
| `--head` | Number of heads (default: 8) |
| `--warmup` | Warmup value for learning rate scheduling (default: 10000) |
| `--resume_last` | If used, the training will be resumed from the last checkpoint. |
| `--resume_best` | If used, the training will be resumed from the best checkpoint. |
| `--features_path` | Path to detection features file |
| `--annotation_folder` | Path to folder with COCO annotations |
| `--logs_folder` | Path folder for tensorboard logs (default: "tensorboard_logs")|For example, to train our model with the parameters used in our experiments, use
```
python train.py --exp_name m2_transformer --batch_size 50 --m 40 --head 8 --warmup 10000 --features_path /path/to/features --annotation_folder /path/to/annotations
```
#### References
[1] P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. Bottom-up and top-down attention for image captioning and visual question answering. In _Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition_, 2018.