https://github.com/aimagelab/camel
CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022
- Host: GitHub
- URL: https://github.com/aimagelab/camel
- Owner: aimagelab
- License: bsd-3-clause
- Created: 2022-01-24T13:42:30.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2022-12-01T10:27:57.000Z (over 3 years ago)
- Last Synced: 2025-04-08T16:35:59.343Z (about 1 year ago)
- Topics: artificial-intelligence, captioning, captioning-images, computer-vision, image-captioning, pytorch
- Language: Python
- Homepage:
- Size: 8.46 MB
- Stars: 29
- Watchers: 4
- Forks: 12
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# CaMEL: Mean Teacher Learning for Image Captioning
This repository contains the reference code for the paper _[CaMEL: Mean Teacher Learning for Image Captioning](https://arxiv.org/pdf/2202.10492.pdf)_.
Please cite with the following BibTeX:
```
@inproceedings{barraco2022camel,
title={{CaMEL: Mean Teacher Learning for Image Captioning}},
author={Barraco, Manuele and Stefanini, Matteo and Cornia, Marcella and Cascianelli, Silvia and Baraldi, Lorenzo and Cucchiara, Rita},
booktitle={International Conference on Pattern Recognition},
year={2022}
}
```
## Environment setup
Clone the repository and create the `camel_release` conda environment using the `environment.yml` file:
```
conda env create -f environment.yml
conda activate camel_release
```
Note: Python 3.8 is required to run our code.
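Before installing anything else, it can help to confirm that the active interpreter matches the note above. The following is a minimal sketch; the helper name `python_version_ok` is hypothetical and not part of the repository.

```python
import sys

# Version named in the note above.
REQUIRED = (3, 8)


def python_version_ok(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    if not python_version_ok():
        found = sys.version.split()[0]
        print(f"Expected Python {REQUIRED[0]}.{REQUIRED[1]}, found {found}")
```

Running this inside the activated `camel_release` environment should print nothing if the environment was created from `environment.yml` as described.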
## Data preparation
To run the code, annotations and images for the COCO dataset are needed.
Please download the zip files containing the images ([train2014.zip](http://images.cocodataset.org/zips/train2014.zip), [val2014.zip](http://images.cocodataset.org/zips/val2014.zip)) and the annotations ([annotations.zip](https://aimagelab.ing.unimore.it/go/coco_annotations)), then extract them.
The extracted paths are passed later through the `--image_folder` and `--annotation_folder` arguments.
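The extraction step can also be scripted. The sketch below uses only the standard library and assumes the archives have already been downloaded; the function name `extract_archives` and the destination layout are hypothetical, since this README does not state which directory structure the scripts expect.

```python
import zipfile
from pathlib import Path


def extract_archives(archives, dest):
    """Extract each zip in `archives` into `dest`.

    Returns the sorted top-level names found in the archives, which is a
    quick way to see which folders were created.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    top_level = set()
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            for name in zf.namelist():
                top_level.add(Path(name).parts[0])
    return sorted(top_level)
```

For example, `extract_archives(["train2014.zip", "val2014.zip"], "/path/to/images")` would unpack both image archives into one folder that can then be passed as `--image_folder`.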
## Evaluation
To reproduce the results reported in our paper, download one of the pretrained model files, [camel_mesh.pth](https://aimagelab.ing.unimore.it/go/camel_mesh.pth) or [camel_nomesh.pth](https://aimagelab.ing.unimore.it/go/camel_nomesh.pth), and place it anywhere. Its path is passed later through the `--saved_model_path` argument.
Run `python evaluate.py` using the following arguments:
| Argument | Possible values |
|------|------|
| `--batch_size` | Batch size (default: `25`) |
| `--workers` | Number of workers (default: `0`) |
| `--resume_last` | If used, the training will be resumed from the last checkpoint |
| `--resume_best` | If used, the training will be resumed from the best checkpoint |
| `--annotation_folder` | Path to folder with COCO annotations (required) |
| `--image_folder` | Path to folder with COCO images (required) |
| `--saved_model_path` | Path to model weights file (required) |
| `--clip_variant` | CLIP variant to be used as image encoder (default: `RN50x16`) |
| `--network` | Network to be used in the evaluation, `online` or `target` (default: `target`) |
| `--disable_mesh` | If used, the model does not employ the mesh connectivity |
| `--N_dec` | Number of decoder layers (default: `3`) |
| `--N_enc` | Number of encoder layers (default: `3`) |
| `--d_model` | Dimensionality of the model (default: `512`) |
| `--d_ff` | Dimensionality of Feed-Forward layers (default: `2048`) |
| `--m` | Number of memory vectors (default: `40`) |
| `--head` | Number of heads (default: `8`) |
For example, to evaluate our model, use:
```
python evaluate.py --image_folder /path/to/images --annotation_folder /path/to/annotations --saved_model_path /path/to/model_file.pth
```
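To make the argument table concrete, here is a hypothetical `argparse` parser that mirrors the documented evaluation flags and defaults. It is a sketch written from the table only, not the repository's actual parser; the argument types (`int`, `str`, boolean flags) are assumptions.

```python
import argparse


def build_eval_parser():
    """Hypothetical parser mirroring the evaluation argument table."""
    p = argparse.ArgumentParser(description="CaMEL evaluation (sketch)")
    p.add_argument("--batch_size", type=int, default=25)
    p.add_argument("--workers", type=int, default=0)
    p.add_argument("--resume_last", action="store_true")
    p.add_argument("--resume_best", action="store_true")
    p.add_argument("--annotation_folder", type=str, required=True)
    p.add_argument("--image_folder", type=str, required=True)
    p.add_argument("--saved_model_path", type=str, required=True)
    p.add_argument("--clip_variant", type=str, default="RN50x16")
    p.add_argument("--network", type=str, choices=["online", "target"],
                   default="target")
    p.add_argument("--disable_mesh", action="store_true")
    p.add_argument("--N_dec", type=int, default=3)
    p.add_argument("--N_enc", type=int, default=3)
    p.add_argument("--d_model", type=int, default=512)
    p.add_argument("--d_ff", type=int, default=2048)
    p.add_argument("--m", type=int, default=40)
    p.add_argument("--head", type=int, default=8)
    return p


# Same arguments as the example command above; everything else falls back
# to the documented defaults.
args = build_eval_parser().parse_args([
    "--image_folder", "/path/to/images",
    "--annotation_folder", "/path/to/annotations",
    "--saved_model_path", "/path/to/model_file.pth",
])
print(args.network, args.clip_variant, args.batch_size)
```

With only the three required paths given, the evaluated network is `target` and the image encoder is the `RN50x16` CLIP variant, as listed in the table.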
## Training procedure
Run `python train.py` using the following arguments:
| Argument | Possible values |
|------|------|
| `--exp_name` | Experiment name (default: `camel`) |
| `--batch_size` | Batch size (default: `25`) |
| `--workers` | Number of workers (default: `0`) |
| `--resume_last` | If used, the training will be resumed from the last checkpoint |
| `--resume_best` | If used, the training will be resumed from the best checkpoint |
| `--annotation_folder` | Path to folder with COCO annotations (required) |
| `--image_folder` | Path to folder with COCO images (required) |
| `--clip_variant` | CLIP variant to be used as image encoder (default: `RN50x16`) |
| `--distillation_weight` | Weight for the knowledge distillation loss (default: `0.1` in XE phase, `0.005` in SCST phase) |
| `--ema_weight` | Target decay rate of Mean Teacher paradigm (default: `0.999`) |
| `--phase` | Training phase, `xe` or `scst` (default: `xe`) |
| `--disable_mesh` | If used, the model does not employ the mesh connectivity |
| `--saved_model_file` | If used, path to model weights to be loaded |
| `--N_dec` | Number of decoder layers (default: `3`) |
| `--N_enc` | Number of encoder layers (default: `3`) |
| `--d_model` | Dimensionality of the model (default: `512`) |
| `--d_ff` | Dimensionality of Feed-Forward layers (default: `2048`) |
| `--m` | Number of memory vectors (default: `40`) |
| `--head` | Number of heads (default: `8`) |
| `--warmup` | Warmup value for learning rate scheduling (default: `10000`) |
For example, to train our model with the parameters used in our experiments, use:
```
python train.py --image_folder /path/to/images --annotation_folder /path/to/annotations
```
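The `--ema_weight` and `--distillation_weight` arguments can be illustrated with a small sketch. It assumes the standard Mean Teacher formulation, in which the target network's parameters are an exponential moving average of the online network's parameters, and it assumes the distillation loss is added to the task loss as a weighted sum. Plain floats stand in for tensors, and the names `ema_update` and `combined_loss` are hypothetical, not taken from the repository.

```python
# Default distillation weights per training phase, as listed in the table.
DEFAULT_DISTILLATION_WEIGHT = {"xe": 0.1, "scst": 0.005}


def ema_update(target, online, ema_weight=0.999):
    """Return target <- ema_weight * target + (1 - ema_weight) * online.

    `target` and `online` map parameter names to values (floats here,
    tensors in a real model).
    """
    return {
        name: ema_weight * target[name] + (1.0 - ema_weight) * online[name]
        for name in target
    }


def combined_loss(task_loss, distillation_loss, phase="xe", weight=None):
    """Weighted sum of the task loss and the distillation loss (assumed form)."""
    if weight is None:
        weight = DEFAULT_DISTILLATION_WEIGHT[phase]
    return task_loss + weight * distillation_loss


# One EMA step: the target moves 0.1% of the way toward the online weights.
target = ema_update({"w": 1.0}, {"w": 0.0})
print(target["w"])
```

With the default decay of `0.999`, the target network changes slowly and acts as a smoothed copy of the online network, which is why either network can be selected with `--network` at evaluation time.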