https://github.com/aimagelab/camel

CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022

Host: GitHub
URL: https://github.com/aimagelab/camel
Owner: aimagelab
License: bsd-3-clause
Created: 2022-01-24T13:42:30.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2022-12-01T10:27:57.000Z (over 3 years ago)
Last Synced: 2025-04-08T16:35:59.343Z (over 1 year ago)
Topics: artificial-intelligence, captioning, captioning-images, computer-vision, image-captioning, pytorch
Language: Python
Homepage:
Size: 8.46 MB
Stars: 29
Watchers: 4
Forks: 12
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE

# CaMEL: Mean Teacher Learning for Image Captioning

This repository contains the reference code for the paper _[CaMEL: Mean Teacher Learning for Image Captioning](https://arxiv.org/pdf/2202.10492.pdf)_.

Please cite with the following BibTeX:

```
@inproceedings{barraco2022camel,
  title={{CaMEL: Mean Teacher Learning for Image Captioning}},
  author={Barraco, Manuele and Stefanini, Matteo and Cornia, Marcella and Cascianelli, Silvia and Baraldi, Lorenzo and Cucchiara, Rita},
  booktitle={International Conference on Pattern Recognition},
  year={2022}
}
```







## Environment setup

Clone the repository and create the `camel_release` conda environment using the `environment.yml` file:

```
conda env create -f environment.yml
conda activate camel_release
```

Note: Python 3.8 is required to run our code. 







## Data preparation

To run the code, annotations and images for the COCO dataset are needed.

Please download the zip files containing the images ([train2014.zip](http://images.cocodataset.org/zips/train2014.zip), [val2014.zip](http://images.cocodataset.org/zips/val2014.zip)) and the annotations ([annotations.zip](https://aimagelab.ing.unimore.it/go/coco_annotations)), then extract them.

The paths to the extracted folders are passed later as the `--image_folder` and `--annotation_folder` arguments.
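The extraction step can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name `extract_archives` and the destination folder are illustrative assumptions, not part of the repository, and the exact folder layout expected by the scripts is not specified here.

```python
import zipfile
from pathlib import Path


def extract_archives(archives, dest_dir):
    """Extract each zip archive into dest_dir and return its top-level entries.

    Hypothetical helper for illustration; it is not part of the CaMEL code.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    # Report what now sits directly under the destination folder.
    return sorted(p.name for p in dest.iterdir())
```

For instance, `extract_archives(["train2014.zip", "val2014.zip"], "coco/images")` would unpack both image archives under `coco/images` (an assumed location), which could then be given as `--image_folder`.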

## Evaluation

To reproduce the results reported in our paper, download one of the pretrained model files, [camel_mesh.pth](https://aimagelab.ing.unimore.it/go/camel_mesh.pth) or [camel_nomesh.pth](https://aimagelab.ing.unimore.it/go/camel_nomesh.pth), and place it anywhere. Its path is passed later as the `--saved_model_path` argument.

Run `python evaluate.py` using the following arguments:

| Argument | Possible values |
|------|------|
| `--batch_size` | Batch size (default: `25`) |
| `--workers` | Number of workers (default: `0`) |
| `--resume_last` | If used, the training will be resumed from the last checkpoint |
| `--resume_best` | If used, the training will be resumed from the best checkpoint |
| `--annotation_folder` | Path to folder with COCO annotations (required) |
| `--image_folder` | Path to folder with COCO images (required) |
| `--saved_model_path` | Path to model weights file (required) |
| `--clip_variant` | CLIP variant to be used as image encoder (default: `RN50x16`) |
| `--network` | Network to be used in the evaluation, `online` or `target` (default: `target`) |
| `--disable_mesh` | If used, the model does not employ the mesh connectivity |
| `--N_dec` | Number of decoder layers (default: `3`) |
| `--N_enc` | Number of encoder layers (default: `3`) |
| `--d_model` | Dimensionality of the model (default: `512`) |
| `--d_ff` | Dimensionality of Feed-Forward layers (default: `2048`) |
| `--m` | Number of memory vectors (default: `40`) |
| `--head` | Number of heads (default: `8`) |

For example, to evaluate our model, use:

```
python evaluate.py --image_folder /path/to/images --annotation_folder /path/to/annotations --saved_model_path /path/to/model_file.pth
```
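When several checkpoints are evaluated, it can help to assemble the command programmatically. The sketch below only combines flags listed in the table above and follows the script name used in the example command; the helper `build_eval_command` is hypothetical, and pairing `camel_nomesh.pth` with `--disable_mesh` is an assumption based on the file name.

```python
import shlex


def build_eval_command(image_folder, annotation_folder, saved_model_path,
                       network="target", disable_mesh=False):
    """Assemble the argument list for the evaluation script.

    Hypothetical helper; only flags documented in the table are used.
    """
    cmd = [
        "python", "evaluate.py",
        "--image_folder", image_folder,
        "--annotation_folder", annotation_folder,
        "--saved_model_path", saved_model_path,
        "--network", network,
    ]
    if disable_mesh:
        # Assumed to be needed for weights trained without mesh connectivity.
        cmd.append("--disable_mesh")
    return cmd


cmd = build_eval_command("/path/to/images", "/path/to/annotations",
                         "/path/to/camel_nomesh.pth", disable_mesh=True)
print(shlex.join(cmd))  # a shell-ready string; pass `cmd` to subprocess.run to execute
```

Returning a list rather than a single string keeps paths with spaces intact when the command is passed to `subprocess.run`.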

## Training procedure

Run `python train.py` using the following arguments:

| Argument | Possible values |
|------|------|
| `--exp_name` | Experiment name (default: `camel`) |
| `--batch_size` | Batch size (default: `25`) |
| `--workers` | Number of workers (default: `0`) |
| `--resume_last` | If used, the training will be resumed from the last checkpoint |
| `--resume_best` | If used, the training will be resumed from the best checkpoint |
| `--annotation_folder` | Path to folder with COCO annotations (required) |
| `--image_folder` | Path to folder with COCO images (required) |
| `--clip_variant` | CLIP variant to be used as image encoder (default: `RN50x16`) |
| `--distillation_weight` | Weight for the knowledge distillation loss (default: `0.1` in XE phase, `0.005` in SCST phase) |
| `--ema_weight` | Target decay rate of Mean Teacher paradigm (default: `0.999`) |
| `--phase` | Training phase, `xe` or `scst` (default: `xe`) |
| `--disable_mesh` | If used, the model does not employ the mesh connectivity |
| `--saved_model_file` | If used, path to model weights to be loaded |
| `--N_dec` | Number of decoder layers (default: `3`) |
| `--N_enc` | Number of encoder layers (default: `3`) |
| `--d_model` | Dimensionality of the model (default: `512`) |
| `--d_ff` | Dimensionality of Feed-Forward layers (default: `2048`) |
| `--m` | Number of memory vectors (default: `40`) |
| `--head` | Number of heads (default: `8`) |
| `--warmup` | Warmup value for learning rate scheduling (default: `10000`) |

For example, to train our model with the parameters used in our experiments, use:

```
python train.py --image_folder /path/to/images --annotation_folder /path/to/annotations
```
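The `--ema_weight` argument is described above as the target decay rate of the Mean Teacher paradigm. In the standard Mean Teacher formulation, the target network's weights are an exponential moving average of the online network's weights. The sketch below illustrates that update under this assumption; it is not the repository's implementation, the function name `ema_update` is hypothetical, and plain floats stand in for weight tensors.

```python
def ema_update(target, online, decay=0.999):
    """Move each target parameter toward the matching online parameter, in place.

    Standard exponential moving average: target = decay * target + (1 - decay) * online.
    """
    for name, online_value in online.items():
        target[name] = decay * target[name] + (1.0 - decay) * online_value
    return target


# Toy parameters: plain floats stand in for weight tensors.
online = {"w": 1.0, "b": -2.0}
target = {"w": 0.0, "b": 0.0}
for _ in range(3):
    ema_update(target, online)
print(target)
```

With a decay close to 1, the target changes slowly and smooths out step-to-step fluctuations of the online network, which would be consistent with `target` being the default choice for `--network` at evaluation time.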
