https://github.com/aimagelab/pma-net

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ICCV 2023

Host: GitHub
URL: https://github.com/aimagelab/pma-net
Owner: aimagelab
License: bsd-3-clause
Created: 2023-08-03T08:06:06.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2024-06-07T08:52:03.000Z (about 2 years ago)
Last Synced: 2025-04-11T14:43:16.969Z (over 1 year ago)
Topics: captioning, captioning-images, iccv2023, image-captioning, memory-augmented-neural-networks, transformer, vision-and-language, vision-language
Language: Python
Homepage:
Size: 5.34 MB
Stars: 17
Watchers: 8
Forks: 2
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE


          


# PMA-Net: Prototypical Memory Attention Network (ICCV 2023)

  



This repository contains the reference code for the paper [With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning](https://arxiv.org/abs/2308.12383).

Please cite with the following BibTeX:

```
@inproceedings{sarto2023positive,
  title={{With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning}},
  author={Barraco, Manuele and Sarto, Sara and Cornia, Marcella and Baraldi, Lorenzo and Cucchiara, Rita},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}
```



  

 

## Environment Setup

Clone the repository and create the `pma-net` conda environment using the `environment.yml` file:

```
conda env create -f environment.yml
conda activate pma-net
```

Note: Python 3.9 is required to run our code. 

## Data Preparation

### Checkpoints

XE and SCST checkpoints are available at the following links:

| **Model** | **Checkpoint** |
| -------------- | ------------- |
| **PMA-Net XE** | [pma-net_xe.tar](https://ailb-web.ing.unimore.it/publicfiles/pma-net_iccv2023/pma-net_xe.tar) |
| **PMA-Net SCST** | [pma-net_scst.tar](https://ailb-web.ing.unimore.it/publicfiles/pma-net_iccv2023/pma-net_scst.tar) |

Download and extract them into a folder of your choice. The path to this folder is passed later as the `{CHECKPOINT_FOLDER}` argument.
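The extraction step can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `extract_checkpoint` and the example folder `checkpoints/xe` are illustrative assumptions and are not part of this repository.

```python
import tarfile
from pathlib import Path


def extract_checkpoint(archive: str, checkpoint_folder: str) -> list[str]:
    """Extract a downloaded .tar archive and return the names of its members."""
    target = Path(checkpoint_folder)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    with tarfile.open(archive) as tar:
        names = tar.getnames()
        # Only extract archives obtained from the links above: extractall
        # trusts the paths stored inside the archive.
        tar.extractall(target)
    return names


# Hypothetical usage, once pma-net_xe.tar has been downloaded:
# extract_checkpoint("pma-net_xe.tar", "checkpoints/xe")
```

The folder passed as `checkpoint_folder` is the one to use as `{CHECKPOINT_FOLDER}` in the commands that follow.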

### Dataset

To run the code, annotations for the COCO dataset are needed.

Please download the zip file containing the annotations ([annotations.zip](https://aimagelab.ing.unimore.it/go/coco_annotations)), extract it, and place the contents under the `datasets/annotations` folder.

To train and test our model, download the tar files containing the COCO image features, pre-extracted with CLIP ViT-L/14, from the following links:

| **Split** | **Archive** |
| -------------- | ------------- |
| **COCO Training (chunk 0)** | [coco_training_CLIP-ViT-L14_cached_0.tar](https://ailb-web.ing.unimore.it/publicfiles/pma-net_iccv2023/coco_training_CLIP-ViT-L14_cached_0.tar) |
| **COCO Training (chunk 1)** | [coco_training_CLIP-ViT-L14_cached_1.tar](https://ailb-web.ing.unimore.it/publicfiles/pma-net_iccv2023/coco_training_CLIP-ViT-L14_cached_1.tar) |
| **COCO Training (chunk 2)** | [coco_training_CLIP-ViT-L14_cached_2.tar](https://ailb-web.ing.unimore.it/publicfiles/pma-net_iccv2023/coco_training_CLIP-ViT-L14_cached_2.tar) |
| **COCO Training (chunk 3)** | [coco_training_CLIP-ViT-L14_cached_3.tar](https://ailb-web.ing.unimore.it/publicfiles/pma-net_iccv2023/coco_training_CLIP-ViT-L14_cached_3.tar) |
| **COCO Training (chunk 4)** | [coco_training_CLIP-ViT-L14_cached_4.tar](https://ailb-web.ing.unimore.it/publicfiles/pma-net_iccv2023/coco_training_CLIP-ViT-L14_cached_4.tar) |
| **COCO Training (chunk 5)** | [coco_training_CLIP-ViT-L14_cached_5.tar](https://ailb-web.ing.unimore.it/publicfiles/pma-net_iccv2023/coco_training_CLIP-ViT-L14_cached_5.tar) |
| **COCO Training for SCST** | [coco_training_dict_CLIP-ViT-L14_cached.tar](https://ailb-web.ing.unimore.it/publicfiles/pma-net_iccv2023/coco_training_dict_CLIP-ViT-L14_cached.tar) |
| **COCO Validation** | [coco_validation_dict_CLIP-ViT-L14_cached.tar](https://ailb-web.ing.unimore.it/publicfiles/pma-net_iccv2023/coco_validation_dict_CLIP-ViT-L14_cached.tar) |
| **COCO Test** | [coco_test_dict_CLIP-ViT-L14_cached.tar](https://ailb-web.ing.unimore.it/publicfiles/pma-net_iccv2023/coco_test_dict_CLIP-ViT-L14_cached.tar) |

Once the files are downloaded and extracted into a single folder, set the correct paths in `configs/datasets/datasets.json`.

The datasets are then selected through command-line arguments, as shown in the sections below.
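Since the features are split across nine archives, it can help to confirm that every download completed before extracting. The sketch below checks a download folder against the archive names listed in the table; the helper `missing_archives` is a hypothetical name, and the layout of the extracted files and of `datasets.json` is deliberately not assumed here.

```python
from pathlib import Path

# Archive names copied from the table of feature files.
EXPECTED_ARCHIVES = (
    [f"coco_training_CLIP-ViT-L14_cached_{i}.tar" for i in range(6)]
    + [
        "coco_training_dict_CLIP-ViT-L14_cached.tar",
        "coco_validation_dict_CLIP-ViT-L14_cached.tar",
        "coco_test_dict_CLIP-ViT-L14_cached.tar",
    ]
)


def missing_archives(folder: str) -> list[str]:
    """Return the expected archive names that are not present in `folder`."""
    root = Path(folder)
    return [name for name in EXPECTED_ARCHIVES if not (root / name).is_file()]


# Hypothetical usage: an empty list means every archive was found.
# print(missing_archives("downloads"))
```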

## Evaluation

To evaluate our best model, use:

```
torchrun --nproc_per_node {N_GPUS} --master_port {MASTER_PORT} main.py --do_eval --do_predict --predict_with_generate --output_dir {OUTPUT_DIR} --validation_datasets coco_validation_dict_CLIP-ViT-L14_cached --test_datasets coco_test_dict_CLIP-ViT-L14_cached --evaluation_strategy steps --generation_max_length 30 --generation_num_beams 5 --per_device_eval_batch_size {EVAL_BATCH_SIZE} --kmeans_memory --add_memory_slots_selfattn --n_memory_slots 1024 --deque_iters 1500 --window 0.25 --resume_from_checkpoint {CHECKPOINT_FOLDER}
```
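Because the command carries several placeholders, one option is to assemble it programmatically. The sketch below reproduces the evaluation command above as an argument list; the function `build_eval_command` and the example values (port, batch size, folder names) are assumptions for illustration, not recommended settings.

```python
import shlex


def build_eval_command(n_gpus, master_port, output_dir, eval_batch_size, checkpoint_folder):
    """Assemble the evaluation command shown above as an argument list."""
    return [
        "torchrun",
        "--nproc_per_node", str(n_gpus),
        "--master_port", str(master_port),
        "main.py",
        "--do_eval", "--do_predict", "--predict_with_generate",
        "--output_dir", output_dir,
        "--validation_datasets", "coco_validation_dict_CLIP-ViT-L14_cached",
        "--test_datasets", "coco_test_dict_CLIP-ViT-L14_cached",
        "--evaluation_strategy", "steps",
        "--generation_max_length", "30",
        "--generation_num_beams", "5",
        "--per_device_eval_batch_size", str(eval_batch_size),
        "--kmeans_memory", "--add_memory_slots_selfattn",
        "--n_memory_slots", "1024",
        "--deque_iters", "1500",
        "--window", "0.25",
        "--resume_from_checkpoint", checkpoint_folder,
    ]


# Placeholder values, chosen only to make the example concrete.
command = build_eval_command(1, 29500, "outputs/eval", 16, "checkpoints/pma-net_scst")
print(shlex.join(command))
# To launch it from Python: subprocess.run(command, check=True)
```

Passing the command as a list avoids shell quoting problems when a path contains spaces.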

## Training Procedure

To train our best model with the parameters used in our experiments, use:

```
torchrun --nproc_per_node {N_GPUS} --master_port {MASTER_PORT} main.py --do_train --do_eval --do_predict --predict_with_generate --output_dir {OUTPUT_DIR} --train_datasets coco_CLIP-ViT-L14_cached --validation_datasets coco_validation_dict_CLIP-ViT-L14_cached --test_datasets coco_test_dict_CLIP-ViT-L14_cached --evaluation_strategy steps \
  --eval_steps 1000 --save_steps 1000 --max_steps -1 --generation_max_length 30 --generation_num_beams 5 --per_device_train_batch_size {TRAIN_BATCH_SIZE} --per_device_eval_batch_size {EVAL_BATCH_SIZE} --custom_lr_scheduler CustomScheduler --steps_min 15000 --start_decreasing_steps 10000 --learning_rate 2.5e-4 --warmup_steps 1000 --lr_min 1e-5 --gradient_accumulation_steps 8 --deepspeed configs/deepspeed/config_lamb_zero2.json --encoder --kmeans_memory --add_memory_slots_selfattn --n_memory_slots 1024 --deque_iters 1500 --window 0.25
```

After XE pre-training, run the SCST step with:

```
torchrun --nproc_per_node {N_GPUS} --master_port {MASTER_PORT} main.py --do_train --do_eval --do_predict --predict_with_generate --output_dir {OUTPUT_DIR} --train_datasets coco_training_dict_CLIP-ViT-L14_cached --validation_datasets coco_validation_dict_CLIP-ViT-L14_cached --test_datasets coco_test_dict_CLIP-ViT-L14_cached --evaluation_strategy steps \
  --eval_steps 1000 --save_steps 1000 --max_steps -1 --generation_max_length 30 --generation_num_beams 5 --per_device_train_batch_size {TRAIN_BATCH_SIZE} --per_device_eval_batch_size {EVAL_BATCH_SIZE} --steps_min 15000 --learning_rate 5e-6 --gradient_accumulation_steps 8 --deepspeed configs/deepspeed/config_adam_zero2.json --encoder --kmeans_memory --add_memory_slots_selfattn --n_memory_slots 1024 --deque_iters 1500 --window 0.25 --scst --resume_from_checkpoint {CHECKPOINT_FOLDER}
```

## Custom Arguments

The complete list of custom arguments accepted by our code:

| Argument | Description |
|------|------|
|`--encoder` | Add a BERT encoder. |
|`--n_layer` | Number of layers. |
|`--n_embd` | Embedding dimension. |
|`--n_head` | Number of attention heads. |
|`--custom_checkpoint_keeper` | Number of checkpoints to keep on disk, default is `5`. |
|`--scst` | Use the SCST phase. |
|`--train_datasets` | Training datasets, default is `coco_training`. |
|`--validation_datasets` | Validation datasets, default is `coco_validation_dict`. |
|`--test_datasets` | Test datasets, default is `coco_test_dict`. |
|`--scst_datasets` | SCST datasets, default is `coco_training_dict`. |
|`--custom_lr_scheduler` | Custom scheduler to use (`CustomScheduler`, `TransformerScheduler`), default is `None`. |
|`--lr_multiplier` | Learning rate multiplier, default is `1.0`. |
|`--steps_min` | Used only with `CustomScheduler`. |
|`--lr_min` | Used only with `CustomScheduler`. |
|`--start_decreasing_steps` | Used only with `CustomScheduler`. |
|`--add_memory_slots_selfattn` | Add memory slots in the self-attention blocks. |
|`--n_memory_slots` | Number of memory slots, default is `64`. |
|`--freeze_memory` | Freeze the memories. |
|`--kmeans_memory` | Compute the memories using k-means. |
|`--deque_iters` | Maximum number of iterations of data kept in the deque, default is `10`. |
|`--window` | Overlap window of new data, default is `None`. |
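To give an intuition for how `--kmeans_memory`, `--n_memory_slots`, and `--deque_iters` relate, the toy sketch below keeps the vectors from the most recent iterations in a bounded deque and clusters them with plain Lloyd's k-means, using the centroids as memory slots. This is an illustration under simplifying assumptions, not the implementation in this repository: the data are random 2-D points, the sizes are far smaller than the values used above, and `--window` is not modelled.

```python
import random
from collections import deque


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


# Toy stand-ins for --deque_iters and --n_memory_slots.
deque_iters, n_memory_slots = 3, 2
history = deque(maxlen=deque_iters)  # the oldest iteration is dropped automatically

rng = random.Random(0)
for step in range(5):
    # Each iteration contributes 8 vectors scattered around two centres.
    batch = [[rng.gauss(c, 0.1), rng.gauss(c, 0.1)] for c in (0.0, 5.0) for _ in range(4)]
    history.append(batch)

points = [vector for batch in history for vector in batch]
memory_slots = kmeans(points, n_memory_slots)
print(len(history), len(points), len(memory_slots))  # prints "3 24 2"
```

Only the last `deque_iters` iterations contribute to the clustering, so the prototypes track recent data rather than the full training history.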
