Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/xmu-xiaoma666/dtnet
The official repository for “Image Captioning via Dynamic Path Customization”.
- Host: GitHub
- URL: https://github.com/xmu-xiaoma666/dtnet
- Owner: xmu-xiaoma666
- License: mit
- Created: 2023-09-27T12:58:44.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-11-07T10:54:30.000Z (about 2 months ago)
- Last Synced: 2024-11-07T11:39:18.723Z (about 2 months ago)
- Language: Python
- Size: 3.16 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Image Captioning via Dynamic Path Customization
## Introduction
The official repository for “Image Captioning via Dynamic Path Customization”. The Dynamic Transformer Network (DTNet) is a model that generates discriminative yet accurate captions by dynamically assigning customized paths to different samples.
The framework of the proposed Dynamic Transformer Network (DTNet)
The detailed architectures of different cells in the spatial and channel routing space.
## News
- 2023.09.28: Released code
## Environment setup
Please refer to [meshed-memory-transformer](https://github.com/aimagelab/meshed-memory-transformer)
## Data preparation
* **Annotation**. Download the annotation file [annotation.zip](https://drive.google.com/file/d/1i8mqKFKhqvBr8kEp3DbIh9-9UNAfKGmE/view?usp=sharing). Extract it and put it in the project root directory.
* **Feature**. You can download our ResNeXt-101 features (hdf5 file) [here](https://pan.baidu.com/s/1xVZO7t8k4H_l3aEyuA-KXQ). Access code: jcj6.
* **Evaluation**. Download the evaluation tools [here](https://pan.baidu.com/s/1xVZO7t8k4H_l3aEyuA-KXQ). Access code: jcj6. Extract them and put them in the project root directory.

There are five kinds of keys in our .hdf5 file:
* `['%d_features' % image_id]`: region features (N_regions, feature_dim)
* `['%d_boxes' % image_id]`: bounding box of region features (N_regions, 4)
* `['%d_size' % image_id]`: size of original image (for normalizing bounding box), (2,)
* `['%d_grids' % image_id]`: grid features (N_grids, feature_dim)
* `['%d_mask' % image_id]`: geometric alignment graph, (N_regions, N_grids)

Feature extraction can be done by following [image-captioning-DLCT](https://github.com/luo3300612/image-captioning-DLCT/tree/main).
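As a quick illustration of the key layout above, here is a minimal sketch of reading per-image arrays with `h5py`. The file path, `image_id`, and array sizes are placeholders, not values from the released feature file; the snippet builds a tiny in-memory file with the same `'%d_<name>'` key pattern just to show the access pattern.

```python
import h5py
import numpy as np

image_id = 42
N_regions, N_grids, feature_dim = 36, 49, 2048

# In-memory file mimicking the key layout described above (illustrative only).
with h5py.File("demo.hdf5", "w", driver="core", backing_store=False) as f:
    f["%d_features" % image_id] = np.zeros((N_regions, feature_dim), dtype=np.float32)
    f["%d_boxes" % image_id] = np.zeros((N_regions, 4), dtype=np.float32)
    f["%d_size" % image_id] = np.array([480, 640], dtype=np.int64)
    f["%d_grids" % image_id] = np.zeros((N_grids, feature_dim), dtype=np.float32)
    f["%d_mask" % image_id] = np.zeros((N_regions, N_grids), dtype=np.float32)

    # Reading back: the same '%d_<name>' pattern indexes every array for an image.
    regions = f["%d_features" % image_id][()]
    boxes = f["%d_boxes" % image_id][()]
    mask = f["%d_mask" % image_id][()]
```

The region features, grid features, and alignment mask are what the model consumes; the boxes and image size are used to normalize box coordinates.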
## Training
```shell
python train.py --exp_name DTNet --batch_size 50 --rl_batch_size 100 --workers 4 --head 8 --warmup 10000 --features_path /home/data/coco_grid_feats2.hdf5 --annotation /home/data/m2_annotations --logs_folder tensorboard_logs
```
## Evaluation
```shell
python eval.py --batch_size 50 --exp_name DTNet --features_path /home/data/coco_grid_feats2.hdf5 --annotation /home/data/m2_annotations --ckpt_path your_model_path
```
## Performance
Comparisons with SOTAs on the Karpathy test split.
## Qualitative Results
Examples of captions generated by Transformer and DTNet.
Images and the corresponding number of passed cells.
Path visualization.
## Acknowledgements
- Thanks to [meshed-memory-transformer](https://github.com/aimagelab/meshed-memory-transformer).
- Thanks to the amazing work of [grid-feats-vqa](https://github.com/facebookresearch/grid-feats-vqa).
## Citations
```
@ARTICLE{ma2024image,
author={Ma, Yiwei and Ji, Jiayi and Sun, Xiaoshuai and Zhou, Yiyi and Hong, Xiaopeng and Wu, Yongjian and Ji, Rongrong},
journal={IEEE Transactions on Neural Networks and Learning Systems},
title={Image Captioning via Dynamic Path Customization},
year={2024},
volume={},
number={},
pages={1-15},
keywords={Routing;Visualization;Transformers;Adaptation models;Task analysis;Feature extraction;Semantics;Dynamic network;image captioning;input-sensitive;transformer},
doi={10.1109/TNNLS.2024.3409354}}
```