Official Pytorch Implementation of: "Enhancing High-Vocabulary Image Annotation with a Novel Attention-Based Pooling"
- Host: GitHub
- URL: https://github.com/parham1998/enhancing-high-vocabulary-ia-with-a-novel-attention-based-pooling
- Owner: parham1998
- Created: 2023-08-17T11:15:12.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-30T09:16:48.000Z (7 months ago)
- Last Synced: 2025-03-29T02:51:28.898Z (about 2 months ago)
- Topics: asymmetric-loss, convolutional-neural-networks, corel-5k, esp-game, high-vocabulary-image-annotation, iapr-tc-12, image-annotation, pytorch, transformer-decoder, transformers, tresnet, vg-500
- Language: Python
- Homepage:
- Size: 3.27 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Enhancing-High-Vocabulary-IA-with-a-Novel-Attention-Based-Pooling
Official Pytorch Implementation of: "[Enhancing High-Vocabulary Image Annotation with a Novel Attention-Based Pooling](https://doi.org/10.1007/s00371-024-03618-6)"

## Datasets
Three well-known datasets are commonly used in automatic image annotation (AIA) tasks. In addition, we have used a dataset with a significantly larger number of images and a 500-word vocabulary, which makes it considerably more challenging. The table below provides details about these datasets, which can be downloaded from the links provided. (After downloading each dataset, replace the corresponding 'images' folder in the 'datasets' directory with the downloaded 'images' folder.)

| *Dataset* | *Num of images* | *Num of training images* | *Num of testing images* | *Vocabulary size* | *Labels per image* | *Images per label* |
| :------------: | :-------------: | :-------------: | :-------------: | :------------: | :-------------: | :-------------: |
| [Corel 5k](https://www.kaggle.com/datasets/parhamsalar/corel5k) | 5,000 | 4,500 | 500 | 260 | 3.4 | 58.6 |
| [ESP Game](https://www.kaggle.com/datasets/parhamsalar/espgame) | 20,770 | 18,689 | 2,081 | 268 | 4.7 | 362.7 |
| [IAPR TC-12](https://www.kaggle.com/datasets/parhamsalar/iaprtc12) | 19,627 | 17,665 | 1,962 | 291 | 5.7 | 347.7 |
| [VG-500](https://visualgenome.org/) | 92,904 | 82,904 | 10,000 | 500 | 13.6 | 2,256.6 |

We employed the [SSGRL](https://github.com/HCPLab-SYSU/SSGRL) settings when working with the VG-500 dataset, which involve selecting images from the 500 most common categories and then dividing the data into training and testing subsets. We also attempted to identify the label names (vocabulary) for this dataset; please let us know if you find any errors.
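As a quick sanity check after placing the images, a short script like the one below can count the files under each dataset folder and compare them with the table above. The `datasets/<name>/images` layout and the folder names are assumptions based on the instructions here, so adjust the paths to match your checkout.

```python
# Hypothetical sanity check (not part of the repository): count the image files
# placed under each dataset folder and compare with the totals in the table above.
# The datasets/<name>/images layout and the folder names are assumed from this README.
from pathlib import Path

EXPECTED = {"Corel-5k": 5000, "ESP-Game": 20770, "IAPR-TC-12": 19627, "VG-500": 92904}

for name, expected in EXPECTED.items():
    images_dir = Path("datasets") / name / "images"
    if not images_dir.is_dir():
        print(f"{name}: no 'images' folder found at {images_dir}")
        continue
    found = sum(1 for p in images_dir.rglob("*") if p.is_file())
    print(f"{name}: {found} files found (table lists {expected:,} images)")
```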
## Model
## Attention Maps
## Train and Evaluation
To train the model in the Spyder IDE, use the command below:
```python
run main.py --data {select training dataset} --loss-function {select loss function}
```
Please note that:
1) You should put **Corel-5k**, **ESP-Game**, **IAPR-TC-12**, or **VG-500** in {select training dataset}.
2) You should put **proposedLoss** in {select loss function}.
3) When using the **VG-500** dataset, change the "image-size" to 576, change the "gamma_neg" in **proposedLoss** to 2, and set the batch size to 128.
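As a concrete example, the following call (from the Spyder console) starts training on Corel-5k with the proposed loss. It only uses the --data and --loss-function options shown above; the VG-500-specific settings from note 3 still have to be changed separately.

```python
run main.py --data Corel-5k --loss-function proposedLoss
```

Appending --evaluate, as in the next block, runs the same configuration in evaluation mode.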
To evaluate the model in the Spyder IDE, use the command below:
```python
run main.py --data {select training dataset} --loss-function {select loss function} --evaluate
```

## Results
Proposed method:
| data | precision | recall | f1-score | N+ | mAP |
| :------------: | :------------: | :------------: | :------------: | :------------: | :------------: |
| Corel 5k | 0.453 | 0.611 | **0.520** | **202** | - |
| IAPR TC-12 | 0.515 | 0.584 | **0.547** | **287** | - |
| ESP Game | 0.442 | 0.500 | **0.470** | **262** | - |
| VG-500 | 0.409 | 0.502 | **0.451** | **477** | 42.515 |
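For reference, precision, recall, and N+ in image annotation are conventionally computed per vocabulary word and then averaged, with N+ being the number of words recalled at least once. The sketch below only illustrates that general convention on 0/1 prediction and ground-truth matrices; it is not the repository's evaluation code.

```python
# Generic per-label annotation metrics (illustrative only, not the repo's code):
# rows of `pred` / `gt` are test images, columns are vocabulary words (0/1 values).
import numpy as np

def per_label_metrics(pred: np.ndarray, gt: np.ndarray):
    tp = (pred * gt).sum(axis=0).astype(float)          # correct predictions per word
    precision = tp / np.maximum(pred.sum(axis=0), 1)    # per-word precision
    recall = tp / np.maximum(gt.sum(axis=0), 1)         # per-word recall
    p, r = precision.mean(), recall.mean()              # averaged over the vocabulary
    f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    n_plus = int((recall > 0).sum())                    # words recalled at least once
    return p, r, f1, n_plus
```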
## Citation
Please consider citing our paper in your publications if the project helps your research. The BibTeX reference is as follows:
```
@article{salar2024enhancing,
title={Enhancing high-vocabulary image annotation with a novel attention-based pooling},
author={Salar, Ali and Ahmadi, Ali},
journal={The Visual Computer},
pages={1--15},
year={2024},
publisher={Springer}
}
```

## Contact
I would be happy to answer any questions you may have - Ali Salar ([email protected])