https://github.com/dinhanhx/visualroberta
The first public Vietnamese visual linguistic foundation model(s)
image-captioning image-text python python-3 python3 vietnamese-nlp visual-linguistic visual-question-answering
- Host: GitHub
- URL: https://github.com/dinhanhx/visualroberta
- Owner: dinhanhx
- License: mit
- Created: 2022-08-23T08:34:08.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-10-29T10:40:04.000Z (almost 2 years ago)
- Last Synced: 2025-05-13T00:46:34.605Z (5 months ago)
- Topics: image-captioning, image-text, python, python-3, python3, vietnamese-nlp, visual-linguistic, visual-question-answering
- Language: Python
- Homepage:
- Size: 98.6 KB
- Stars: 3
- Watchers: 1
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
REFACTOR IN PROCESS
===

No, I'm serious. Don't touch this.
# VisualRoBERTa
Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC)
## Introduction
The first public Vietnamese visual linguistic foundation model(s). This work was carried out solely by me under the supervision of Dr Pham Quang Nhat Minh @ Aimesoft and Dr Tran Giang Son @ USTH. Thanks to Mr Nguyen Anh Duong @ VietAI for TPU support.
Keywords: computer vision, natural language processing, visual linguistic, image text, pretrain, Vietnamese, foundation, multi-modal, machine learning
## Results
### On UIT-ViIC test set
| | BLEU 1 | BLEU 2 | BLEU 3 | BLEU 4 | RougeL |
|------------|--------|--------|--------|--------|--------|
| Baseline 1 | 0.7100 | 0.5750 | 0.4760 | 0.3940 | 0.6260 |
| Baseline 2 | 0.6820 | 0.5610 | 0.4110 | 0.3270 | 0.5990 |
| IC model | 0.8764 | 0.7943 | 0.7247 | 0.6685 | 0.6320 |

Baseline models are the best models in the [UIT-ViIC](https://link.springer.com/chapter/10.1007/978-3-030-63007-2_57) paper.
### On VQA test set
| | Acc | BLEU 1 | BLEU 2 | BLEU 3 | BLEU 4 | RougeL |
|:---------:|:------:|:------:|:------:|:------:|:------:|:------:|
| Baseline | 0.3496 | - | - | - | - | - |
| VQA model | 0.3449 | 0.4526 | 0.4082 | 0.3997 | 0.4173 | 0.4390 |

Baseline model is the best model in the [IC](https://aclanthology.org/2021.paclic-1.72/) paper.
## Citation
To cite this repo, the models' weights, or the theory:
```
@software{dinhanhx_VisualRoBERTa_2022,
title = {{VisualRoBERTa}},
author = {dinhanhx},
year = 2022,
month = 9,
url = {https://github.com/dinhanhx/VisualRoBERTa}
}
```

⚠ This entry will be updated when the white paper is published or released to the public.
## Setup Dependencies
- For TPU, you can just `pip install` [requirements.txt](requirements.txt)
- For GPU, besides reading [requirements.txt](requirements.txt), you gotta remove any commands related to TPU or XLA, then follow the original PyTorch docs.

## Download Dataset
In training (`run`) files (such as `run_pretrain.py`), paths to data folders are hardcoded.
⚠ `TranslateCOCO2017` also contains json files from UIT-ViIC.
Download links:
- [MS COCO](https://cocodataset.org/#download)
- [Translate COCO 2017](https://huggingface.co/datasets/dinhanhx/coco-2017-vi) (from this work)
- [ViVQA](https://github.com/kh4nh12/ViVQA)
- [UIT-ViIC](https://nlp.uit.edu.vn/datasets/#h.p_Uj6Wqs5dCpc4)

You are encouraged to read `src/data.py` to understand the dataset structure and rename the paths to something suitable for your system.
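
Because the data paths are hardcoded, one way to adapt them is to derive every folder from a single root. The sketch below is not taken from the repository: the environment variable `VISUALROBERTA_DATA_ROOT` and the helpers `resolve_data_dirs` and `missing_dirs` are hypothetical names, and the sub-folder list is an assumption (only `TranslateCOCO2017` is named in this README). Check `src/data.py` for the layout the code actually expects.

```python
import os
from pathlib import Path

# Hypothetical environment variable; the repository itself hardcodes its paths.
DATA_ROOT_ENV = "VISUALROBERTA_DATA_ROOT"

# Assumed sub-folder names, for illustration only. `TranslateCOCO2017` is the
# name mentioned in this README; verify the others against `src/data.py`.
SUBFOLDERS = ("TranslateCOCO2017", "ViVQA")


def resolve_data_dirs(default_root="data"):
    """Map each dataset name to a folder under one configurable root."""
    root = Path(os.environ.get(DATA_ROOT_ENV, default_root)).expanduser()
    return {name: root / name for name in SUBFOLDERS}


def missing_dirs(dirs):
    """Return the dataset names whose folders do not exist yet."""
    return [name for name, path in dirs.items() if not path.is_dir()]
```

Calling `missing_dirs(resolve_data_dirs())` before training would list any dataset folder that still needs to be downloaded or renamed.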
## Train models
It's quite simple, just go with
```bash
python -m exp.run_<name>
```
where `<name>` is a placeholder for the script you want to run. For example, `python -m exp.run_pretrain` will pretrain the model.
You are encouraged to read these files to understand what they do before training.
- For TPU, just run it like normal
- For GPU, you gotta remove/modify anything related to TPU such as `xla`, `tpu`, `xm`, `xla_spawn_debug`, `DistributedSampler`...

⚠ Hardcoded file paths might be updated.
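
When porting the scripts to GPU, it can help to detect the available backend first and branch on it, instead of deleting TPU code blindly. The following is only a sketch and not part of the repository: `pick_backend` is a hypothetical helper, and it makes the simplifying assumption that an installed `torch_xla` package means a TPU environment.

```python
import importlib.util


def pick_backend():
    """Return "tpu", "gpu", or "cpu" based on what is installed and visible."""
    # Assumption: if the PyTorch/XLA package is installed, we are on a TPU host.
    if importlib.util.find_spec("torch_xla") is not None:
        return "tpu"
    # Import torch only if it is installed, so this helper never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            return "gpu"
    return "cpu"


if __name__ == "__main__":
    print(f"Detected backend: {pick_backend()}")
```

A GPU port could then guard the `xm`-related calls behind `pick_backend() == "tpu"` rather than calling them unconditionally.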
Kill leftover processes
```bash
pgrep -f "python -m exp.run_pretrain" | xargs kill -9
```

## Evaluate models
It's also simple, just go with
```bash
python -m exp.eval_<name>
```
where `<name>` is a placeholder for the script you want to run. For example, `python -m exp.eval_vqa` will run inference with the models to produce the answers; it does **NOT** compute metrics.
You are encouraged to read these files to understand what they do before evaluation.
⚠ Hardcoded file paths might be updated.
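
Since the evaluation scripts only produce answers, metrics have to be computed separately. As a minimal sketch, assuming the predictions and reference answers are available as two parallel lists of strings, exact-match accuracy could be computed as below. The helpers `normalize` and `exact_match_accuracy` are hypothetical and not part of the repository, and the accuracy reported in the results table may have been computed with a different definition.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)
```

For instance, `exact_match_accuracy(["con mèo", "hai"], ["Con mèo", "ba"])` returns `0.5`, because only the first pair matches after normalization.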