https://github.com/adapter-hub/xgqa
# xGQA
This repository contains the test-dev data of the paper ["xGQA: Cross-lingual Visual Question Answering"](https://arxiv.org/abs/2109.06082). xGQA builds on the original work of [Hudson et al. 2019: GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering](https://arxiv.org/pdf/1902.09506.pdf). The training data can be downloaded [here](https://cs.stanford.edu/people/dorarad/gqa/).
## Overview
The repository is structured as follows:
- **`data/zero_shot/`** contains the xGQA test-dev files for all 8 languages
- **`data/few_shot/`** contains the new standard splits for few-shot learning. The number in the file name indicates how many distinct images the split includes, e.g. `train_10.json` contains questions about 10 distinct images.

## Training Data
Please download the English training data of GQA ([Hudson et al. 2019](https://arxiv.org/pdf/1902.09506.pdf)) [here](https://cs.stanford.edu/people/dorarad/gqa/).

## Zero-Shot Results
Zero-shot transfer results on xGQA when transferring from English GQA. Average accuracy is reported. The mean column excludes the source language (English).
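As a sanity check, the mean column can be reproduced by averaging the per-language scores while leaving out the source language. A minimal sketch (the scores are copied from the mBERTAda row of the table; `mean_excluding_source` is an illustrative name, not part of this repository):

```python
def mean_excluding_source(scores, source="en"):
    """Average accuracy over all target languages, excluding the source."""
    vals = [v for lang, v in scores.items() if lang != source]
    return round(sum(vals) / len(vals), 2)

# Per-language accuracies for the mBERTAda row.
mbert_ada = {"en": 56.25, "de": 29.76, "pt": 30.37, "ru": 24.42,
             "id": 19.15, "bn": 15.12, "ko": 19.09, "zh": 24.86}
print(mean_excluding_source(mbert_ada))  # 23.25
```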
| model | en | de | pt | ru | id | bn | ko | zh | mean |
|-----------|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| M3P | 58.43 | 23.93 | 24.37 | 20.37 | 22.57 | 15.83 | 16.90 | 18.60 | 20.37|
| OSCAR+Emb | 62.23 | 17.35 | 19.25 | 10.52 | 18.26 | 14.93 | 17.10 | 16.41 | 16.26|
| OSCAR+Ada | 60.30 | 18.91 | 27.02 | 17.50 | 18.77 | 15.42 | 15.28 | 14.96 | 18.27|
| mBERTAda  | 56.25 | 29.76 | 30.37 | 24.42 | 19.15 | 15.12 | 19.09 | 24.86 | 23.25 |

## Few-Shot
Few-shot dataset sizes. The GQA test-dev set is split into new development and test sets, plus training splits of different sizes. The distribution of structural types is maintained in each split.
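Given the naming scheme described in the overview (`train_<n>.json`, where `<n>` is the number of distinct images), the path to a few-shot training split can be built programmatically. A minimal sketch, assuming the repository layout above; the helper name `few_shot_split` is illustrative, and the naming of non-train splits is an assumption:

```python
from pathlib import Path

def few_shot_split(n_images, kind="train", root="data/few_shot"):
    """Build the path to a few-shot split file, e.g. data/few_shot/train_10.json."""
    return Path(root) / f"{kind}_{n_images}.json"

print(few_shot_split(10).as_posix())  # data/few_shot/train_10.json
```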
| Set | Test | Dev | Train | | | | | |
|------------|:----:|:----:|:-----:|-----|-----|-----|-----|------|
| #Images | 300 | 50 | 1 | 5 | 10 | 20 | 25 | 48 |
| #Questions | 9666 | 1422 | 27 | 155 | 317 | 594 | 704 | 1490 |

## Citation
If you find this repository helpful, please cite our paper ["xGQA: Cross-lingual Visual Question Answering"](https://arxiv.org/abs/2109.06082):
```bibtex
@inproceedings{pfeiffer-etal-2021-xGQA,
  title     = {{xGQA: Cross-Lingual Visual Question Answering}},
  author    = {Jonas Pfeiffer and Gregor Geigle and Aishwarya Kamath and Jan-Martin O. Steitz and Stefan Roth and Ivan Vuli{\'{c}} and Iryna Gurevych},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2022},
  month     = may,
  year      = {2022},
  url       = {https://arxiv.org/pdf/2109.06082.pdf},
  publisher = {Association for Computational Linguistics},
}
```
Shield: [![CC BY 4.0][cc-by-shield]][cc-by]

This work is licensed under a [Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]
[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg