https://github.com/markdtw/vqa-winner-cvprw-2017

Pytorch implementation of winner from VQA Chllange Workshop in CVPR'17
https://github.com/markdtw/vqa-winner-cvprw-2017

pytorch visual-question-answering

Last synced: 2 months ago
JSON representation

Pytorch implementation of winner from VQA Chllange Workshop in CVPR'17

Host: GitHub
URL: https://github.com/markdtw/vqa-winner-cvprw-2017
Owner: markdtw
Created: 2017-09-08T14:55:00.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2019-02-08T00:04:45.000Z (over 6 years ago)
Last Synced: 2025-04-09T02:04:51.717Z (3 months ago)
Topics: pytorch, visual-question-answering
Language: Python
Homepage:
Size: 25.4 KB
Stars: 163
Watchers: 11
Forks: 38
Open Issues: 4
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

        # 2017 VQA Challenge Winner (CVPR'17 Workshop)

pytorch implementation of [Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge by Teney et al](https://arxiv.org/pdf/1708.02711.pdf).

![Model architecture](https://i.imgur.com/phBHIqZ.png)

## Prerequisites

- python 3.6+

- numpy

- [pytorch](http://pytorch.org/) 0.4

- [tqdm](https://pypi.python.org/pypi/tqdm)

- [nltk](http://www.nltk.org/install.html)

- [pandas](https://pandas.pydata.org/)

## Data

- [VQA 2.0](http://visualqa.org/download.html)

- [COCO 36 features pretrained resnet model](https://github.com/peteanderson80/bottom-up-attention#pretrained-features)

- [GloVe pretrained Wikipedia+Gigaword word embedding](https://nlp.stanford.edu/projects/glove/)

## Preparation

- To download and extract vqav2, glove, and pretrained visual features:

  ```bash

  bash scripts/download_extract.sh

  ```

- To prepare data for training:

  ```bash

  python scripts/preproc.py

  ```

- The structure of `data/` directory should look like this:

  ```

  - data/

    - zips/

      - v2_XXX...zip

      - ...

      - glove...zip

      - trainval_36.zip

    - glove/

      - glove...txt

      - ...

    - v2_XXX.json

    - ...

    - trainval_resnet...tsv

    (The above are files created after executing scripts/download_extract.sh)

    - tokenizers/

      - ...

    - dict_ans.pkl

    - dict_q.pkl

    - glove_pretrained_300.npy

    - train_qa.pkl

    - val_qa.pkl

    - train_vfeats.pkl

    - val_vfeats.pkl

    (The above are files created after executing scripts/preproc.py)

  ```

## Train

Use default parameters:

```bash

bash scripts/train.sh

```

## Notes

- Huge re-factor (especially data preprocessing), tested based on pytorch 0.4.1 and python 3.6

- Training for 20 epochs reach around 50% training accuracy. (model seems buggy in my implementation)

- After all the preprocessing, `data/` directory may be up to 38G+

- Some of `preproc.py` and `utils.py` are based on [this repo](https://github.com/hengyuan-hu/bottom-up-attention-vqa)

## Resources

- [The paper](https://arxiv.org/pdf/1708.02711.pdf).

- [Their CVPR Workshop slides](http://cs.adelaide.edu.au/~Damien/Research/VQA-Challenge-Slides-TeneyAnderson.pdf).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/markdtw/vqa-winner-cvprw-2017

Awesome Lists containing this project

README