Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sanjayss34/codevqa
- Host: GitHub
- URL: https://github.com/sanjayss34/codevqa
- Owner: sanjayss34
- License: bsd-3-clause
- Created: 2023-05-22T06:32:26.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-07-16T23:41:49.000Z (over 1 year ago)
- Last Synced: 2024-08-01T13:29:00.013Z (3 months ago)
- Language: Python
- Size: 50.1 MB
- Stars: 84
- Watchers: 2
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: CODEOWNERS
- Security: SECURITY.md
README
# Modular Visual Question Answering via Code Generation
This repo contains the code for the paper [Modular Visual Question Answering via Code Generation](https://arxiv.org/abs/2306.05392), published at ACL 2023.
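As an illustration of the approach, the sketch below (hand-written for this description, not taken from the repo) shows the style of Python program such a system might generate for a compositional question. The primitive name `find_object` appears in the configs described below, but its real signature should be checked against the repo; the stub here is an assumption.
```
# Illustrative sketch (not from this repo): the kind of program CodeVQA
# might generate for "Are there more dogs than cats in the photo?".

def find_object(image, description):
    """Stub for the assumed object-detection primitive (Grounding DINO in the repo)."""
    raise NotImplementedError

def answer(image) -> str:
    num_dogs = len(find_object(image, "dog"))
    num_cats = len(find_object(image, "cat"))
    return "yes" if num_dogs > num_cats else "no"
```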
# Setup
Follow these steps to set up an environment for running this code. First, create a fresh conda environment with Python 3.8. Then complete the steps below (a minimal sanity check for the finished environment follows the list):
1. Run `pip install -e .` inside this repo.
2. Clone the Grounding DINO repo (https://github.com/IDEA-Research/GroundingDINO) and, from the directory containing the clone, run `python -m pip install -e GroundingDINO` to install it.
3. Install the remaining dependencies: `pip install transformers==4.25 openai sentence-transformers`
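After step 3, a quick sanity check like the following (our suggestion, not part of the repo) confirms that the key packages are importable and that the pinned `transformers` version is active:
```
# Sanity check (not part of the repo): verify the environment from steps 1-3.
import transformers
assert transformers.__version__.startswith("4.25"), transformers.__version__

import lavis                  # installed by `pip install -e .` in this repo
import groundingdino          # installed from the GroundingDINO clone
import openai
import sentence_transformers
print("Environment looks OK")
```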
Though the annotations for all 5 datasets used in our paper's evaluations are available online, we have collected these annotations (along with the dataset samples used in our evaluations, where applicable) in a single zip file for your convenience: https://drive.google.com/file/d/1FrGEpgcGi9SjLPbQ-bGLlGZrdOAqA79j/view?usp=sharing

# Experiments
Run these scripts to reproduce the results of CodeVQA and Few-shot PnP-VQA on the GQA, COVR, NLVR2, VQAv2, and OK-VQA test sets.
```
bash run_scripts/pnp-vqa/eval/gqa_eval_gpt3.sh
bash run_scripts/pnp-vqa/eval/covr_eval_gpt3.sh
bash run_scripts/pnp-vqa/eval/nlvr2_eval_gpt3.sh
bash run_scripts/pnp-vqa/eval/vqav2_eval_gpt3.sh
bash run_scripts/pnp-vqa/eval/okvqa_eval_gpt3.sh
```
The config files are stored at `lavis/projects/pnp-vqa/eval/{gqa/covr/nlvr2/vqav2/okvqa}_eval_gpt3{_codevqa}.yaml`. Each config provides a few commented-out options for:
1. evaluating on the validation set (or a sample thereof) instead of the test set,
2. retrieving in-context examples at random instead of by question-embedding similarity, and
3. using the `find_object` primitive for counting objects (provided in the COVR and NLVR2 configs; enable both the commented-out `programs_path` option and the commented-out `grounding_dino_path`).
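Since the configs are YAML files (and LAVIS parses its configs with OmegaConf), one way to inspect which of the options above a given config currently enables is a short script like this; the filename is just one instance of the path pattern above:
```
# Minimal sketch: load one of the eval configs and print it, to see which
# of the commented-out options are currently active.
from omegaconf import OmegaConf

cfg = OmegaConf.load("lavis/projects/pnp-vqa/eval/covr_eval_gpt3_codevqa.yaml")
print(OmegaConf.to_yaml(cfg))
```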
Note: The preambles (API documentation) in the prompts for VQAv2 and OK-VQA may be suboptimal, due either to misspecified functions (OK-VQA) or to missing function descriptions (VQAv2). The in-context examples, however, are valid.

# Acknowledgements
This repo is based on the original LAVIS repo: https://github.com/salesforce/LAVIS

# Citation
If you find our paper or this repository useful in your work, please cite our paper:
```
@inproceedings{subramanian-etal-2023-modular,
title = "Modular Visual Question Answering via Code Generation",
author = "Subramanian, Sanjay and
Narasimhan, Medhini and
Khangaonkar, Kushal and
Yang, Kevin and
Nagrani, Arsha and
Schmid, Cordelia and
Zeng, Andy and
Darrell, Trevor and
Klein, Dan",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics",
    month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics"
}
```