# Modular Visual Question Answering via Code Generation

This repo contains the code for the paper [Modular Visual Question Answering via Code Generation](https://arxiv.org/abs/2306.05392), published at ACL 2023.

# Setup
Follow these steps to set up an environment for running this code (a consolidated command sketch is shown after the steps). First, create a fresh conda environment with Python 3.8. Then:
1. Run `pip install -e .` inside this repo.
2. Clone the Grounding DINO repo (https://github.com/IDEA-Research/GroundingDINO) and install it by running `python -m pip install -e GroundingDINO` from the directory containing the clone.
3. `pip install transformers==4.25 openai sentence-transformers`

The annotations for all 5 datasets used in our paper's evaluations are available online, but for convenience we have collected them (along with the dataset samples used in our evaluations, where applicable) in a single zip file: https://drive.google.com/file/d/1FrGEpgcGi9SjLPbQ-bGLlGZrdOAqA79j/view?usp=sharing .
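
The steps above can be summarized roughly as follows. This is only a sketch: the environment name `codevqa` and the choice to clone Grounding DINO next to this repo are assumptions, not part of the original instructions.
```
# Sketch of the setup steps above; adjust names and paths to your layout.
conda create -n codevqa python=3.8 -y   # environment name is an assumption
conda activate codevqa

# Step 1: install this repo (run from inside the codevqa repo)
pip install -e .

# Step 2: install Grounding DINO (cloned alongside this repo here)
git clone https://github.com/IDEA-Research/GroundingDINO
python -m pip install -e GroundingDINO

# Step 3: remaining dependencies
pip install transformers==4.25 openai sentence-transformers
```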

# Experiments
Run these scripts to reproduce the results of CodeVQA and Few-shot PnP-VQA on the GQA, COVR, NLVR2, VQAv2, and OK-VQA test sets.
```
bash run_scripts/pnp-vqa/eval/gqa_eval_gpt3.sh
bash run_scripts/pnp-vqa/eval/covr_eval_gpt3.sh
bash run_scripts/pnp-vqa/eval/nlvr2_eval_gpt3.sh
bash run_scripts/pnp-vqa/eval/vqav2_eval_gpt3.sh
bash run_scripts/pnp-vqa/eval/okvqa_eval_gpt3.sh
```
The config files are stored at `lavis/projects/pnp-vqa/eval/{gqa/covr/nlvr2/vqav2/okvqa}_eval_gpt3{_codevqa}.yaml`. We provide a few commented-out options in these configs:

1. evaluating on the validation set (or a sample thereof) instead of the test set,
2. retrieving in-context examples at random instead of via question embeddings, and
3. using the `find_object` primitive for counting objects (provided in the COVR and NLVR2 configs); for this, use both the commented-out `programs_path` and the commented-out `grounding_dino_path`.
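
To see which options are commented out before editing a config, something like the following can help. This is purely illustrative; the example filename simply follows the path pattern above.
```
# List the eval configs and show the commented-out lines in one of them.
ls lavis/projects/pnp-vqa/eval/*_eval_gpt3*.yaml
grep -n "#" lavis/projects/pnp-vqa/eval/covr_eval_gpt3_codevqa.yaml
```
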
Note: The preambles (API documentation) in the prompts for VQAv2 and OK-VQA may be suboptimal due to either misspecified functions (OK-VQA) or lack of function descriptions (VQAv2). The in-context examples, however, are valid.

# Acknowledgements
This repo is based on the original LAVIS repo: https://github.com/salesforce/LAVIS .

# Citation
If you find our paper or this repository useful in your work, please cite our paper:
```
@inproceedings{subramanian-etal-2023-modular,
    title = "Modular Visual Question Answering via Code Generation",
    author = "Subramanian, Sanjay and
      Narasimhan, Medhini and
      Khangaonkar, Kushal and
      Yang, Kevin and
      Nagrani, Arsha and
      Schmid, Cordelia and
      Zeng, Andy and
      Darrell, Trevor and
      Klein, Dan",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics"
}
```