# awesome-visual-question-answering
A curated list of Visual Question Answering (VQA, covering image and video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
https://github.com/jokieleung/awesome-visual-question-answering
## Papers

### Survey
- Visual question answering: Datasets, algorithms, and future challenges - Kushal Kafle et al, **CVIU 2017**.
- Visual question answering: A survey of methods and datasets - Qi Wu et al, **CVIU 2017**.
### 2022
- DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering - Le Qi et al, **ACL 2022 (Findings)**.
- Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering - Jialin Wu et al, **EMNLP 2022**.
- Retrieval Augmented Visual Question Answering with Outside Knowledge - Weizhe Lin et al, **EMNLP 2022**.
- CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering - Maitreya Patel et al, **EMNLP 2022**. [[proj]](https://maitreyapatel.com/CRIPP-VQA/) [[code]](https://github.com/Maitreyapatel/CRIPP-VQA/)
- Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning - Qingyi Si et al, **EMNLP 2022 (Findings)**. [[code]](https://github.com/PhoebusSi/MMBS)
- Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training - Anthony Meng Huat Tiong et al, **EMNLP 2022 (Findings)**.
- Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA - Qingyi Si et al, **EMNLP 2022 (Findings)**. [[code]](https://github.com/PhoebusSi/VQA-VS)
- REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering - Yuanze Lin et al, **NeurIPS 2022**.
- Towards Video Text Visual Question Answering: Benchmark and Baseline - Minyi Zhao et al, **NeurIPS 2022**.
- CLIP Models are Few-Shot Learners: Empirical Studies on VQA and Visual Entailment - Haoyu Song et al, **ACL 2022**.
- CARETS: A Consistency And Robustness Evaluative Test Suite for VQA - Carlos Jimenez et al, **ACL 2022**.
- Hypergraph Transformer: Weakly-Supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering - Yu-Jung Heo et al, **ACL 2022**.
- xGQA: Cross-Lingual Visual Question Answering - Jonas Pfeiffer et al, **ACL 2022 (Findings)**. [[data]](https://github.com/Adapter-Hub/xGQA)
- Co-VQA: Answering by Interactive Sub Question Sequence - Ruonan Wang et al, **ACL 2022 (Findings)**.
- SimVQA: Exploring Simulated Environments for Visual Question Answering - Paola Cascante-Bonilla et al, **CVPR 2022**. [[code]](https://www.cs.rice.edu/~pc51/simvqa/)
- A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering - Feng Gao et al, **CVPR 2022**.
- SwapMix: Diagnosing and Regularizing the Over-reliance on Visual Context in Visual Question Answering - Vipul Gupta et al, **CVPR 2022**. [[code]](https://github.com/vipulgupta1011/swapmix)
- Dual-Key Multimodal Backdoors for Visual Question Answering - Matthew Walmer et al, **CVPR 2022**. [[code]](https://github.com/SRI-CSL/TrinityMultimodalTrojAI)
- MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering - Yang Ding et al, **CVPR 2022**. [[code]](https://github.com/AndersonStra/MuKEA)
- Grounding Answers for Visual Questions Asked by Visually Impaired People - Chongyan Chen et al, **CVPR 2022**. [[page]](https://vizwiz.org/tasks-and-datasets/answer-grounding-for-vqa/)
- Maintaining Reasoning Consistency in Compositional Visual Question Answering - Chenchen Jing et al, **CVPR 2022**. [[code]](https://github.com/jingchenchen/ReasoningConsistency-VQA)
- RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning - Xiaojian Ma et al, **ICLR 2022**. [[code]](https://github.com/NVlabs/RelViT)
- Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering - Mingxiao Li et al, **AAAI 2022**. [[code]](https://github.com/Mingxiao-Li/DMMGR)
- Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering - Min Peng et al, **IJCAI 2022**. [[code]](https://github.com/Mvrjustid/MHN-IJCAI22)
- TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation - Jun Wang et al, **BMVC 2022**. [[code]](https://github.com/HenryJunW/TAG)
- Video Question Answering: Datasets, Algorithms and Challenges - Yaoyao Zhong et al, **EMNLP 2022**.
### 2021
- TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption - Zhengyuan Yang et al, **CVPR 2021**.
- Counterfactual VQA: A Cause-Effect Look at Language Bias - Yulei Niu et al, **CVPR 2021**. [[code]](https://github.com/yuleiniu/cfvqa)
- KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA - Kenneth Marino et al, **CVPR 2021**.
- Human-Adversarial Visual Question Answering - Sasha Sheng et al, **NeurIPS 2021**. [[code]](https://adversarialvqa.org/)
- Debiased Visual Question Answering from Feature and Sample Perspectives - Zhiquan Wen et al, **NeurIPS 2021**. [[code]](https://github.com/Zhiquan-Wen/D-VQA)
- Learning to Generate Visual Questions with Noisy Supervision - Kai Shen et al, **NeurIPS 2021**. [[code]](https://github.com/AlanSwift/DH-GAN)
- ProTo: Program-Guided Transformer for Program-Guided Tasks - Zelin Zhao et al, **NeurIPS 2021**. [[code]](https://github.com/sjtuytc/Neurips21-ProTo-Program-guided-Transformers-for-Program-guided-Tasks)
- Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering - Jihyung Kil et al, **EMNLP 2021**.
- Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking - Dirk Väth et al, **EMNLP 2021 (demo)**. [[code]](https://github.com/patilli/vqa_benchmarking)
- Diversity and Consistency: Exploring Visual Question-Answer Pair Generation - Sen Yang et al, **EMNLP 2021 (Findings)**.
- Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation - Humair Raj Khan et al, **EMNLP 2021 (Findings)**.
- MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering - Junjie Wang et al, **EMNLP 2021 (Findings)**. [[code]](https://github.com/iigroup/mirtt)
- Just Ask: Learning To Answer Questions From Millions of Narrated Videos - Antoine Yang et al, **ICCV 2021**.
- Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments - Difei Gao et al, **ICCV 2021**.
- On The Hidden Treasure of Dialog in Video Question Answering - Deniz Engin et al, **ICCV 2021**.
- Unshuffling Data for Improved Generalization in Visual Question Answering - Damien Teney et al, **ICCV 2021**.
- TRAR: Routing the Attention Spans in Transformer for Visual Question Answering - Yiyi Zhou et al, **ICCV 2021**.
- Greedy Gradient Ensemble for Robust Visual Question Answering - Xinzhe Han et al, **ICCV 2021**.
- Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos - Heeseung Yun et al, **ICCV 2021**.
- Weakly Supervised Relative Spatial Reasoning for Visual Question Answering - Pratyay Banerjee et al, **ICCV 2021**.
- Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering - Qingxing Cao et al, **ICCV 2021**.
- Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering - Corentin Dancette et al, **ICCV 2021**.
- Auto-Parsing Network for Image Captioning and Visual Question Answering - Xu Yang et al, **ICCV 2021**.
- Unified Questioner Transformer for Descriptive Question Generation in Goal-Oriented Visual Dialogue - Shoya Matsumori et al, **ICCV 2021**.
- Check It Again: Progressive Visual Question Answering via Visual Entailment - Qingyi Si et al, **ACL 2021**. [[code]](https://github.com/PhoebusSi/SAR)
- Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering - Siddharth Karamcheti et al, **ACL 2021**. [[code]](https://github.com/siddk/vqa-outliers)
- In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering - Peter Vickers et al, **ACL 2021**.
- Towards Visual Question Answering on Pathology Images - Xuehai He et al, **ACL 2021**. [[code]](https://github.com/UCSD-AI4H/PathVQA)
- Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions - Daniel Rosenberg et al, **ACL 2021**. [[code]](https://danrosenberg.github.io/rad-measure/)
- LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering - Zujie Liang et al, **SIGIR 2021**. [[code]](https://github.com/jokieleung/LPF-VQA)
- Passage Retrieval for Outside-Knowledge Visual Question Answering - Chen Qu et al, **SIGIR 2021**. [[code]](https://github.com/prdwb/okvqa-release)
- Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering - Aman Jain et al, **SIGIR 2021**. [[code]](https://s3vqa.github.io/)
- Visual Question Rewriting for Increasing Response Rate - Jiayi Wei et al, **SIGIR 2021**.
- Separating Skills and Concepts for Novel Visual Question Answering - Spencer Whitehead et al, **CVPR 2021**.
- Roses Are Red, Violets Are Blue... but Should VQA Expect Them To? - Corentin Kervadec et al, **CVPR 2021**. [[code]](https://github.com/gqa-ood/GQA-OOD)
- Predicting Human Scanpaths in Visual Question Answering - Xianyu Chen et al, **CVPR 2021**.
- Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules - Aisha Urooj et al, **CVPR 2021**.
- Perception Matters: Detecting Perception Failures of VQA Models Using Metamorphic Testing - Yuanyuan Yuan et al, **CVPR 2021**.
- How Transferable Are Reasoning Patterns in VQA? - Corentin Kervadec et al, **CVPR 2021**.
- Domain-Robust VQA With Diverse Datasets and Methods but No Target Labels - Mingda Zhang et al, **CVPR 2021**.
- Learning Better Visual Dialog Agents With Pretrained Visual-Linguistic Representation - Tao Tu et al, **CVPR 2021**.
- MultiModalQA: complex question answering over text, tables and images - Alon Talmor et al, **ICLR 2021**. [[page]](https://allenai.github.io/multimodalqa/)
- CLEVR_HYP: A Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images - Shailaja Keyur Sampat et al, **NAACL-HLT 2021**. [[code]](https://github.com/shailaja183/clevr_hyp)
- Video Question Answering with Phrases via Semantic Roles - Arka Sadhu et al, **NAACL-HLT 2021**.
- SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency - Sameer Dharur et al, **NAACL-HLT 2021**.
- Ensemble of MRR and NDCG models for Visual Dialog - Idan Schwartz, **NAACL-HLT 2021**. [[code]](https://github.com/idansc/mrr-ndcg)
- Regularizing Attention Networks for Anomaly Detection in Visual Question Answering - Doyup Lee et al, **AAAI 2021**.
- A Case Study of the Shortcut Effects in Visual Commonsense Reasoning - Keren Ye et al, **AAAI 2021**. [[code]](https://github.com/yekeren/VCR-shortcut-effects-study)
- VisualMRC: Machine Reading Comprehension on Document Images - Ryota Tanaka et al, **AAAI 2021**. [[page]](https://github.com/nttmdlab-nlp/VisualMRC)
### 2020
- MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering - Tejas Gokhale et al, **EMNLP 2020**. [[code]](https://github.com/tejas-gokhale/vqa_mutant)
- Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering - Zujie Liang et al, **EMNLP 2020**. [[code]](https://github.com/jokieleung/CL-VQA)
- VD-BERT: A Unified Vision and Dialog Transformer with BERT - Yue Wang et al, **EMNLP 2020**.
- Multimodal Graph Networks for Compositional Generalization in Visual Question Answering - Raeid Saqur et al, **NeurIPS 2020**.
- Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies - Itai Gat et al, **NeurIPS 2020**.
- Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data - Michael Cogswell et al, **NeurIPS 2020**.
- On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law - Damien Teney et al, **NeurIPS 2020**.
- Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder - Gouthaman KV et al, **ECCV 2020**.
- Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions - Noa Garcia et al, **ECCV 2020**.
- Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering - Ruixue Tang et al, **ECCV 2020**.
- Visual Question Answering on Image Sets - Ankan Bansal et al, **ECCV 2020**.
- VQA-LOL: Visual Question Answering under the Lens of Logic - Tejas Gokhale et al, **ECCV 2020**.
- TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering - Xiaofeng Yang et al, **ECCV 2020**.
- Spatially Aware Multimodal Transformers for TextVQA - Yash Kant et al, **ECCV 2020**.
- On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering - Xinyu Wang et al, **CVPR 2020**.
- In Defense of Grid Features for Visual Question Answering - Huaizu Jiang et al, **CVPR 2020**.
- Counterfactual Samples Synthesizing for Robust Visual Question Answering - Long Chen et al, **CVPR 2020**.
- Counterfactual Vision and Language Learning - Ehsan Abbasnejad et al, **CVPR 2020**.
- Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA - Ronghang Hu et al, **CVPR 2020**.
- Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing - Vedika Agarwal et al, **CVPR 2020**.
- SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions - Ramprasaath R. Selvaraju et al, **CVPR 2020**.
- TA-Student VQA: Multi-Agents Training by Self-Questioning - Peixi Xiong et al, **CVPR 2020**.
- VQA With No Questions-Answers Training - Ben-Zion Vatashsky et al, **CVPR 2020**.
- Hierarchical Conditional Relation Networks for Video Question Answering - Thao Minh Le et al, **CVPR 2020**.
- Modality Shifting Attention Network for Multi-Modal Video Question Answering - Junyeong Kim et al, **CVPR 2020**.
- Webly Supervised Knowledge Embedding Model for Visual Reasoning - Wenbo Zheng et al, **CVPR 2020**.
- Differentiable Adaptive Computation Time for Visual Reasoning - Cristobal Eyzaguirre et al, **CVPR 2020**.
- A negative case analysis of visual grounding methods for VQA - Robik Shrestha et al, **ACL 2020**.
- Cross-Modality Relevance for Reasoning on Language and Vision - Chen Zheng et al, **ACL 2020**.
- Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA - Hyounghun Kim et al, **ACL 2020**.
- TVQA+: Spatio-Temporal Grounding for Video Question Answering - Jie Lei et al, **ACL 2020**.
- BERT representations for Video Question Answering - Zekun Yang et al, **WACV 2020**.
- Deep Bayesian Network for Visual Question Generation - Badri Patro et al, **WACV 2020**.
- Robust Explanations for Visual Question Answering - Badri Patro et al, **WACV 2020**.
- Visual Question Answering on 360° Images - Shih-Han Chou et al, **WACV 2020**.
- LEAF-QA: Locate, Encode & Attend for Figure Question Answering - Ritwick Chaudhry et al, **WACV 2020**.
- Answering Questions about Data Visualizations using Efficient Bimodal Fusion - Kushal Kafle et al, **WACV 2020**.
- Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text - Difei Gao et al, **CVPR 2020**. [[code]](https://github.com/ricolike/mmgnn_textvqa)
### 2019
- Psycholinguistics Meets Continual Learning: Measuring Catastrophic Forgetting in Visual Question Answering - Claudio Greco et al, **ACL 2019**. [[paper]](https://www.aclweb.org/anthology/P19-1350/)
- Multi-grained Attention with Object-level Grounding for Visual Question Answering - Pingping Huang et al, **ACL 2019**.
- Compact Trilinear Interaction for Visual Question Answering - Tuong Do et al, **ICCV 2019**.
- Scene Text Visual Question Answering - Ali Furkan Biten et al, **ICCV 2019**.
- Multi-Modality Latent Interaction Network for Visual Question Answering - Peng Gao et al, **ICCV 2019**.
- Relation-Aware Graph Attention Network for Visual Question Answering - Linjie Li et al, **ICCV 2019**.
- Why Does a Visual Question Have Different Answers? - Nilavra Bhattacharya et al, **ICCV 2019**.
- RUBi: Reducing Unimodal Biases for Visual Question Answering - Remi Cadene et al, **NeurIPS 2019**.
- Self-Critical Reasoning for Robust Visual Question Answering - Jialin Wu et al, **NeurIPS 2019**.
- Deep Modular Co-Attention Networks for Visual Question Answering - Zhou Yu et al, **CVPR 2019**. [[code]](https://github.com/MILVLG/mcan-vqa) (a guided-attention sketch follows this list)
- Information Maximizing Visual Question Generation - Ranjay Krishna et al, **CVPR 2019**. [code]
- Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence - Amir Zadeh et al, **CVPR 2019**. [code]
- Learning to Compose Dynamic Tree Structures for Visual Contexts - Kaihua Tang et al, **CVPR 2019**. [code]
- Transfer Learning via Unsupervised Task Discovery for Visual Question Answering - Hyeonwoo Noh et al, **CVPR 2019**. [code]
- Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph - Yao-Hung Hubert Tsai et al, **CVPR 2019**. [[code]](https://github.com/yaohungt/Gated-Spatio-Temporal-Energy-Graph)
- Explainable and Explicit Visual Reasoning over Scene Graphs - Jiaxin Shi et al, **CVPR 2019**. [[code]](https://github.com/shijx12/XNM-Net)
- MUREL: Multimodal Relational Reasoning for Visual Question Answering - Remi Cadene et al, **CVPR 2019**. [[code]](https://github.com/Cadene/murel.bootstrap.pytorch)
- Image-Question-Answer Synergistic Network for Visual Dialog - Dalu Guo et al, **CVPR 2019**. [code]
- RAVEN: A Dataset for Relational and Analogical Visual rEasoNing - Chi Zhang et al, **CVPR 2019**. [[project page]](http://wellyzhang.github.io/project/raven.html)
- Cycle-Consistency for Robust Visual Question Answering - Meet Shah et al, **CVPR 2019**.
- It's Not About the Journey; It's About the Destination: Following Soft Paths Under Question-Guidance for Visual Reasoning - Monica Haurilet et al, **CVPR 2019**.
- OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge - Kenneth Marino et al, **CVPR 2019**.
- Visual Question Answering as Reading Comprehension - Hui Li et al, **CVPR 2019**.
- Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering - Peng Gao et al, **CVPR 2019**.
- Explicit Bias Discovery in Visual Question Answering Models - Varun Manjunatha et al, **CVPR 2019**.
- Answer Them All! Toward Universal Visual Question Answering Models - Robik Shrestha et al, **CVPR 2019**.
- Visual Query Answering by Entity-Attribute Graph Matching and Reasoning - Peixi Xiong et al, **CVPR 2019**.
- Differential Networks for Visual Question Answering - Chenfei Wu et al, **AAAI 2019**. [code]
- BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection - Hedi Ben-younes et al, **AAAI 2019**. [[code]](https://github.com/Cadene/block.bootstrap.pytorch)
- Dynamic Capsule Attention for Visual Question Answering - Yiyi Zhou et al, **AAAI 2019**. [[code]](https://github.com/XMUVQA/CapsAtt)
- Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering - Xiangpeng Li et al, **AAAI 2019**. [[code]](https://github.com/lixiangpengcs/PSAC)
- Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning - Yiyi Zhou et al, **AAAI 2019**. [[code]](https://github.com/xiangmingLi/PIL)
- Focal Visual-Text Attention for Memex Question Answering - Junwei Liang et al, **TPAMI 2019**. [[code]](https://memexqa.cs.cmu.edu/)
- Combining Multiple Cues for Visual Madlibs Question Answering - Tatiana Tommasi et al, **IJCV 2019**. [code]
- Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation - Sang-Woo Lee et al, **ICLR 2019**. [[code]](https://github.com/naver/aqm-plus)
- Improving Visual Question Answering by Referring to Generated Paragraph Captions - Hyounghun Kim et al, **ACL 2019**.
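Several 2019 entries above, notably Deep Modular Co-Attention Networks (MCAN), stack attention units in which image-region features attend to question words. Below is a minimal PyTorch sketch of one such guided-attention unit; the dimensions, single-block structure, and variable names are illustrative assumptions, not the authors' implementation (see the linked MCAN repo for that).

```python
import torch
import torch.nn as nn

class GuidedAttention(nn.Module):
    """Minimal guided-attention unit: image regions attend to question words.

    A simplified sketch of the cross-attention blocks used in co-attention
    VQA models such as MCAN; sizes are illustrative, not the paper's config.
    """

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, regions: torch.Tensor, words: torch.Tensor) -> torch.Tensor:
        # regions: (batch, num_regions, dim); words: (batch, num_words, dim)
        attended, _ = self.attn(query=regions, key=words, value=words)
        return self.norm(regions + attended)  # residual connection + LayerNorm

x = torch.randn(2, 36, 512)   # e.g., 36 region features per image
q = torch.randn(2, 14, 512)   # e.g., 14 question-token features
print(GuidedAttention()(x, q).shape)  # torch.Size([2, 36, 512])
```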
### 2018
- Bilinear Attention Networks - Jin-Hwa Kim et al, **NIPS 2018**. [code]
- Chain of Reasoning for Visual Question Answering - Chenfei Wu et al, **NIPS 2018**. [code]
- Learning Conditioned Graph Structures for Interpretable Visual Question Answering - Will Norcliffe-Brown et al, **NIPS 2018**. [[code]](https://github.com/aimbrain/vqa-project)
- Learning to Specialize with Knowledge Distillation for Visual Question Answering - Jonghwan Mun et al, **NIPS 2018**. [code]
- Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering - Medhini Narasimhan et al, **NIPS 2018**. [code]
- Overcoming Language Priors in Visual Question Answering with Adversarial Regularization - Sainandan Ramakrishnan et al, **NIPS 2018**. [code]
- Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering - Somak Aditya et al, **AAAI 2018**. [code]
- Co-Attending Free-Form Regions and Detections with Multi-Modal Multiplicative Feature Embedding for Visual Question Answering - Pan Lu et al, **AAAI 2018**. [[code]](https://github.com/lupantech/dual-mfa-vqa)
- Exploring Human-Like Attention Supervision in Visual Question Answering - Tingting Qiao et al, **AAAI 2018**. [code]
- Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents - Bo Wang et al, **AAAI 2018**. [code]
- Feature Enhancement in Attention for Visual Question Answering - Yuetan Lin et al, **IJCAI 2018**. [code]
- A Question Type Driven Framework to Diversify Visual Question Generation - Zhihao Fan et al, **IJCAI 2018**. [code]
- Multi-Turn Video Question Answering via Multi-Stream Hierarchical Attention Context Network - Zhou Zhao et al, **IJCAI 2018**. [code]
- Open-Ended Long-form Video Question Answering via Adaptive Hierarchical Reinforced Networks - Zhou Zhao et al, **IJCAI 2018**. [code]
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering - Peter Anderson et al, **CVPR 2018**. [[code(author)]](https://github.com/peteanderson80/bottom-up-attention) [[code(pythiaV0.1)]](https://github.com/facebookresearch/pythia) [[code(Pytorch Reimplementation)]](https://github.com/hengyuan-hu/bottom-up-attention-vqa) (a top-down attention sketch follows this list)
- Tips and Tricks for Visual Question Answering: Learnings From the 2017 Challenge - Damien Teney et al, **CVPR 2018**. [code]
- Learning by Asking Questions - Ishan Misra et al, **CVPR 2018**. [code]
- Embodied Question Answering - Abhishek Das et al, **CVPR 2018**. [code]
- VizWiz Grand Challenge: Answering Visual Questions From Blind People - Danna Gurari et al, **CVPR 2018**. [code]
- Textbook Question Answering Under Instructor Guidance With Memory Networks - Juzheng Li et al, **CVPR 2018**. [[code]](https://github.com/freerailway/igmn)
- IQA: Visual Question Answering in Interactive Environments - Daniel Gordon et al, **CVPR 2018**. [[sample video]](https://youtu.be/pXd3C-1jr98)
- Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering - Aishwarya Agrawal et al, **CVPR 2018**. [code]
- Learning Answer Embeddings for Visual Question Answering - Hexiang Hu et al, **CVPR 2018**. [code]
- DVQA: Understanding Data Visualizations via Question Answering - Kushal Kafle et al, **CVPR 2018**. [code]
- Cross-Dataset Adaptation for Visual Question Answering - Wei-Lun Chao et al, **CVPR 2018**. [code]
- Two Can Play This Game: Visual Dialog With Discriminative Question Generation and Answering - Unnat Jain et al, **CVPR 2018**. [code]
- Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering - Duy-Kien Nguyen et al, **CVPR 2018**. [code]
- Visual Question Generation as Dual Task of Visual Question Answering - Yikang Li et al, **CVPR 2018**. [code]
- Focal Visual-Text Attention for Visual Question Answering - Junwei Liang et al, **CVPR 2018**. [code]
- Motion-Appearance Co-Memory Networks for Video Question Answering - Jiyang Gao et al, **CVPR 2018**. [code]
- Visual Question Answering With Memory-Augmented Networks - Chao Ma et al, **CVPR 2018**. [code]
- Visual Question Reasoning on General Dependency Tree - Qingxing Cao et al, **CVPR 2018**. [code]
- Differential Attention for Visual Question Answering - Badri Patro et al, **CVPR 2018**. [code]
- Learning Visual Knowledge Memory Networks for Visual Question Answering - Zhou Su et al, **CVPR 2018**. [code]
- IVQA: Inverse Visual Question Answering - Feng Liu et al, **CVPR 2018**. [code]
- Customized Image Narrative Generation via Interactive Visual Question Generation and Answering - Andrew Shin et al, **CVPR 2018**. [code]
- Visual Question Answering as a Meta Learning Task - Damien Teney et al, **ECCV 2018**. [code]
- Question-Guided Hybrid Convolution for Visual Question Answering - Peng Gao et al, **ECCV 2018**. [code]
- Goal-Oriented Visual Question Generation via Intermediate Rewards - Junjie Zhang et al, **ECCV 2018**. [code]
- Multimodal Dual Attention Memory for Video Story Question Answering - Kyung-Min Kim et al, **ECCV 2018**. [code]
- A Joint Sequence Fusion Model for Video Question Answering and Retrieval - Youngjae Yu et al, **ECCV 2018**. [code]
- Deep Attention Neural Tensor Network for Visual Question Answering - Yalong Bai et al, **ECCV 2018**. [code]
- Question Type Guided Attention in Visual Question Answering - Yang Shi et al, **ECCV 2018**. [code]
- Learning Visual Question Answering by Bootstrapping Hard Attention - Mateusz Malinowski et al, **ECCV 2018**. [code]
- Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering - Medhini Narasimhan et al, **ECCV 2018**. [code]
- Visual Question Generation for Class Acquisition of Unknown Objects - Kohei Uehara et al, **ECCV 2018**. [[code]](https://github.com/mil-tokyo/vqg-unknown)
- Image Captioning and Visual Question Answering Based on Attributes and External Knowledge - Qi Wu et al, **TPAMI 2018**. [code]
- FVQA: Fact-Based Visual Question Answering - Peng Wang et al, **TPAMI 2018**. [code]
- Interpretable Counting for Visual Question Answering - Alexander Trott et al, **ICLR 2018**. [code]
- Learning to Count Objects in Natural Images for Visual Question Answering - Yan Zhang et al, **ICLR 2018**. [code]
- A Better Way to Attend: Attention With Trees for Video Question Answering - Hongyang Xue et al, **TIP 2018**. [[code]](https://github.com/xuehy/TreeAttention)
- Zero-Shot Transfer VQA Dataset - Pan Lu et al, **arXiv preprint**. [code]
- Visual Question Answering using Explicit Visual Attention - Vasileios Lioutas et al, **ISCAS 2018**. [code]
- Explicit ensemble attention learning for improving visual question answering - Vasileios Lioutas et al, **Pattern Recognition Letters 2018**. [code]
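Many 2018 entries build on Bottom-Up and Top-Down Attention, which scores a set of pre-extracted bottom-up region features against the question encoding and pools them with soft top-down attention. The PyTorch sketch below illustrates that attention step; the layer sizes and MLP scorer are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TopDownAttention(nn.Module):
    """Question-conditioned soft attention over bottom-up region features.

    Illustrative sketch of the top-down attention step in bottom-up/top-down
    style VQA models; feature sizes here are assumed, not the paper's.
    """

    def __init__(self, region_dim: int = 2048, q_dim: int = 512, hidden: int = 512):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(region_dim + q_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, regions: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        # regions: (batch, k, region_dim); q: (batch, q_dim)
        k = regions.size(1)
        q_tiled = q.unsqueeze(1).expand(-1, k, -1)                 # (batch, k, q_dim)
        logits = self.scorer(torch.cat([regions, q_tiled], dim=-1))  # (batch, k, 1)
        weights = torch.softmax(logits, dim=1)                     # attention over regions
        return (weights * regions).sum(dim=1)                      # (batch, region_dim)

v = torch.randn(2, 36, 2048)  # e.g., 36 Faster R-CNN region features
q = torch.randn(2, 512)       # question encoding (e.g., from a GRU)
print(TopDownAttention()(v, q).shape)  # torch.Size([2, 2048])
```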
### 2017-2015
- Learning to Reason: End-to-End Module Networks for Visual Question Answering - Ronghang Hu et al, **ICCV 2017**. [code]
- Structured Attentions for Visual Question Answering - Chen Zhu et al, **ICCV 2017**. [[code]](https://github.com/shtechair/vqa-sva)
- VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation - Chuang Gan et al, **ICCV 2017**. [[code]](https://github.com/Cold-Winter/vqs)
- **Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering** - Zhou Yu et al, **ICCV 2017**. [[code]](https://github.com/yuzcccc/vqa-mfb) (a fusion sketch follows this list)
- An Analysis of Visual Question Answering Algorithms - Kushal Kafle et al, **ICCV 2017**. [code]
- MUTAN: Multimodal Tucker Fusion for Visual Question Answering - Hedi Ben-younes et al, **ICCV 2017**. [[code]](https://github.com/cadene/vqa.pytorch)
- MarioQA: Answering Questions by Watching Gameplay Videos - Jonghwan Mun et al, **ICCV 2017**. [code]
- Learning to Disambiguate by Asking Discriminative Questions - Yining Li et al, **ICCV 2017**. [code]
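The highlighted MFB entry above fuses image and question features with multi-modal factorized bilinear pooling: project both modalities, take an element-wise product, sum-pool over a factor dimension, then apply power and L2 normalization. A minimal PyTorch sketch under assumed sizes (not the authors' implementation; see the linked repo):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MFBFusion(nn.Module):
    """Multi-modal factorized bilinear pooling, sketched from the MFB idea.

    Projects both modalities to an (out_dim * factor) space, multiplies them
    element-wise, and sum-pools over the factor dimension; the power/L2
    normalization follows the common MFB recipe. Sizes are illustrative.
    """

    def __init__(self, img_dim: int = 2048, q_dim: int = 512,
                 out_dim: int = 1000, factor: int = 5):
        super().__init__()
        self.factor = factor
        self.img_proj = nn.Linear(img_dim, out_dim * factor)
        self.q_proj = nn.Linear(q_dim, out_dim * factor)

    def forward(self, v: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        # v: (batch, img_dim); q: (batch, q_dim)
        joint = self.img_proj(v) * self.q_proj(q)                       # element-wise product
        joint = joint.view(joint.size(0), -1, self.factor).sum(dim=2)   # sum-pool over factor
        joint = torch.sign(joint) * torch.sqrt(joint.abs() + 1e-8)      # signed power norm
        return F.normalize(joint, dim=1)                                # L2 norm

v = torch.randn(2, 2048)  # attended image feature
q = torch.randn(2, 512)   # question feature
print(MFBFusion()(v, q).shape)  # torch.Size([2, 1000])
```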
## VQA Challenge Leaderboard

### test-std 2018

### test-std 2017

### [TextVQA](https://textvqa.org/)

## Licenses

### VQA-CP