# awesome-visual-question-answering
A curated list of Visual Question Answering (VQA, covering image and video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
https://github.com/jokieleung/awesome-visual-question-answering
## Papers

### Survey
- Visual question answering: Datasets, algorithms, and future challenges - Kushal Kafle et al, **CVIU 2017**.
- Visual question answering: A survey of methods and datasets - Qi Wu et al, **CVIU 2017**.
### 2022
- DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering - Le Qi et al, **ACL 2022 (Findings)**.
- Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering - Jialin Wu et al, **EMNLP 2022**.
- Retrieval Augmented Visual Question Answering with Outside Knowledge - Weizhe Lin et al, **EMNLP 2022**.
- CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering - Maitreya Patel et al, **EMNLP 2022**. [[proj]](https://maitreyapatel.com/CRIPP-VQA/) [[code]](https://github.com/Maitreyapatel/CRIPP-VQA/)
- Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning - Qingyi Si et al, **EMNLP 2022 (Findings)**. [[code]](https://github.com/PhoebusSi/MMBS)
- Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training - Anthony Meng Huat Tiong et al, **EMNLP 2022 (Findings)**.
- Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA - Qingyi Si et al, **EMNLP 2022 (Findings)**. [[code]](https://github.com/PhoebusSi/VQA-VS)
- REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering - Yuanze Lin et al, **NeurIPS 2022**.
- Towards Video Text Visual Question Answering: Benchmark and Baseline - Minyi Zhao et al, **NeurIPS 2022**.
- CLIP Models are Few-Shot Learners: Empirical Studies on VQA and Visual Entailment - Haoyu Song et al, **ACL 2022**.
- CARETS: A Consistency And Robustness Evaluative Test Suite for VQA - Carlos Jimenez et al, **ACL 2022**.
- Hypergraph Transformer: Weakly-Supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering - Yu-Jung Heo et al, **ACL 2022**.
- xGQA: Cross-Lingual Visual Question Answering - Jonas Pfeiffer et al, **ACL 2022 (Findings)**. [[data]](https://github.com/Adapter-Hub/xGQA)
- Co-VQA: Answering by Interactive Sub Question Sequence - Ruonan Wang et al, **ACL 2022 (Findings)**.
- SimVQA: Exploring Simulated Environments for Visual Question Answering - Paola Cascante-Bonilla et al, **CVPR 2022**. [[code]](https://www.cs.rice.edu/~pc51/simvqa/)
- A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering - Feng Gao et al, **CVPR 2022**.
- SwapMix: Diagnosing and Regularizing the Over-reliance on Visual Context in Visual Question Answering - Vipul Gupta et al, **CVPR 2022**. [[code]](https://github.com/vipulgupta1011/swapmix)
- Dual-Key Multimodal Backdoors for Visual Question Answering - Matthew Walmer et al, **CVPR 2022**. [[code]](https://github.com/SRI-CSL/TrinityMultimodalTrojAI)
- MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering - Yang Ding et al, **CVPR 2022**. [[code]](https://github.com/AndersonStra/MuKEA)
- Grounding Answers for Visual Questions Asked by Visually Impaired People - Chongyan Chen et al, **CVPR 2022**. [[page]](https://vizwiz.org/tasks-and-datasets/answer-grounding-for-vqa/)
- Maintaining Reasoning Consistency in Compositional Visual Question Answering - Chenchen Jing et al, **CVPR 2022**. [[code]](https://github.com/jingchenchen/ReasoningConsistency-VQA)
- RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning - Xiaojian Ma et al, **ICLR 2022**. [[code]](https://github.com/NVlabs/RelViT)
- Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering - Mingxiao Li et al, **AAAI 2022**. [[code]](https://github.com/Mingxiao-Li/DMMGR)
- Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering - Min Peng et al, **IJCAI 2022**. [[code]](https://github.com/Mvrjustid/MHN-IJCAI22)
- TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation - Jun Wang et al, **BMVC 2022**. [[code]](https://github.com/HenryJunW/TAG)
- Video Question Answering: Datasets, Algorithms and Challenges - Yaoyao Zhong et al, **EMNLP 2022**.
### 2021
- TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption - Zhengyuan Yang et al, **CVPR 2021**.
- Counterfactual VQA: A Cause-Effect Look at Language Bias - Yulei Niu et al, **CVPR 2021**. [[code]](https://github.com/yuleiniu/cfvqa)
- KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA - Kenneth Marino et al, **CVPR 2021**.
- Human-Adversarial Visual Question Answering - Sasha Sheng et al, **NeurIPS 2021**. [[code]](https://adversarialvqa.org/)
- Debiased Visual Question Answering from Feature and Sample Perspectives - Zhiquan Wen et al, **NeurIPS 2021**. [[code]](https://github.com/Zhiquan-Wen/D-VQA)
- Learning to Generate Visual Questions with Noisy Supervision - Kai Shen et al, **NeurIPS 2021**. [[code]](https://github.com/AlanSwift/DH-GAN)
- ProTo: Program-Guided Transformer for Program-Guided Tasks - Zelin Zhao et al, **NeurIPS 2021**. [[code]](https://github.com/sjtuytc/Neurips21-ProTo-Program-guided-Transformers-for-Program-guided-Tasks)
- Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering - Jihyung Kil et al, **EMNLP 2021**.
- Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking - Dirk Väth et al, **EMNLP 2021 (demo)**. [[code]](https://github.com/patilli/vqa_benchmarking)
- Diversity and Consistency: Exploring Visual Question-Answer Pair Generation - Sen Yang et al, **EMNLP 2021 (Findings)**.
- Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation - Humair Raj Khan et al, **EMNLP 2021 (Findings)**.
- MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering - Junjie Wang et al, **EMNLP 2021 (Findings)**. [[code]](https://github.com/iigroup/mirtt)
- Just Ask: Learning To Answer Questions From Millions of Narrated Videos - Antoine Yang et al, **ICCV 2021**.
- Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments - Difei Gao et al, **ICCV 2021**.
- On The Hidden Treasure of Dialog in Video Question Answering - Deniz Engin et al, **ICCV 2021**.
- Unshuffling Data for Improved Generalization in Visual Question Answering - Damien Teney et al, **ICCV 2021**.
- TRAR: Routing the Attention Spans in Transformer for Visual Question Answering - Yiyi Zhou et al, **ICCV 2021**.
- Greedy Gradient Ensemble for Robust Visual Question Answering - Xinzhe Han et al, **ICCV 2021**.
- Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos - Heeseung Yun et al, **ICCV 2021**.
- Weakly Supervised Relative Spatial Reasoning for Visual Question Answering - Pratyay Banerjee et al, **ICCV 2021**.
- Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering - Qingxing Cao et al, **ICCV 2021**.
- Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering - Corentin Dancette et al, **ICCV 2021**.
- Auto-Parsing Network for Image Captioning and Visual Question Answering - Xu Yang et al, **ICCV 2021**.
- Unified Questioner Transformer for Descriptive Question Generation in Goal-Oriented Visual Dialogue - Shoya Matsumori et al, **ICCV 2021**.
- Check It Again: Progressive Visual Question Answering via Visual Entailment - Qingyi Si et al, **ACL 2021**. [[code]](https://github.com/PhoebusSi/SAR)
- Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering - Siddharth Karamcheti et al, **ACL 2021**. [[code]](https://github.com/siddk/vqa-outliers)
- In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering - Peter Vickers et al, **ACL 2021**.
- Towards Visual Question Answering on Pathology Images - Xuehai He et al, **ACL 2021**. [[code]](https://github.com/UCSD-AI4H/PathVQA)
- Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions - Daniel Rosenberg et al, **ACL 2021**. [[code]](https://danrosenberg.github.io/rad-measure/)
- LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering - Zujie Liang et al, **SIGIR 2021**. [[code]](https://github.com/jokieleung/LPF-VQA)
- Passage Retrieval for Outside-Knowledge Visual Question Answering - Chen Qu et al, **SIGIR 2021**. [[code]](https://github.com/prdwb/okvqa-release)
- Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering - Aman Jain et al, **SIGIR 2021**. [[code]](https://s3vqa.github.io/)
- Visual Question Rewriting for Increasing Response Rate - Jiayi Wei et al, **SIGIR 2021**.
- Separating Skills and Concepts for Novel Visual Question Answering - Spencer Whitehead et al, **CVPR 2021**.
- Roses Are Red, Violets Are Blue... but Should VQA Expect Them To? - Corentin Kervadec et al, **CVPR 2021**. [[code]](https://github.com/gqa-ood/GQA-OOD)
- Predicting Human Scanpaths in Visual Question Answering - Xianyu Chen et al, **CVPR 2021**.
- Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules - Aisha Urooj et al, **CVPR 2021**.
- Perception Matters: Detecting Perception Failures of VQA Models Using Metamorphic Testing - Yuanyuan Yuan et al, **CVPR 2021**.
- How Transferable Are Reasoning Patterns in VQA? - Corentin Kervadec et al, **CVPR 2021**.
- Domain-Robust VQA With Diverse Datasets and Methods but No Target Labels - Mingda Zhang et al, **CVPR 2021**.
- Learning Better Visual Dialog Agents With Pretrained Visual-Linguistic Representation - Tao Tu et al, **CVPR 2021**.
- MultiModalQA: complex question answering over text, tables and images - Alon Talmor et al, **ICLR 2021**. [[page]](https://allenai.github.io/multimodalqa/)
- CLEVR_HYP: A Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images - Shailaja Keyur Sampat et al, **NAACL-HLT 2021**. [[code]](https://github.com/shailaja183/clevr_hyp)
- Video Question Answering with Phrases via Semantic Roles - Arka Sadhu et al, **NAACL-HLT 2021**.
- SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency - Sameer Dharur et al, **NAACL-HLT 2021**.
- Ensemble of MRR and NDCG models for Visual Dialog - Idan Schwartz, **NAACL-HLT 2021**. [[code]](https://github.com/idansc/mrr-ndcg)
- Regularizing Attention Networks for Anomaly Detection in Visual Question Answering - Doyup Lee et al, **AAAI 2021**.
- A Case Study of the Shortcut Effects in Visual Commonsense Reasoning - Keren Ye et al, **AAAI 2021**. [[code]](https://github.com/yekeren/VCR-shortcut-effects-study)
- VisualMRC: Machine Reading Comprehension on Document Images - Ryota Tanaka et al, **AAAI 2021**. [[page]](https://github.com/nttmdlab-nlp/VisualMRC)
### 2020
- MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering - Tejas Gokhale et al, **EMNLP 2020**. [[code]](https://github.com/tejas-gokhale/vqa_mutant)
- Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering - Zujie Liang et al, **EMNLP 2020**. [[code]](https://github.com/jokieleung/CL-VQA)
- VD-BERT: A Unified Vision and Dialog Transformer with BERT - Yue Wang et al, **EMNLP 2020**.
- Multimodal Graph Networks for Compositional Generalization in Visual Question Answering - Raeid Saqur et al, **NeurIPS 2020**.
- Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies - Itai Gat et al, **NeurIPS 2020**.
- Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data - Michael Cogswell et al, **NeurIPS 2020**.
- On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law - Damien Teney et al, **NeurIPS 2020**.
- Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder - Gouthaman KV et al, **ECCV 2020**.
- Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions - Noa Garcia et al, **ECCV 2020**.
- Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering - Ruixue Tang et al, **ECCV 2020**.
- Visual Question Answering on Image Sets - Ankan Bansal et al, **ECCV 2020**.
- VQA-LOL: Visual Question Answering under the Lens of Logic - Tejas Gokhale et al, **ECCV 2020**.
- TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering - Xiaofeng Yang et al, **ECCV 2020**.
- Spatially Aware Multimodal Transformers for TextVQA - Yash Kant et al, **ECCV 2020**.
- On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering - Xinyu Wang et al, **CVPR 2020**.
- In Defense of Grid Features for Visual Question Answering - Huaizu Jiang et al, **CVPR 2020**.
- Counterfactual Samples Synthesizing for Robust Visual Question Answering - Long Chen et al, **CVPR 2020**.
- Counterfactual Vision and Language Learning - Ehsan Abbasnejad et al, **CVPR 2020**.
- Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA - Ronghang Hu et al, **CVPR 2020**.
- Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing - Vedika Agarwal et al, **CVPR 2020**.
- SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions - Ramprasaath R. Selvaraju et al, **CVPR 2020**.
- TA-Student VQA: Multi-Agents Training by Self-Questioning - Peixi Xiong et al, **CVPR 2020**.
- VQA With No Questions-Answers Training - Ben-Zion Vatashsky et al, **CVPR 2020**.
- Hierarchical Conditional Relation Networks for Video Question Answering - Thao Minh Le et al, **CVPR 2020**.
- Modality Shifting Attention Network for Multi-Modal Video Question Answering - Junyeong Kim et al, **CVPR 2020**.
- Webly Supervised Knowledge Embedding Model for Visual Reasoning - Wenbo Zheng et al, **CVPR 2020**.
- Differentiable Adaptive Computation Time for Visual Reasoning - Cristobal Eyzaguirre et al, **CVPR 2020**.
- A negative case analysis of visual grounding methods for VQA - Robik Shrestha et al, **ACL 2020**.
- Cross-Modality Relevance for Reasoning on Language and Vision - Chen Zheng et al, **ACL 2020**.
- Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA - Hyounghun Kim et al, **ACL 2020**.
- TVQA+: Spatio-Temporal Grounding for Video Question Answering - Jie Lei et al, **ACL 2020**.
- BERT representations for Video Question Answering - Zekun Yang et al, **WACV 2020**.
- Deep Bayesian Network for Visual Question Generation - Badri Patro et al, **WACV 2020**.
- Robust Explanations for Visual Question Answering - Badri Patro et al, **WACV 2020**.
- Visual Question Answering on 360° Images - Shih-Han Chou et al, **WACV 2020**.
- LEAF-QA: Locate, Encode & Attend for Figure Question Answering - Ritwick Chaudhry et al, **WACV 2020**.
- Answering Questions about Data Visualizations using Efficient Bimodal Fusion - Kushal Kafle et al, **WACV 2020**.
- Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text - Difei Gao et al, **CVPR 2020**. [[code]](https://github.com/ricolike/mmgnn_textvqa)
### 2019
- Psycholinguistics Meets Continual Learning: Measuring Catastrophic Forgetting in Visual Question Answering - Claudio Greco et al, **ACL 2019**. [[paper]](https://www.aclweb.org/anthology/P19-1350/)
- Multi-grained Attention with Object-level Grounding for Visual Question Answering - Pingping Huang et al, **ACL 2019**.
- Compact Trilinear Interaction for Visual Question Answering - Tuong Do et al, **ICCV 2019**.
- Scene Text Visual Question Answering - Ali Furkan Biten et al, **ICCV 2019**.
- Multi-Modality Latent Interaction Network for Visual Question Answering - Peng Gao et al, **ICCV 2019**.
- Relation-Aware Graph Attention Network for Visual Question Answering - Linjie Li et al, **ICCV 2019**.
- Why Does a Visual Question Have Different Answers? - Nilavra Bhattacharya et al, **ICCV 2019**.
- RUBi: Reducing Unimodal Biases for Visual Question Answering - Remi Cadene et al, **NeurIPS 2019**.
- Self-Critical Reasoning for Robust Visual Question Answering - Jialin Wu et al, **NeurIPS 2019**.
- Deep Modular Co-Attention Networks for Visual Question Answering - Zhou Yu et al, **CVPR 2019**. [[code]](https://github.com/MILVLG/mcan-vqa) (a guided-attention sketch follows this list)
- Information Maximizing Visual Question Generation - Ranjay Krishna et al, **CVPR 2019**. [code]
- Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence - Amir Zadeh et al, **CVPR 2019**. [code]
- Learning to Compose Dynamic Tree Structures for Visual Contexts - Kaihua Tang et al, **CVPR 2019**. [code]
- Transfer Learning via Unsupervised Task Discovery for Visual Question Answering - Hyeonwoo Noh et al, **CVPR 2019**. [code]
- Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph - Yao-Hung Hubert Tsai et al, **CVPR 2019**. [[code]](https://github.com/yaohungt/Gated-Spatio-Temporal-Energy-Graph)
- Explainable and Explicit Visual Reasoning over Scene Graphs - Jiaxin Shi et al, **CVPR 2019**. [[code]](https://github.com/shijx12/XNM-Net)
- MUREL: Multimodal Relational Reasoning for Visual Question Answering - Remi Cadene et al, **CVPR 2019**. [[code]](https://github.com/Cadene/murel.bootstrap.pytorch)
- Image-Question-Answer Synergistic Network for Visual Dialog - Dalu Guo et al, **CVPR 2019**. [code]
- RAVEN: A Dataset for Relational and Analogical Visual rEasoNing - Chi Zhang et al, **CVPR 2019**. [[project page]](http://wellyzhang.github.io/project/raven.html)
- Cycle-Consistency for Robust Visual Question Answering - Meet Shah et al, **CVPR 2019**.
- It's Not About the Journey; It's About the Destination: Following Soft Paths Under Question-Guidance for Visual Reasoning - Monica Haurilet et al, **CVPR 2019**.
- OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge - Kenneth Marino et al, **CVPR 2019**.
- Visual Question Answering as Reading Comprehension - Hui Li et al, **CVPR 2019**.
- Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering - Peng Gao et al, **CVPR 2019**.
- Explicit Bias Discovery in Visual Question Answering Models - Varun Manjunatha et al, **CVPR 2019**.
- Answer Them All! Toward Universal Visual Question Answering Models - Robik Shrestha et al, **CVPR 2019**.
- Visual Query Answering by Entity-Attribute Graph Matching and Reasoning - Peixi Xiong et al, **CVPR 2019**.
- Differential Networks for Visual Question Answering - Chenfei Wu et al, **AAAI 2019**. [code]
- BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection - Hedi Ben-younes et al, **AAAI 2019**. [[code]](https://github.com/Cadene/block.bootstrap.pytorch)
- Dynamic Capsule Attention for Visual Question Answering - Yiyi Zhou et al, **AAAI 2019**. [[code]](https://github.com/XMUVQA/CapsAtt)
- Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering - Xiangpeng Li et al, **AAAI 2019**. [[code]](https://github.com/lixiangpengcs/PSAC)
- Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning - Yiyi Zhou et al, **AAAI 2019**. [[code]](https://github.com/xiangmingLi/PIL)
- Focal Visual-Text Attention for Memex Question Answering - Junwei Liang et al, **TPAMI 2019**. [[code]](https://memexqa.cs.cmu.edu/)
- Combining Multiple Cues for Visual Madlibs Question Answering - Tatiana Tommasi et al, **IJCV 2019**. [code]
- Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation - Sang-Woo Lee et al, **ICLR 2019**. [[code]](https://github.com/naver/aqm-plus)
- Improving Visual Question Answering by Referring to Generated Paragraph Captions - Hyounghun Kim et al, **ACL 2019**.
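Several 2019 entries above, notably Deep Modular Co-Attention Networks (MCAN), stack attention units in which image-region features attend to question words. Below is a minimal PyTorch sketch of one such guided-attention unit; the dimensions, single-block structure, and variable names are illustrative assumptions, not the authors' implementation (see the linked MCAN repo for that).

```python
import torch
import torch.nn as nn

class GuidedAttention(nn.Module):
    """Minimal guided-attention unit: image regions attend to question words.

    A simplified sketch of the cross-attention blocks used in co-attention
    VQA models such as MCAN; sizes are illustrative, not the paper's config.
    """

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, regions: torch.Tensor, words: torch.Tensor) -> torch.Tensor:
        # regions: (batch, num_regions, dim); words: (batch, num_words, dim)
        attended, _ = self.attn(query=regions, key=words, value=words)
        return self.norm(regions + attended)  # residual connection + LayerNorm

x = torch.randn(2, 36, 512)   # e.g., 36 region features per image
q = torch.randn(2, 14, 512)   # e.g., 14 question-token features
print(GuidedAttention()(x, q).shape)  # torch.Size([2, 36, 512])
```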
### 2018
- Bilinear Attention Networks - Jin-Hwa Kim et al, **NIPS 2018**. [code]
- Chain of Reasoning for Visual Question Answering - Chenfei Wu et al, **NIPS 2018**. [code]
- Learning Conditioned Graph Structures for Interpretable Visual Question Answering - Will Norcliffe-Brown et al, **NIPS 2018**. [[code]](https://github.com/aimbrain/vqa-project)
- Learning to Specialize with Knowledge Distillation for Visual Question Answering - Jonghwan Mun et al, **NIPS 2018**. [code]
- Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering - Medhini Narasimhan et al, **NIPS 2018**. [code]
- Overcoming Language Priors in Visual Question Answering with Adversarial Regularization - Sainandan Ramakrishnan et al, **NIPS 2018**. [code]
- Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering - Somak Aditya et al, **AAAI 2018**. [code]
- Co-Attending Free-Form Regions and Detections with Multi-Modal Multiplicative Feature Embedding for Visual Question Answering - Pan Lu et al, **AAAI 2018**. [[code]](https://github.com/lupantech/dual-mfa-vqa)
- Exploring Human-Like Attention Supervision in Visual Question Answering - Tingting Qiao et al, **AAAI 2018**. [code]
- Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents - Bo Wang et al, **AAAI 2018**. [code]
- Feature Enhancement in Attention for Visual Question Answering - Yuetan Lin et al, **IJCAI 2018**. [code]
- A Question Type Driven Framework to Diversify Visual Question Generation - Zhihao Fan et al, **IJCAI 2018**. [code]
- Multi-Turn Video Question Answering via Multi-Stream Hierarchical Attention Context Network - Zhou Zhao et al, **IJCAI 2018**. [code]
- Open-Ended Long-form Video Question Answering via Adaptive Hierarchical Reinforced Networks - Zhou Zhao et al, **IJCAI 2018**. [code]
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering - Peter Anderson et al, **CVPR 2018**. [[code(author)]](https://github.com/peteanderson80/bottom-up-attention) [[code(pythiaV0.1)]](https://github.com/facebookresearch/pythia) [[code(Pytorch Reimplementation)]](https://github.com/hengyuan-hu/bottom-up-attention-vqa) (a top-down attention sketch follows this list)
- Tips and Tricks for Visual Question Answering: Learnings From the 2017 Challenge - Damien Teney et al, **CVPR 2018**. [code]
- Learning by Asking Questions - Ishan Misra et al, **CVPR 2018**. [code]
- Embodied Question Answering - Abhishek Das et al, **CVPR 2018**. [code]
- VizWiz Grand Challenge: Answering Visual Questions From Blind People - Danna Gurari et al, **CVPR 2018**. [code]
- Textbook Question Answering Under Instructor Guidance With Memory Networks - Juzheng Li et al, **CVPR 2018**. [[code]](https://github.com/freerailway/igmn)
- IQA: Visual Question Answering in Interactive Environments - Daniel Gordon et al, **CVPR 2018**. [[sample video]](https://youtu.be/pXd3C-1jr98)
- Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering - Aishwarya Agrawal et al, **CVPR 2018**. [code]
- Learning Answer Embeddings for Visual Question Answering - Hexiang Hu et al, **CVPR 2018**. [code]
- DVQA: Understanding Data Visualizations via Question Answering - Kushal Kafle et al, **CVPR 2018**. [code]
- Cross-Dataset Adaptation for Visual Question Answering - Wei-Lun Chao et al, **CVPR 2018**. [code]
- Two Can Play This Game: Visual Dialog With Discriminative Question Generation and Answering - Unnat Jain et al, **CVPR 2018**. [code]
- Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering - Duy-Kien Nguyen et al, **CVPR 2018**. [code]
- Visual Question Generation as Dual Task of Visual Question Answering - Yikang Li et al, **CVPR 2018**. [code]
- Focal Visual-Text Attention for Visual Question Answering - Junwei Liang et al, **CVPR 2018**. [code]
- Motion-Appearance Co-Memory Networks for Video Question Answering - Jiyang Gao et al, **CVPR 2018**. [code]
- Visual Question Answering With Memory-Augmented Networks - Chao Ma et al, **CVPR 2018**. [code]
- Visual Question Reasoning on General Dependency Tree - Qingxing Cao et al, **CVPR 2018**. [code]
- Differential Attention for Visual Question Answering - Badri Patro et al, **CVPR 2018**. [code]
- Learning Visual Knowledge Memory Networks for Visual Question Answering - Zhou Su et al, **CVPR 2018**. [code]
- IVQA: Inverse Visual Question Answering - Feng Liu et al, **CVPR 2018**. [code]
- Customized Image Narrative Generation via Interactive Visual Question Generation and Answering - Andrew Shin et al, **CVPR 2018**. [code]
- Visual Question Answering as a Meta Learning Task - Damien Teney et al, **ECCV 2018**. [code]
- Question-Guided Hybrid Convolution for Visual Question Answering - Peng Gao et al, **ECCV 2018**. [code]
- Goal-Oriented Visual Question Generation via Intermediate Rewards - Junjie Zhang et al, **ECCV 2018**. [code]
- Multimodal Dual Attention Memory for Video Story Question Answering - Kyung-Min Kim et al, **ECCV 2018**. [code]
- A Joint Sequence Fusion Model for Video Question Answering and Retrieval - Youngjae Yu et al, **ECCV 2018**. [code]
- Deep Attention Neural Tensor Network for Visual Question Answering - Yalong Bai et al, **ECCV 2018**. [code]
- Question Type Guided Attention in Visual Question Answering - Yang Shi et al, **ECCV 2018**. [code]
- Learning Visual Question Answering by Bootstrapping Hard Attention - Mateusz Malinowski et al, **ECCV 2018**. [code]
- Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering - Medhini Narasimhan et al, **ECCV 2018**. [code]
- Visual Question Generation for Class Acquisition of Unknown Objects - Kohei Uehara et al, **ECCV 2018**. [[code]](https://github.com/mil-tokyo/vqg-unknown)
- Image Captioning and Visual Question Answering Based on Attributes and External Knowledge - Qi Wu et al, **TPAMI 2018**. [code]
- FVQA: Fact-Based Visual Question Answering - Peng Wang et al, **TPAMI 2018**. [code]
- Interpretable Counting for Visual Question Answering - Alexander Trott et al, **ICLR 2018**. [code]
- Learning to Count Objects in Natural Images for Visual Question Answering - Yan Zhang et al, **ICLR 2018**. [code]
- A Better Way to Attend: Attention With Trees for Video Question Answering - Hongyang Xue et al, **TIP 2018**. [[code]](https://github.com/xuehy/TreeAttention)
- Zero-Shot Transfer VQA Dataset - Pan Lu et al, **arXiv preprint**. [code]
- Visual Question Answering using Explicit Visual Attention - Vasileios Lioutas et al, **ISCAS 2018**. [code]
- Explicit ensemble attention learning for improving visual question answering - Vasileios Lioutas et al, **Pattern Recognition Letters 2018**. [code]
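Many 2018 entries build on Bottom-Up and Top-Down Attention, which scores a set of pre-extracted bottom-up region features against the question encoding and pools them with soft top-down attention. The PyTorch sketch below illustrates that attention step; the layer sizes and MLP scorer are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TopDownAttention(nn.Module):
    """Question-conditioned soft attention over bottom-up region features.

    Illustrative sketch of the top-down attention step in bottom-up/top-down
    style VQA models; feature sizes here are assumed, not the paper's.
    """

    def __init__(self, region_dim: int = 2048, q_dim: int = 512, hidden: int = 512):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(region_dim + q_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, regions: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        # regions: (batch, k, region_dim); q: (batch, q_dim)
        k = regions.size(1)
        q_tiled = q.unsqueeze(1).expand(-1, k, -1)                 # (batch, k, q_dim)
        logits = self.scorer(torch.cat([regions, q_tiled], dim=-1))  # (batch, k, 1)
        weights = torch.softmax(logits, dim=1)                     # attention over regions
        return (weights * regions).sum(dim=1)                      # (batch, region_dim)

v = torch.randn(2, 36, 2048)  # e.g., 36 Faster R-CNN region features
q = torch.randn(2, 512)       # question encoding (e.g., from a GRU)
print(TopDownAttention()(v, q).shape)  # torch.Size([2, 2048])
```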
### 2017-2015
- Learning to Reason: End-to-End Module Networks for Visual Question Answering - Ronghang Hu et al, **ICCV 2017**. [code]
- Structured Attentions for Visual Question Answering - Chen Zhu et al, **ICCV 2017**. [[code]](https://github.com/shtechair/vqa-sva)
- VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation - Chuang Gan et al, **ICCV 2017**. [[code]](https://github.com/Cold-Winter/vqs)
- **Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering** - Zhou Yu et al, **ICCV 2017**. [[code]](https://github.com/yuzcccc/vqa-mfb) (a fusion sketch follows this list)
- An Analysis of Visual Question Answering Algorithms - Kushal Kafle et al, **ICCV 2017**. [code]
- MUTAN: Multimodal Tucker Fusion for Visual Question Answering - Hedi Ben-younes et al, **ICCV 2017**. [[code]](https://github.com/cadene/vqa.pytorch)
- MarioQA: Answering Questions by Watching Gameplay Videos - Jonghwan Mun et al, **ICCV 2017**. [code]
- Learning to Disambiguate by Asking Discriminative Questions - Yining Li et al, **ICCV 2017**. [code]
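The highlighted MFB entry above fuses image and question features with multi-modal factorized bilinear pooling: project both modalities, take an element-wise product, sum-pool over a factor dimension, then apply power and L2 normalization. A minimal PyTorch sketch under assumed sizes (not the authors' implementation; see the linked repo):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MFBFusion(nn.Module):
    """Multi-modal factorized bilinear pooling, sketched from the MFB idea.

    Projects both modalities to an (out_dim * factor) space, multiplies them
    element-wise, and sum-pools over the factor dimension; the power/L2
    normalization follows the common MFB recipe. Sizes are illustrative.
    """

    def __init__(self, img_dim: int = 2048, q_dim: int = 512,
                 out_dim: int = 1000, factor: int = 5):
        super().__init__()
        self.factor = factor
        self.img_proj = nn.Linear(img_dim, out_dim * factor)
        self.q_proj = nn.Linear(q_dim, out_dim * factor)

    def forward(self, v: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        # v: (batch, img_dim); q: (batch, q_dim)
        joint = self.img_proj(v) * self.q_proj(q)                       # element-wise product
        joint = joint.view(joint.size(0), -1, self.factor).sum(dim=2)   # sum-pool over factor
        joint = torch.sign(joint) * torch.sqrt(joint.abs() + 1e-8)      # signed power norm
        return F.normalize(joint, dim=1)                                # L2 norm

v = torch.randn(2, 2048)  # attended image feature
q = torch.randn(2, 512)   # question feature
print(MFBFusion()(v, q).shape)  # torch.Size([2, 1000])
```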
## VQA Challenge Leaderboard

### test-std 2018

### test-std 2017

### [TextVQA](https://textvqa.org/)

## Licenses

### VQA-CP