https://github.com/cdancette/vqa-cp-leaderboard

A collection of papers about VQA-CP datasets and their results
Host: GitHub
URL: https://github.com/cdancette/vqa-cp-leaderboard
Owner: cdancette
Created: 2020-09-25T09:12:20.000Z (about 4 years ago)
Default Branch: master
Last Pushed: 2022-03-18T10:02:49.000Z (over 2 years ago)
Last Synced: 2024-04-05T14:31:02.787Z (8 months ago)
Language: Shell
Size: 86.9 KB
Stars: 36
Watchers: 3
Forks: 2
Open Issues: 3
Metadata Files:
- Readme: README.rst
VQA-CP Leaderboard
===================

A collection of papers about the VQA-CP dataset and a benchmark / leaderboard of their results.

VQA-CP_ is an out-of-distribution dataset for Visual Question Answering,
which is designed to penalize models that rely on question biases to give an answer.

You can download VQA-CP annotations here: https://computing.ece.vt.edu/~aish/vqacp/

Notes:

- Not all reported papers use the same baseline architecture,
  so the scores might not be directly comparable. This leaderboard
  is only meant as a reference of all bias-reduction methods that
  were tested on VQA-CP.
- We mention the presence or absence of a validation set because,
  for out-of-distribution datasets, it is very important to select hyperparameters
  and do early stopping on a validation set that has the same distribution as
  the training set. Otherwise, there is a risk of overfitting the test set
  and its biases, which defeats the point of the VQA-CP dataset. This is why we
  **highly recommend** that future work build a **validation set**
  from a part of the training set.
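
The recommendation above, holding out part of the training set for validation, could be sketched roughly as follows. This is only an illustration, not code from any of the listed papers: the function name ``split_train_val`` is hypothetical, and the annotation layout (a list of dicts that each carry an ``image_id`` key) is an assumption about the downloaded files. Splitting by image rather than by question is one possible design choice, made here so that questions about the same image do not appear on both sides of the split.

```python
import random
from collections import defaultdict


def split_train_val(annotations, val_fraction=0.1, seed=0):
    """Hold out a fraction of the training images as a validation set.

    `annotations` is assumed to be a list of dicts that each carry an
    "image_id" key (an assumption about the annotation layout).
    """
    # Group annotations by image so that all questions about one image
    # end up on the same side of the split.
    by_image = defaultdict(list)
    for ann in annotations:
        by_image[ann["image_id"]].append(ann)

    # Shuffle the image ids deterministically for a reproducible split.
    image_ids = sorted(by_image)
    random.Random(seed).shuffle(image_ids)

    n_val = max(1, int(len(image_ids) * val_fraction))
    val_ids = set(image_ids[:n_val])

    train = [a for i in image_ids if i not in val_ids for a in by_image[i]]
    val = [a for i in image_ids if i in val_ids for a in by_image[i]]
    return train, val


if __name__ == "__main__":
    # Tiny synthetic example; real annotations would be loaded with json.load.
    toy = [{"image_id": i // 2, "question_id": i} for i in range(20)]
    train, val = split_train_val(toy, val_fraction=0.2, seed=0)
    print(len(train), len(val))
```

Because the held-out images are drawn at random from the training set, the validation set follows the training distribution, so hyperparameters and early stopping can be tuned on it without looking at the test set.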

You can read an overview of some of those bias-reduction methods here: https://cdancette.fr/2020/11/21/overview-bias-reductions-vqa/

VQA-CP v2
***********

The best results on architectures without pre-training are highlighted in bold.

+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| Name            | Base Arch.           | Conference              | All       | Yes/No     | Numbers    | Other      | Validation |
+=================+======================+=========================+===========+============+============+============+============+
| AttReg_ [2]_    | LMH_                 | Preprint                | 59.92     | 87.28      | 52.39      | 47.65      |            |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| GGE-DQ_         | UpDown               | ICCV 2021               | 57.32     | 87.04      | 27.75      | 49.59      |            |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| AdaVQA_         | UpDown               | IJCAI 2021              | 54.67     | 72.47      | 53.81      | 45.58      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| DecompLR_       | UpDown               | AAAI 2020               | 48.87     | 70.99      | 18.72      | 45.57      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| MUTANT_         | LXMERT               | EMNLP 2020              | 69.52     | 93.15      | 67.17      | 57.78      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| MUTANT_         | UpDown               | EMNLP 2020              | **61.72** | **88.90**  | **49.68**  | **50.78**  | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| CL_             | UpDown + LMH_ + CSS_ | EMNLP 2020              | 59.18     | 86.99      | 49.89      | 47.16      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| RMFE_           | UpDown + LMH_        | NeurIPS 2020            | 54.55     | 74.03      | 49.16      | 45.82      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| RandImg_        | UpDown               | NeurIPS 2020            | 55.37     | 83.89      | 41.60      | 44.20      | Valset     |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| Loss-Rescaling_ | UpDown + LMH_        | Preprint 2020           | 53.26     | 72.82      | 48.00      | 44.46      |            |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| ESR_            | UpDown               | ACL 2020                | 48.9      | 69.8       | 11.3       | 47.8       |            |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| GradSup_        | Unshuffling_         | ECCV 2020               | 46.8      | 64.5       | 15.3       | 45.9       | **Valset** |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| VGQE_           | S-MRL                | ECCV 2020               | 50.11     | 66.35      | 27.08      | 46.77      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| CSS_            | UpDown + LMH_        | CVPR 2020               | 58.95     | 84.37      | 49.42      | 48.21      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| Semantic_       | UpDown + RUBi_       | Preprint 2020           | 47.5      |            |            |            |            |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| Unshuffling_    | UpDown               | Preprint 2020           | 42.39     | 47.72      | 14.43      | 47.24      | **Valset** |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| CF-VQA_         | UpDown + LMH_        | Preprint 2020           | 57.18     | 80.18      | 45.62      | 48.31      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| LMH_            | UpDown               | EMNLP 2019              | 52.05     | 69.81 [1]_ | 44.46 [1]_ | 45.54 [1]_ | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| RUBi_           | S-MRL [3]_           | NeurIPS 2019            | 47.11     | 68.65      | 20.28      | 43.18      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| SCR_ [2]_       | UpDown               | NeurIPS 2019            | 49.45     | 72.36      | 10.93      | 48.02      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| NSM_            |                      | NeurIPS 2019            | 45.80     |            |            |            |            |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| HINT_ [2]_      | UpDown               | ICCV 2019               | 46.73     | 67.27      | 10.61      | 45.88      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| ActSeek_        | UpDown               | CVPR 2019               | 46.00     | 58.24      | 29.49      | 44.33      | **Valset** |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| GRL_            | UpDown               | NAACL-HLT 2019 Workshop | 42.33     | 59.74      | 14.78      | 40.76      | **Valset** |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| AdvReg_         | UpDown               | NeurIPS 2018            | 41.17     | 65.49      | 15.48      | 35.48      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| GVQA_           |                      | CVPR 2018               | 31.30     | 57.99      | 13.68      | 22.14      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+

.. [1] Retrained by CSS_

.. [2] Using additional information

.. [3] S-MRL stands for Simplified-MUREL. The architecture was proposed in RUBi_.

.. VQA-CP v1

.. *********

Papers
******

.. .. |br| raw:: html

..    


_`GGE-DQ`

    | Greedy Gradient Ensemble for Robust Visual Question Answering

    | Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian

    | https://arxiv.org/pdf/2107.12651.pdf

_`DecompLR`

    | Overcoming Language Priors in VQA via Decomposed Linguistic Representations

    | Chenchen Jing, Yuwei Wu, Xiaoxun Zhang, Yunde Jia, Qi Wu

    | https://ojs.aaai.org/index.php/AAAI/article/view/6776

_`AdaVQA`

    | AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss

    | Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Feng Ji, Ji Zhang, Alberto Del Bimbo

    | https://arxiv.org/pdf/2105.01993.pdf

_`MUTANT`

    | MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering -  **EMNLP 2020** 

    | Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

    | https://www.aclweb.org/anthology/2020.emnlp-main.63/

    | code: https://github.com/tejas-gokhale/vqa_mutant

_`CL`

    | Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering   -  **EMNLP 2020** 

    | Zujie Liang, Weitao Jiang, Haifeng Hu, Jiaying Zhu                                                       

    | https://www.aclweb.org/anthology/2020.emnlp-main.265.pdf                                                 

_`RMFE`

    | Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies -  **NeurIPS 2020** 

    | Itai Gat, Idan Schwartz, Alexander Schwing, Tamir Hazan                                                         

    | https://proceedings.neurips.cc/paper/2020/hash/20d749bc05f47d2bd3026ce457dcfd8e-Abstract.html   

    | code: https://github.com/itaigat/removing-bias-in-multi-modal-classifiers             

_`RandImg`

    | On the Value of Out-of-Distribution Testing: An Example of Goodhart’s Law

    | Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton van den Hengel

    | https://arxiv.org/abs/2005.09241

_`Loss-Rescaling`

    | Loss-rescaling VQA: Revisiting Language Prior Problem from a Class-imbalance View - **Preprint 2020** 

    | Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Qi Tian                                                     

    | https://arxiv.org/abs/2010.16010                                                                      

_`ESR` (Embarrassingly Simple Regularizer)

    | A Negative Case Analysis of Visual Grounding Methods for VQA - **ACL 2020**

    | Robik Shrestha, Kushal Kafle, Christopher Kanan

    | https://www.aclweb.org/anthology/2020.acl-main.727.pdf

_`GradSup`

    | Learning what makes a difference from counterfactual examples and gradient supervision -  **ECCV 2020** 

    | Damien Teney, Ehsan Abbasnejad, Anton van den Hengel

    | https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123550579.pdf                                  

_`VGQE`

    | Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder  -  **ECCV 2020** 

    | Gouthaman KV, Anurag Mittal                                                                                     

    | https://arxiv.org/abs/2007.06198                                                                                

_`CSS`

    | Counterfactual Samples Synthesizing for Robust Visual Question Answering -  **CVPR 2020** 

    | Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, Shiliang Pu, Yueting Zhuang                  

    | https://arxiv.org/abs/2003.06576    

    | code: https://github.com/yanxinzju/CSS-VQA                                                      

_`Semantic`

    | Estimating semantic structure for the VQA answer space  -  **Preprint 2020** 

    | Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf           

    | https://arxiv.org/abs/2006.05726                                             

_`Unshuffling`

    | Unshuffling Data for Improved Generalization -  **Preprint 2020** 

    | Damien Teney, Ehsan Abbasnejad, Anton van den Hengel              

    | https://arxiv.org/abs/2002.11894                         

    | Summary: Inspired by Invariant Risk Minimization (Arjovsky et al.). They make use of two training sets with different biases to learn a more robust classifier (one that will perform better on OOD data).

_`CF-VQA`

    | Counterfactual VQA: A Cause-Effect Look at Language Bias  -  **Preprint 2020** 

    | Yulei Niu, Kaihua Tang, Hanwang Zhang, Zhiwu Lu, Xian-Sheng Hua, Ji-Rong Wen   

    | https://arxiv.org/abs/2006.04315v2                                             

    | Summary: They formalize the ensembling framework from RUBi_ and LMH_ using the causality framework.

_`LMH`

    | Don’t Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases -  **EMNLP 2019** 

    | Christopher Clark, Mark Yatskar, Luke Zettlemoyer                                                       

    | https://arxiv.org/abs/1909.03683     

    | code: https://github.com/chrisc36/bottom-up-attention-vqa                                                                   

_`RUBi`

    | RUBi: Reducing Unimodal Biases in Visual Question Answering  -  **NeurIPS 2019** 

    | Remi Cadene, Corentin Dancette, Hedi Ben-younes, Matthieu Cord, Devi Parikh      

    | https://arxiv.org/abs/1906.10169                                                 

    | Summary:
    | During training: ensembling with a question-only model that learns the biases, which lets the main VQA model learn useful behaviours.
    | During testing: the question-only model is removed, and only the VQA model is kept.

    | code: https://github.com/cdancette/rubi.bootstrap.pytorch
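
The train/test asymmetry described in the RUBi summary can be illustrated with a small, dependency-free sketch. It is a simplified reading of the idea, not the code from the repository linked above: the function names are hypothetical, the same question-only logits are reused for both the mask and the question-only loss, and plain Python lists stand in for tensors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy(logits, target):
    """Negative log-likelihood of the target answer index."""
    return -math.log(softmax(logits)[target])


def rubi_training_loss(vqa_logits, q_only_logits, target):
    """Training-time loss: the question-only branch masks the VQA logits.

    Simplifying assumption: the mask is sigmoid(q_only_logits), and the
    total loss is the masked-prediction loss plus the question-only loss.
    """
    masked = [v * sigmoid(q) for v, q in zip(vqa_logits, q_only_logits)]
    return cross_entropy(masked, target) + cross_entropy(q_only_logits, target)


def predict(vqa_logits):
    """Test-time prediction: the question-only branch is dropped entirely."""
    return max(range(len(vqa_logits)), key=vqa_logits.__getitem__)


if __name__ == "__main__":
    vqa = [2.0, 0.5, -1.0]      # toy logits from the main VQA model
    q_only = [3.0, -3.0, 0.0]   # toy logits from the question-only branch
    print(rubi_training_loss(vqa, q_only, target=0))
    print(predict(vqa))
```

The point of the sketch is only that the question-only branch influences the loss during training, while ``predict`` never sees it.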

_`NSM`

    | Learning by Abstraction: The Neural State Machine

    | Drew A. Hudson, Christopher D. Manning

    | https://arxiv.org/abs/1907.03950

_`SCR` 

    | Self-Critical Reasoning for Robust Visual Question Answering -  **NeurIPS 2019** 

    | Jialin Wu, Raymond J. Mooney                                                     

    | https://arxiv.org/abs/1905.09998    

    | code: https://github.com/jialinwu17/self_critical_vqa

_`HINT`

    | Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded -  **ICCV 2019**           

    | Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, Devi Parikh 

    | https://arxiv.org/abs/1902.03751                                                                                   

_`ActSeek`

    | Actively Seeking and Learning from Live Data -  **CVPR 2019** 

    | Damien Teney, Anton van den Hengel                            

    | https://arxiv.org/abs/1904.02865                              

_`GRL`

    | Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects -  **NAACL-HLT 2019 Workshop on Shortcomings in Vision and Language (SiVL)**

    | Gabriel Grand, Yonatan Belinkov

    | https://arxiv.org/pdf/1906.08430.pdf

    | code: https://github.com/gabegrand/adversarial-vqa

_`AdvReg`

    | Overcoming Language Priors in Visual Question Answering with Adversarial Regularization -  **NeurIPS 2018**                   

    | Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee                                                                         

    | https://papers.nips.cc/paper/7427-overcoming-language-priors-in-visual-question-answering-with-adversarial-regularization.pdf 

    | code: 

_`GVQA`

    | Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering -  **CVPR 2018** 

    | Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi                                      

    | https://arxiv.org/abs/1712.00377

    | code: https://github.com/AishwaryaAgrawal/GVQA                                                              

.. _VQA-CP: https://arxiv.org/abs/1712.00377