https://github.com/ailln/vqa-roadmap
🍌Visual Question Answering Roadmap.
- Host: GitHub
- URL: https://github.com/ailln/vqa-roadmap
- Owner: Ailln
- License: mit
- Created: 2020-01-05T15:14:54.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-01-06T17:51:31.000Z (over 6 years ago)
- Last Synced: 2025-06-14T09:06:44.759Z (12 months ago)
- Topics: roadmap, visual-question-answering, vqa
- Size: 3.91 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# vqa-roadmap
🍌Visual Question Answering Roadmap.
## ORIGIN
- Official website: [visualqa.org](https://visualqa.org/index.html)
- VQA dataset: [Full release (v2.0)](https://visualqa.org/download.html)
- paper: [VQA: Visual Question Answering (ICCV 2015)](https://arxiv.org/pdf/1505.00468.pdf)
## Read List
### Survey
- paper:
  - [Visual Question Answering: Datasets, Algorithms, and Future Challenges](https://arxiv.org/pdf/1610.01465.pdf)
  - [Visual question answering: A survey of methods and datasets](https://arxiv.org/pdf/1607.05910.pdf)
  - [Survey of Visual Question Answering: Datasets and Techniques](https://arxiv.org/pdf/1705.03865.pdf)
- code:
  - [pythia](https://github.com/facebookresearch/pythia)
- blog:
  - [awesome-vqa](https://github.com/chingyaoc/awesome-vqa)
  - [awesome-visual-question-answering](https://github.com/jokieleung/awesome-visual-question-answering)
  - [Tutorial on Answering Questions about Images with Deep Learning](https://arxiv.org/pdf/1610.01076.pdf)
### Original VQA
- paper: [VQA: Visual Question Answering](https://arxiv.org/pdf/1505.00468.pdf)
- code: [Visual-Question-Answering](https://github.com/Axe--/Visual-Question-Answering)
- blog: None
### DFAF
- paper: [Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering](https://arxiv.org/pdf/1812.05252.pdf)
- code: None
- blog: [A VQA model that exploits intra- and inter-modality relations](https://zhuanlan.zhihu.com/p/65497795)
### MUREL
- paper: [MUREL: Multimodal Relational Reasoning for Visual Question Answering](https://arxiv.org/pdf/1902.09487.pdf)
- code: None
- blog: [A VQA model based on multimodal relational reasoning](https://zhuanlan.zhihu.com/p/60972299)
## Expert
- [Bolei Zhou (周博磊)](http://people.csail.mit.edu/bzhou/)
- [Qi Wu (吴琦)](http://www.qi-wu.me/)
## Reference
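- [2017 VQA Challenge first-place technical report](https://zhuanlan.zhihu.com/p/29688475)
- [From basics to depth: a detailed walkthrough of Pythia, the winning model of the VQA 2018 Challenge](https://zhuanlan.zhihu.com/p/56505674)
- [An introduction to Visual Question Answering (VQA) in one article](https://www.jianshu.com/p/76d2e081e303)
- [[CV+NLP] Smarter eyes: a survey of image captioning and visual question answering (VQA), part 1](https://zhuanlan.zhihu.com/p/52499758)
- [DeepMind proposes a new visual question answering model with 98.8% accuracy on CLEVR](https://zhuanlan.zhihu.com/p/41546921)
- [Academia | A panoramic overview of visual question answering: from datasets to techniques](https://mp.weixin.qq.com/s?__biz=MzA3MzI4MjgzMw==&mid=2650727250&idx=4&sn=161751cc5999099b1b217ed659a17aa2)
- [Tao Mei: deep learning builds a bridge between vision and language](https://www.msra.cn/zh-cn/news/features/vision-and-language-20170713)
- [Introduction to Visual Question Answering + recent papers](https://zhuanlan.zhihu.com/p/57207832)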