Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

awesome-vision-and-language

A curated list of awesome vision and language resources (still under construction... stay tuned!)
https://github.com/sangminwoo/awesome-vision-and-language

Image Captioning
- 1901.02527
- 1908.06954
- 2004.03708
- 2003.00387
- 2007.11731 - https://github.com/YiwuZhong/Sub-GC
- 2009.12313
- 2102.04990
- 1412.2306
- 1411.4555
- 1502.03044 - https://github.com/yunjey/show-attend-and-tell
- 1411.4952 - https://github.com/s-gupta/visual-concepts
- 1603.03925 - https://github.com/chapternewscu/image-captioning-with-semantic-attention
- 1612.01887
- 1612.00563
- 1611.06607
- 1704.03899
- CVPR 2017
- EMNLP 2018 - https://github.com/lukemelas/image-paragraph-captioning
- 1803.09845
- 1807.03871
- 1805.08191
- 1811.10787
- 1906.02365
- 1903.05942
- 1903.12020
- 1904.01475
- 1812.02378
- CVPR 2019
Image Retrieval
- 1511.07067
- 1812.07119
- 2105.13868
- 2203.15867 - …NLP/imagecode
- 2311.17136 - …AI-Lab/UniIR
- 2407.12346 - https://github.com/NEC-N-SOGI/query-perturbation
- 2407.15239
Scene Text Recognition
- 1908.09231
- 1904.01906 - …text-recognition-benchmark
Visual Dialog
- 1611.08669 - …mlp-lab/visdial | [visualdialog](https://visualdialog.org/)
- 2303.05983 - …alpha/Accountable-Textual-Visual-Chat
- 1803.11186
Visual Grounding
- 1611.09978
- 1908.07553
- 1812.03299
- 1908.06354
- 1908.07129 - …pytorch
- 2203.16518
Visual Question Answering
- 1606.00061
- 1606.01847 - https://github.com/akirafukui/vqa-mcb
- 1511.02274 - https://github.com/zcyang/imageqa-san
- 1511.05234
- 1603.01417 - https://github.com/vlgiitr/dmn-plus
- 1606.01455 - https://github.com/jnhwkim/nips-mrn-vqa
- 1609.05600
- 1612.00837
- 1704.05526
- 1803.08896
- 1708.02711 - https://github.com/markdtw/vqa-winner-cvprw-2017
- 1810.02358 - https://github.com/HyeonwooNoh/VQA-Transfer-ExternalData
- 1904.08920
- ICCV 2019
- 1907.12133 - https://github.com/czhang0528/scene-graphs-vqa
- 2208.01813
- 1505.00468
- 1707.07998
- 1902.09506
- 1707.07998
Survey
Dataset
- 1705.08421
- 1904.03493
- 1612.06890
- 1811.10830
- 1604.03968 - https://github.com/ai-visual-storytelling-seq2seq | [VIST](http://visionandlanguage.net/VIST/)
- 2010.00763 - https://github.com/NVlabs/Bongard-LOGO
- 2205.13803 - https://github.com/NVlabs/Bongard-HOI
Scene Graph
- 7298990
- 1701.02426 - https://github.com/danfeiX/scene-graph-TF-release
- 1707.09700 - …li/MSDN
- 1711.06640 - https://github.com/rowanz/neural-motifs
- 1802.02598
- 1811.06410
- 1804.01622
- 1808.00191 - https://github.com/jwyang/graph-rcnn.pytorch
- 1904.00560
- 1811.10696
- 1903.02728
- 1903.03326
- 1812.01880 - …Scene-Graph-Generation
- 1812.02347
- 1904.11622 - https://github.com/vincentschen/limited-label-scene-graphs
- 2002.11949 - https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch
- 2003.12962 - https://github.com/taksau/GPS-Net
- 2006.09623
- 2007.08760 - https://github.com/Kenneth-Wong/het-eccv20
- sceneGraph_Mem
- 1602.07332
text2image
- 1605.05396
- 1612.03242
- 1711.10485
- 1802.09178
- 1812.02784
- 1903.05854
- 1904.01310
- 1904.01480
- 1811.09845
- 1802.09178
- 1909.05379
Video Captioning
- 1510.07712
- 1701.03126
- CVPR 2017
- 1804.00100
- 1812.05634 - https://github.com/jamespark3922/adv-inf
- 1904.03870
- 1906.04375
- 1411.4389
- 1611.08002
Video Question Answering
- 1512.02902
- 1809.01696
- 2007.08751 - https://github.com/noagarcia/ROLL-VideoQA
- 2011.07735
Video Understanding
- 1811.08383 - https://github.com/mit-han-lab/temporal-shift-module
- 1910.11009
Vision-and-Language Pretraining
- 1908.07490
- 1904.01766
- 1907.07804
- 1908.06066
- 1909.11059
- 1911.11237
- 2006.09882
- 2004.06165
- 2006.16934
- 2101.00529
- 2006.06666
- 2103.00020
- 2103.05247 - https://github.com/kzl/universal-computation
- 2102.05918
- 2103.01988
- 2102.10772
- 2102.12092
- 2103.06561
- 2305.08675
- vilbert
Visual Reasoning
Visual Relationship Detection
- 1608.00187 - https://github.com/Prof-Lu-Cewu/Visual-Relationship-Detection
- 1702.07191
- 1702.08319
- 1703.03054
- 1704.03114
- 1611.06641 - https://github.com/BryanPlummer/pl-clc
- 1707.09423
- 1803.10362
- 1807.04979
- 1808.00171
- 1910.12324
Visual Storytelling
- 1804.09160 - …xw/AREL
- 2002.00774
- AAAI 2020
Licenses
- Sangmin Woo
- CC0
Vision and Language Navigation
- fda_pdf
- mam_paper
- 1711.11543
- 1711.07280
- mam_paper

Programming Languages

Jupyter Notebook 1 Python 1
