Awesome-Multi-Modal-Dialog
[Paper list] An awesome paper list for multimodal dialog, covering methods, datasets, and metrics
https://github.com/ImKeTT/Awesome-Multi-Modal-Dialog
Datasets
- VQA: Visual Question Answering
- Visual Dialog
- GuessWhat?! Visual object discovery through multi-modal dialogue
- Visual Reference Resolution using Attention Memory for Visual Dialog
- CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning
- Image-grounded conversations: Multimodal context for natural question and response generation (IGC)
- Embodied Question Answering (EQA)
- Vision-and-Dialog Navigation
- Image-Chat: Engaging Grounded Conversations | visual-grounded dialog (VGD) | ACL 2020 | Facebook |
- PhotoChat: A human-human dialogue dataset with photo sharing behavior for joint image-text modeling
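
Most of these datasets ship as JSON annotations over a shared pool of images (typically MS-COCO). As a concrete illustration, here is a minimal sketch of reading VisDial-style annotations, assuming the v1.0 layout in which question and answer strings are stored once in shared pools and each dialog turn indexes into them; the file path is a placeholder.

```python
import json

# Minimal sketch, assuming the VisDial v1.0 annotation layout:
# shared question/answer pools, with each turn storing indices into them.
with open("visdial_1.0_val.json") as f:  # placeholder path
    data = json.load(f)["data"]

questions = data["questions"]  # pool of question strings
answers = data["answers"]      # pool of answer strings

for dialog in data["dialogs"][:1]:       # first dialog only
    print("image_id:", dialog["image_id"])
    print("caption: ", dialog["caption"])
    for turn in dialog["dialog"]:
        print("Q:", questions[turn["question"]])
        print("A:", answers[turn["answer"]])
```
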
Methods
Visual Grounded Dialogue
- Visual Dialog
- Open Domain Dialogue Generation with Latent Images | Image-Chat; Reddit | AAAI 2021 | CODE | FB |
- Maria: A Visual Experience Powered Conversational Agent
- Learning to Ground Visual Objects for Visual Dialog | MS-COCO | ACL 2021 | CODE | FB |
- Multi-Modal Open-Domain Dialogue | Image-Chat; ConvAI2; EmpatheticDialogues; Wizard of Wikipedia; BlendedSkillTalk | EMNLP 2021 | CODE | FB |
- Iterative Context-Aware Graph Inference for Visual Dialog
- Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer | CODE | FB |
- VD-BERT: A Unified Vision and Dialog Transformer with BERT | CODE | FB |
- GuessWhat?! Visual object discovery through multi-modal dialogue
- Ask No More: Deciding when to guess in referential visual dialogue | CODE | FB |
- Visual Reference Resolution using Attention Memory for Visual Dialog
- Visual Coreference Resolution in Visual Dialog using Neural Module Networks
- Dual Attention Networks for Visual Reference Resolution in Visual Dialog | CODE | AB |
- Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs (LTMI)
- Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline | CODE | AB |
- Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog | MS-COCO | ACL 2019 | CODE | AB |
- Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning
- Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
- Multi-View Attention Network for Visual Dialog | CODE | AB |
- Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning | CODE | FB |
- Multimodal Hierarchical Reinforcement Learning Policy for Task-Oriented Visual Dialog
- Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat | CODE | FB |
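
Many of the VisDial models listed above are evaluated in the discriminative setting: the model scores a fixed list of 100 candidate answers per turn, and quality is measured by where the ground-truth answer lands in the ranking (MRR, Recall@k, mean rank). A minimal sketch of that protocol, with random dot-product scores standing in for a real model:

```python
import numpy as np

rng = np.random.default_rng(0)

def rank_of_gt(context_vec, option_vecs, gt_index):
    """Rank (1-based) of the ground-truth option under dot-product scoring."""
    scores = option_vecs @ context_vec
    order = np.argsort(-scores)  # best-first ordering of option indices
    return int(np.where(order == gt_index)[0][0]) + 1

# Dummy stand-ins: one fused image+history+question vector per turn,
# and 100 candidate-answer vectors, as in the VisDial evaluation.
ranks = []
for _ in range(500):  # 500 simulated dialog turns
    context = rng.normal(size=64)
    options = rng.normal(size=(100, 64))
    ranks.append(rank_of_gt(context, options, gt_index=0))

ranks = np.array(ranks)
print("MRR:      ", (1.0 / ranks).mean())
print("Recall@10:", (ranks <= 10).mean())
print("Mean rank:", ranks.mean())
```
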
Multi-modal Conversation
- Multimodal Dialogue Response Generation
- Constructing Multi-Modal Dialogue Dataset by Replacing Text with Semantically Relevant Images | MS-COCO; Flickr30K | ACL 2021 | [CODE](https://github.com/shh1574/multi-modal-dialogue-dataset) | FB |
- Towards Enriching Responses with Crowd-sourced Knowledge for Task-oriented Dialogue
- Multimodal Dialog System: Generating Responses via Adaptive Decoders
- Multimodal Dialog Systems via Capturing Context-aware Dependencies of Semantic Elements
- Text is NOT Enough: Integrating Visual Impressions into Open-domain Dialogue Generation
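
A recurring design problem in these systems is deciding, turn by turn, whether the next response should be text, an image, or both. As a generic illustration only (not any specific paper's architecture), here is a minimal sketch of a learned gate that routes the encoded dialog state to a text decoder or an image retriever:

```python
import torch
import torch.nn as nn

class ModalityGate(nn.Module):
    """Toy router: scores whether the next response should be text or an image."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.gate = nn.Linear(hidden, 2)  # logits for [text, image]

    def forward(self, dialog_state: torch.Tensor) -> torch.Tensor:
        return self.gate(dialog_state).softmax(dim=-1)

state = torch.randn(1, 256)              # stand-in for an encoded dialog context
p_text, p_image = ModalityGate()(state)[0]
if p_image > p_text:
    print("retrieve an image response")  # e.g. nearest-neighbour image search
else:
    print("generate a text response")    # e.g. decode with a seq2seq model
```
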
Question Generation
- Category-Based Strategy-Driven Question Generator for Visual Dialogue
- Visual Dialogue State Tracking for Question Generation
- Goal-Oriented Visual Question Generation via Intermediate Rewards
- Learning goal-oriented visual dialog via tempered policy gradient
- Information maximizing visual question generation
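
The goal-oriented question generators above are typically trained with policy gradients: the questioner samples a question, receives a reward for whether the resulting dialog identifies the target (plus, in some of these papers, intermediate rewards), and updates via REINFORCE. A minimal sketch of the REINFORCE update with a baseline; the reward and baseline values are placeholders:

```python
import torch

# Toy policy over a 1000-word vocabulary; a real questioner would condition
# on the image and dialog history.
logits = torch.randn(8, 1000, requires_grad=True)  # 8 sampled question tokens
dist = torch.distributions.Categorical(logits=logits)
tokens = dist.sample()

reward = 1.0    # placeholder: 1 if the guesser found the target, else 0
baseline = 0.5  # placeholder: e.g. a running average of past rewards

# REINFORCE: push up log-probs of sampled tokens, scaled by the advantage.
loss = -(dist.log_prob(tokens) * (reward - baseline)).sum()
loss.backward()
print("grad norm:", logits.grad.norm().item())
```
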
Visual Navigation
- Learning to interpret natural language navigation instructions from observations | AAAI 2011 | CODE |
- Talk the walk: Navigating new york city through grounded dialogue
- Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction | CODE |
- Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments | [R4R](https://github.com/google-research/google-research/tree/master/r4r); VQA; [Matterport3D](https://arxiv.org/abs/1709.06158) | CVPR 2018 | [CODE](https://github.com/peteanderson80/Matterport3DSimulator) |
- Embodied Question Answering
- IQA: Visual Question Answering in Interactive Environments | AI2-THOR | CVPR 2018 | [CODE](https://github.com/danielgordon10/thor-iqa-cvpr-2018) |
- Touchdown: Natural language navigation and spatial reasoning in visual street environments | CODE |
- Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation | [R4R](https://github.com/google-research/google-research/tree/master/r4r) | ACL 2019 | CODE |
- Learning to navigate unseen environments: Back translation with environmental dropout | CODE |
Metrics
Visual Navigation
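
The de facto standard metric for vision-and-language navigation is SPL (Success weighted by Path Length, Anderson et al. 2018): SPL = (1/N) Σ_i S_i · l_i / max(p_i, l_i), where S_i indicates success on episode i, l_i is the shortest-path length to the goal, and p_i is the length of the agent's actual path. A direct sketch of the formula:

```python
def spl(successes, shortest_lengths, agent_lengths):
    """Success weighted by Path Length (Anderson et al., 2018)."""
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, agent_lengths):
        total += s * l / max(p, l)  # success discounted by path efficiency
    return total / len(successes)

# Two episodes: one success with a detour (0.8 credit), one failure (0).
print(spl([1, 0], [10.0, 8.0], [12.5, 20.0]))  # -> 0.4
```
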