Awesome-Multi-Modal-Dialog
[Paper list] An awesome paper list for multimodal dialog, covering methods, datasets, and metrics
https://github.com/ImKeTT/Awesome-Multi-Modal-Dialog
Datasets
- VQA: Visual Question Answering
- Visual Dialog
- GuessWhat?! Visual object discovery through multi-modal dialogue
- Visual Reference Resolution using Attention Memory for Visual Dialog
- CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning
- Image-grounded conversations: Multimodal context for natural question and response generation (IGC)
- Embodied Question Answering (EQA)
- Vision-and-Dialog Navigation
- Image-Chat: Engaging Grounded Conversations | visual-grounded dialog (VGD) | ACL 2020 | Facebook |
- PhotoChat: A human-human dialogue dataset with photo sharing behavior for joint image-text modeling
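
Most of these datasets ship as JSON annotations over a shared pool of images (typically MS-COCO). As a concrete illustration, here is a minimal sketch of reading VisDial-style annotations, assuming the v1.0 layout in which question and answer strings are stored once in shared pools and each dialog turn indexes into them; the file path is a placeholder.

```python
import json

# Minimal sketch, assuming the VisDial v1.0 annotation layout:
# shared question/answer pools, with each turn storing indices into them.
with open("visdial_1.0_val.json") as f:  # placeholder path
    data = json.load(f)["data"]

questions = data["questions"]  # pool of question strings
answers = data["answers"]      # pool of answer strings

for dialog in data["dialogs"][:1]:       # first dialog only
    print("image_id:", dialog["image_id"])
    print("caption: ", dialog["caption"])
    for turn in dialog["dialog"]:
        print("Q:", questions[turn["question"]])
        print("A:", answers[turn["answer"]])
```
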
Methods
Visual Grounded Dialogue
- Visual Dialog
- Open Domain Dialogue Generation with Latent Images | Image-Chat; Reddit | AAAI 2021 | CODE | FB |
- Maria: A Visual Experience Powered Conversational Agent
- Learning to Ground Visual Objects for Visual Dialog | MS-COCO | ACL 2021 | CODE | FB |
- Multi-Modal Open-Domain Dialogue | Image-Chat; ConvAI2; EmpatheticDialogues; Wizard of Wikipedia; BlendedSkillTalk | EMNLP 2021 | CODE | FB |
- Iterative Context-Aware Graph Inference for Visual Dialog
- Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer | CODE | FB |
- VD-BERT: A Unified Vision and Dialog Transformer with BERT | CODE | FB |
- GuessWhat?! Visual object discovery through multi-modal dialogue
- Ask No More: Deciding when to guess in referential visual dialogue | CODE | FB |
- Visual Reference Resolution using Attention Memory for Visual Dialog
- Visual Coreference Resolution in Visual Dialog using Neural Module Networks
- Dual Attention Networks for Visual Reference Resolution in Visual Dialog | CODE | AB |
- Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs (LTMI)
- Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline | CODE | AB |
- Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog | MS-COCO | ACL 2019 | CODE | AB |
- Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning
- Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
- Multi-View Attention Network for Visual Dialog | CODE | AB |
- Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning | CODE | FB |
- Multimodal Hierarchical Reinforcement Learning Policy for Task-Oriented Visual Dialog
- Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat | CODE | FB |
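
Many of the VisDial models listed above are evaluated in the discriminative setting: the model scores a fixed list of 100 candidate answers per turn, and quality is measured by where the ground-truth answer lands in the ranking (MRR, Recall@k, mean rank). A minimal sketch of that protocol, with random dot-product scores standing in for a real model:

```python
import numpy as np

rng = np.random.default_rng(0)

def rank_of_gt(context_vec, option_vecs, gt_index):
    """Rank (1-based) of the ground-truth option under dot-product scoring."""
    scores = option_vecs @ context_vec
    order = np.argsort(-scores)  # best-first ordering of option indices
    return int(np.where(order == gt_index)[0][0]) + 1

# Dummy stand-ins: one fused image+history+question vector per turn,
# and 100 candidate-answer vectors, as in the VisDial evaluation.
ranks = []
for _ in range(500):  # 500 simulated dialog turns
    context = rng.normal(size=64)
    options = rng.normal(size=(100, 64))
    ranks.append(rank_of_gt(context, options, gt_index=0))

ranks = np.array(ranks)
print("MRR:      ", (1.0 / ranks).mean())
print("Recall@10:", (ranks <= 10).mean())
print("Mean rank:", ranks.mean())
```
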
Multi-modal Conversation
- Multimodal Dialogue Response Generation
- Constructing Multi-Modal Dialogue Dataset by Replacing Text with Semantically Relevant Images | MS-COCO; Flickr30K | ACL 2021 | [CODE](https://github.com/shh1574/multi-modal-dialogue-dataset) | FB |
- Towards Enriching Responses with Crowd-sourced Knowledge for Task-oriented Dialogue
- Multimodal Dialog System: Generating Responses via Adaptive Decoders
- Multimodal Dialog Systems via Capturing Context-aware Dependencies of Semantic Elements
- Text is NOT Enough: Integrating Visual Impressions into Open-domain Dialogue Generation
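
A recurring design problem in these systems is deciding, turn by turn, whether the next response should be text, an image, or both. As a generic illustration only (not any specific paper's architecture), here is a minimal sketch of a learned gate that routes the encoded dialog state to a text decoder or an image retriever:

```python
import torch
import torch.nn as nn

class ModalityGate(nn.Module):
    """Toy router: scores whether the next response should be text or an image."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.gate = nn.Linear(hidden, 2)  # logits for [text, image]

    def forward(self, dialog_state: torch.Tensor) -> torch.Tensor:
        return self.gate(dialog_state).softmax(dim=-1)

state = torch.randn(1, 256)              # stand-in for an encoded dialog context
p_text, p_image = ModalityGate()(state)[0]
if p_image > p_text:
    print("retrieve an image response")  # e.g. nearest-neighbour image search
else:
    print("generate a text response")    # e.g. decode with a seq2seq model
```
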
Question Generation
- Category-Based Strategy-Driven Question Generator for Visual Dialogue
- Visual Dialogue State Tracking for Question Generation
- Goal-Oriented Visual Question Generation via Intermediate Rewards
- Learning goal-oriented visual dialog via tempered policy gradient
- Information maximizing visual question generation
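
The goal-oriented question generators above are typically trained with policy gradients: the questioner samples a question, receives a reward for whether the resulting dialog identifies the target (plus, in some of these papers, intermediate rewards), and updates via REINFORCE. A minimal sketch of the REINFORCE update with a baseline; the reward and baseline values are placeholders:

```python
import torch

# Toy policy over a 1000-word vocabulary; a real questioner would condition
# on the image and dialog history.
logits = torch.randn(8, 1000, requires_grad=True)  # 8 sampled question tokens
dist = torch.distributions.Categorical(logits=logits)
tokens = dist.sample()

reward = 1.0    # placeholder: 1 if the guesser found the target, else 0
baseline = 0.5  # placeholder: e.g. a running average of past rewards

# REINFORCE: push up log-probs of sampled tokens, scaled by the advantage.
loss = -(dist.log_prob(tokens) * (reward - baseline)).sum()
loss.backward()
print("grad norm:", logits.grad.norm().item())
```
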
Visual Navigation
- Learning to interpret natural language navigation instructions from observations | AAAI 2011 | CODE |
- Talk the walk: Navigating new york city through grounded dialogue
- Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction | CODE |
- Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments | [R4R](https://github.com/google-research/google-research/tree/master/r4r); VQA; [Matterport3D](https://arxiv.org/abs/1709.06158) | CVPR 2018 | [CODE](https://github.com/peteanderson80/Matterport3DSimulator) |
- Embodied Question Answering
- IQA: Visual Question Answering in Interactive Environments | AI2-THOR | CVPR 2018 | [CODE](https://github.com/danielgordon10/thor-iqa-cvpr-2018) |
- Touchdown: Natural language navigation and spatial reasoning in visual street environments | CODE |
- Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation | [R4R](https://github.com/google-research/google-research/tree/master/r4r) | ACL 2019 | CODE |
- Learning to navigate unseen environments: Back translation with environmental dropout | CODE |
Metrics
Visual Navigation
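
The de facto standard metric for vision-and-language navigation is SPL (Success weighted by Path Length, Anderson et al. 2018): SPL = (1/N) Σ_i S_i · l_i / max(p_i, l_i), where S_i indicates success on episode i, l_i is the shortest-path length to the goal, and p_i is the length of the agent's actual path. A direct sketch of the formula:

```python
def spl(successes, shortest_lengths, agent_lengths):
    """Success weighted by Path Length (Anderson et al., 2018)."""
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, agent_lengths):
        total += s * l / max(p, l)  # success discounted by path efficiency
    return total / len(successes)

# Two episodes: one success with a detour (0.8 credit), one failure (0).
print(spl([1, 0], [10.0, 8.0], [12.5, 20.0]))  # -> 0.4
```
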