
# Awesome-of-Multimodal-Dialogue-Models [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)

![](https://img.shields.io/badge/Number-60-green)

A curated list of multimodal dialogue models and related resources.

Please feel free to open a pull request or an issue to add papers.
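Every table below uses the same `Title | Date/Venue | Type | Code | Star` row pattern. For contributors adding a paper, a small helper like the following (a hypothetical sketch, not code from this repository) produces a correctly formatted arXiv-table row:

```python
def format_row(title: str, paper_url: str, date: str, type_tag: str,
               repo: str = "") -> str:
    """Build one markdown table row in the format used by the tables below.

    repo is the GitHub "owner/name"; when no official code is released,
    the Code and Star cells are left as "-".
    """
    if repo:
        code = f"[PyTorch(Author)](https://github.com/{repo})"
        star = f"![Github stars](https://img.shields.io/github/stars/{repo}.svg)"
    else:
        code = star = "-"
    return f"[{title}]({paper_url}) | {date} | `{type_tag}` | {code} | {star} |"

print(format_row("Visual Instruction Tuning",
                 "https://arxiv.org/abs/2304.08485",
                 "2023.05.17", "MIUO", "haotian-liu/LLaVA"))
```

The star badge is the standard shields.io `github/stars/{owner}/{repo}` endpoint, so it stays current without manual edits.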

### :high_brightness: Updated 2023-07-07

---

## Table of Contents

- [Type of Multimodal Dialogue Models](#type-of-multimodal-dialogue-models)
- [arXiv](#arxiv)
- [2023 Venues](#2023)
- [2022 Venues](#2022)
- [2021 Venues](#2021)
- [2020 Venues](#2020)
- [Previous Venues](#previous-venues)
- [Awesome Surveys](#awesome-surveys)
- [Awesome Blogs](#awesome-blogs)
- [Awesome Multimodal Datasets](#awesome-multimodal-datasets)

### Type of Multimodal Dialogue Models

| Type | `UIUO` | `MIUO` | `MIMO` | `L2V` | `V2L` | `Other` |
|:----------- |:------------------------------------:|:--------------------------------------:|:----------------------------------------:|:----------------------:|:---------------------:|:---------------:|
| Explanation | Unimodal Input & Unimodal Output | Multimodal Input & Unimodal Output | Multimodal Input & Multimodal Output | Language to Vision | Vision to Language | other types |
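The type codes above tag every entry in the tables that follow. As a quick reference, they can be sketched as a small Python enum (illustrative only, not code from this repository):

```python
from enum import Enum

class DialogueModelType(Enum):
    """Type tags used throughout the paper tables below."""
    UIUO = "Unimodal Input & Unimodal Output"
    MIUO = "Multimodal Input & Unimodal Output"
    MIMO = "Multimodal Input & Multimodal Output"
    L2V = "Language to Vision"
    V2L = "Vision to Language"
    OTHER = "other types"

# Look up the meaning of a tag as it appears in a table row:
print(DialogueModelType["MIUO"].value)
```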

### arXiv

| Title | Date | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|
[What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?](https://arxiv.org/pdf/2307.02469.pdf) | 2023.07.05 | `MIUO` | - | - |
[SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs](https://arxiv.org/pdf/2306.17842.pdf) | 2023.07.03 | `MIUO` | - | - |
[LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding](https://arxiv.org/pdf/2306.17107.pdf) | 2023.06.29 | `MIUO` | [PyTorch(Author)](https://github.com/SALT-NLP/LLaVAR) | ![Github stars](https://img.shields.io/github/stars/SALT-NLP/LLaVAR.svg) |
[AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn](https://arxiv.org/pdf/2306.08640.pdf) | 2023.06.28 | `MIUO` | [PyTorch(Author)](https://github.com/showlab/assistgpt) | ![Github stars](https://img.shields.io/github/stars/showlab/assistgpt.svg) |
[KOSMOS-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/pdf/2306.14824.pdf) | 2023.06.27 | `MIUO` | [PyTorch(Author)](https://github.com/microsoft/unilm/tree/master/kosmos-2) | ![Github stars](https://img.shields.io/github/stars/microsoft/unilm.svg) |
[Aligning Large Multi-Modal Model with Robust Instruction Tuning](https://arxiv.org/pdf/2306.14565.pdf) | 2023.06.26 | `MIUO` | - | - |
[Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration](https://arxiv.org/pdf/2306.09093.pdf) | 2023.06.15 | `MIUO` | [PyTorch(Author)](https://github.com/lyuchenyang/Macaw-LLM) | ![Github stars](https://img.shields.io/github/stars/lyuchenyang/Macaw-LLM.svg) |
[Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation](https://arxiv.org/pdf/2303.05983.pdf) | 2023.06.14 | `MIMO` | [PyTorch(Author)](https://github.com/matrix-alpha/Accountable-Textual-Visual-Chat) | ![Github stars](https://img.shields.io/github/stars/matrix-alpha/Accountable-Textual-Visual-Chat.svg) |
[Grounding Language Models to Images for Multimodal Inputs and Outputs](https://arxiv.org/abs/2301.13823) | 2023.06.13 | `MIMO` | [PyTorch(Author)](https://github.com/kohjingyu/fromage) | ![Github stars](https://img.shields.io/github/stars/kohjingyu/fromage.svg) |
[MultiModal-GPT: A Vision and Language Model for Dialogue with Humans](https://arxiv.org/pdf/2305.04790.pdf) | 2023.06.13 | `MIUO` | [PyTorch(Author)](https://github.com/open-mmlab/Multimodal-GPT) | ![Github stars](https://img.shields.io/github/stars/open-mmlab/Multimodal-GPT.svg) |
[MIMIC-IT: Multi-Modal In-Context Instruction Tuning](https://arxiv.org/pdf/2306.05425.pdf) | 2023.06.08 | `MIUO` | [PyTorch(Author)](https://github.com/Luodian/otter) | ![Github stars](https://img.shields.io/github/stars/Luodian/otter.svg) |
[Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models](https://arxiv.org/abs/2306.05424) | 2023.06.08 | `MIUO` | [PyTorch(Author)](https://github.com/mbzuai-oryx/Video-ChatGPT) | ![Github stars](https://img.shields.io/github/stars/mbzuai-oryx/Video-ChatGPT.svg) |
[GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction](https://arxiv.org/abs/2305.18752) | 2023.05.30 | `MIMO` | [PyTorch(Author)](https://github.com/StevenGrove/GPT4Tools) | ![Github stars](https://img.shields.io/github/stars/StevenGrove/GPT4Tools.svg) |
[Controllable Text-to-Image Generation with GPT-4](https://arxiv.org/abs/2305.18583) | 2023.05.29 | `MIUO` | [PyTorch(Author)](https://github.com/tianjunz/Control-GPT) | ![Github stars](https://img.shields.io/github/stars/tianjunz/Control-GPT.svg) |
Mindstorms in Natural Language-Based Societies of Mind | 2023.05.26 | `MIMO` | - | - |
[Generating Images with Multimodal Language Models](https://arxiv.org/pdf/2305.17216.pdf) | 2023.05.26 | `MIMO` | [PyTorch(Author)](https://github.com/kohjingyu/gill) | ![Github stars](https://img.shields.io/github/stars/kohjingyu/gill.svg) |
[HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face](https://arxiv.org/pdf/2303.17580.pdf) | 2023.05.25 | `MIMO` | [PyTorch(Author)](https://github.com/microsoft/JARVIS) | ![Github stars](https://img.shields.io/github/stars/microsoft/JARVIS.svg) |
[ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst](https://arxiv.org/abs/2305.16103) | 2023.05.25 | `MIUO` | [PyTorch(Author)](https://github.com/joez17/ChatBridge) | ![Github stars](https://img.shields.io/github/stars/joez17/ChatBridge.svg) |
[EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought](https://arxiv.org/abs/2305.15021) | 2023.05.24 | `MIUO` | [PyTorch(Author)](https://github.com/EmbodiedGPT/EmbodiedGPT_Pytorch) | ![Github stars](https://img.shields.io/github/stars/EmbodiedGPT/EmbodiedGPT_Pytorch.svg) |
[LMEye: An Interactive Perception Network for Large Language Models](https://arxiv.org/pdf/2305.03701.pdf) | 2023.05.19 | `MIUO` | [PyTorch(Author)](https://github.com/YunxinLi/LingCloud) | ![Github stars](https://img.shields.io/github/stars/YunxinLi/LingCloud.svg) |
[Visual Instruction Tuning](https://arxiv.org/abs/2304.08485) | 2023.05.17 | `MIUO` | [PyTorch(Author)](https://github.com/haotian-liu/LLaVA) | ![Github stars](https://img.shields.io/github/stars/haotian-liu/LLaVA.svg) |
[VideoChat: Chat-Centric Video Understanding](https://arxiv.org/abs/2305.06355) | 2023.05.10 | `MIUO` | [PyTorch(Author)](https://github.com/OpenGVLab/Ask-Anything) | ![Github stars](https://img.shields.io/github/stars/OpenGVLab/Ask-Anything.svg) |
[Caption Anything: Interactive Image Description with Diverse Multimodal Controls](https://arxiv.org/abs/2305.02677) | 2023.05.08 | `MIUO` | [PyTorch(Author)](https://github.com/ttengwang/Caption-Anything) | ![Github stars](https://img.shields.io/github/stars/ttengwang/Caption-Anything.svg) |
[MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models](https://arxiv.org/abs/2304.10592) | 2023.04.20 | `MIUO` | [PyTorch(Author)](https://github.com/Vision-CAIR/MiniGPT-4) | ![Github stars](https://img.shields.io/github/stars/Vision-CAIR/MiniGPT-4.svg) |
[TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs](https://arxiv.org/pdf/2303.16434.pdf) | 2023.03.29 | `MIUO` | [PyTorch(Author)](https://github.com/microsoft/TaskMatrix/tree/main/TaskMatrix.AI) | ![Github stars](https://img.shields.io/github/stars/microsoft/TaskMatrix.svg) |
[PandaGPT: One Model To Instruction-Follow Them All](https://arxiv.org/abs/2305.16355) | 2023.05.25 | `MIUO` | [PyTorch(Author)](https://github.com/yxuansu/PandaGPT) | ![Github stars](https://img.shields.io/github/stars/yxuansu/PandaGPT.svg) |
[GPT-4 Technical Report](https://arxiv.org/abs/2303.08774) | 2023.03.27 | `MIMO` | - | - |
[MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action](https://arxiv.org/abs/2303.11381) | 2023.03.20 | `MIUO` | [PyTorch(Author)](https://github.com/microsoft/MM-REACT) | ![Github stars](https://img.shields.io/github/stars/microsoft/MM-REACT.svg) |
[Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models](https://arxiv.org/pdf/2303.04671.pdf) | 2023.03.08 | `MIMO` | [PyTorch(Author)](https://github.com/microsoft/TaskMatrix) | ![Github stars](https://img.shields.io/github/stars/microsoft/TaskMatrix.svg) |

### 2023

| Title | Venue | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|
[InstructPix2Pix: Learning to Follow Image Editing Instructions](https://arxiv.org/abs/2211.09800) | CVPR-Highlight | `MIUO` | [PyTorch(Author)](https://github.com/timothybrooks/instruct-pix2pix) | ![Github stars](https://img.shields.io/github/stars/timothybrooks/instruct-pix2pix.svg) |
[BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/pdf/2301.12597.pdf) | ICML | `V2L` | [PyTorch(Author)](https://github.com/salesforce/LAVIS/tree/main/projects/blip2) | ![Github stars](https://img.shields.io/github/stars/salesforce/LAVIS.svg) |

### 2022

| Title | Venue | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|
[Flamingo: a Visual Language Model for Few-Shot Learning](https://arxiv.org/abs/2204.14198) | NeurIPS | `MIUO` | - | - |

### 2021

| Title | Venue | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|

### 2020

| Title | Venue | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|

### Previous Venues

| Title | Venue | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|

## Awesome Surveys
- [Awesome-Multimodal-Large-Language-Models](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models) ![Github stars](https://img.shields.io/github/stars/BradyFU/Awesome-Multimodal-Large-Language-Models.svg)
- [awesome-multimodal-ml](https://github.com/pliang279/awesome-multimodal-ml) ![Github stars](https://img.shields.io/github/stars/pliang279/awesome-multimodal-ml.svg)
- [Awesome-Multimodal-Research](https://github.com/Eurus-Holmes/Awesome-Multimodal-Research) ![Github stars](https://img.shields.io/github/stars/Eurus-Holmes/Awesome-Multimodal-Research.svg)
- [Awesome-Text-to-Image](https://github.com/Yutong-Zhou-cv/Awesome-Text-to-Image) ![Github stars](https://img.shields.io/github/stars/Yutong-Zhou-cv/Awesome-Text-to-Image.svg)
- [Awesome-Multimodal-LLM](https://github.com/HenryHZY/Awesome-Multimodal-LLM) ![Github stars](https://img.shields.io/github/stars/HenryHZY/Awesome-Multimodal-LLM.svg)
- [awesome-llm-and-aigc](https://github.com/sjinzh/awesome-llm-and-aigc) ![Github stars](https://img.shields.io/github/stars/sjinzh/awesome-llm-and-aigc.svg)
- [Awesome-Multimodal-Chatbot](https://github.com/zjr2000/Awesome-Multimodal-Chatbot) ![Github stars](https://img.shields.io/github/stars/zjr2000/Awesome-Multimodal-Chatbot.svg)
- [Awesome-Multimodal-LLM](https://github.com/vincentlux/Awesome-Multimodal-LLM) ![Github stars](https://img.shields.io/github/stars/vincentlux/Awesome-Multimodal-LLM.svg)
- [awesome-free-chatgpt](https://github.com/LiLittleCat/awesome-free-chatgpt) ![Github stars](https://img.shields.io/github/stars/LiLittleCat/awesome-free-chatgpt.svg)
- [awesome-generative-ai](https://github.com/steven2358/awesome-generative-ai) ![Github stars](https://img.shields.io/github/stars/steven2358/awesome-generative-ai.svg)
- [Awesome-LLM](https://github.com/Hannibal046/Awesome-LLM) ![Github stars](https://img.shields.io/github/stars/Hannibal046/Awesome-LLM.svg)
- [awesome-vision-and-language](https://github.com/sangminwoo/awesome-vision-and-language) ![Github stars](https://img.shields.io/github/stars/sangminwoo/awesome-vision-and-language.svg)
- [awesome-vision-language-pretraining-papers](https://github.com/yuewang-cuhk/awesome-vision-language-pretraining-papers) ![Github stars](https://img.shields.io/github/stars/yuewang-cuhk/awesome-vision-language-pretraining-papers.svg)
- [awesome-chatgpt-dataset](https://github.com/voidful/awesome-chatgpt-dataset) ![Github stars](https://img.shields.io/github/stars/voidful/awesome-chatgpt-dataset.svg)

## Awesome Blogs

## Awesome Multimodal Datasets