
# Awesome-of-Multimodal-Dialogue-Models [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)

![](https://img.shields.io/badge/Number-60-green)

A curated list of multimodal dialogue models and related resources.

Please feel free to open a pull request or an issue to add papers.
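Every table below uses the same `Title | Date/Venue | Type | Code | Star` row pattern. For contributors adding a paper, a small helper like the following (a hypothetical sketch, not code from this repository) produces a correctly formatted arXiv-table row:

```python
def format_row(title: str, paper_url: str, date: str, type_tag: str,
               repo: str = "") -> str:
    """Build one markdown table row in the format used by the tables below.

    repo is the GitHub "owner/name"; when no official code is released,
    the Code and Star cells are left as "-".
    """
    if repo:
        code = f"[PyTorch(Author)](https://github.com/{repo})"
        star = f"![Github stars](https://img.shields.io/github/stars/{repo}.svg)"
    else:
        code = star = "-"
    return f"[{title}]({paper_url}) | {date} | `{type_tag}` | {code} | {star} |"

print(format_row("Visual Instruction Tuning",
                 "https://arxiv.org/abs/2304.08485",
                 "2023.05.17", "MIUO", "haotian-liu/LLaVA"))
```

The star badge is the standard shields.io `github/stars/{owner}/{repo}` endpoint, so it stays current without manual edits.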

### :high_brightness: Updated 2023-07-07

---

## Table of Contents

- [Type of Multimodal Dialogue Models](#type-of-multimodal-dialogue-models)
- [arXiv](#arxiv)
- [2023 Venues](#2023)
- [2022 Venues](#2022)
- [2021 Venues](#2021)
- [2020 Venues](#2020)
- [Previous Venues](#previous-venues)
- [Awesome Surveys](#awesome-surveys)
- [Awesome Blogs](#awesome-blogs)
- [Awesome Multimodal Datasets](#awesome-multimodal-datasets)

### Type of Multimodal Dialogue Models

| Type | `UIUO` | `MIUO` | `MIMO` | `L2V` | `V2L` | `Other` |
|:----------- |:------------------------------------:|:--------------------------------------:|:----------------------------------------:|:----------------------:|:---------------------:|:---------------:|
| Explanation | Unimodal Input & Unimodal Output | Multimodal Input & Unimodal Output | Multimodal Input & Multimodal Output | Language to Vision | Vision to Language | other types |
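The type codes above tag every entry in the tables that follow. As a quick reference, they can be sketched as a small Python enum (illustrative only, not code from this repository):

```python
from enum import Enum

class DialogueModelType(Enum):
    """Type tags used throughout the paper tables below."""
    UIUO = "Unimodal Input & Unimodal Output"
    MIUO = "Multimodal Input & Unimodal Output"
    MIMO = "Multimodal Input & Multimodal Output"
    L2V = "Language to Vision"
    V2L = "Vision to Language"
    OTHER = "other types"

# Look up the meaning of a tag as it appears in a table row:
print(DialogueModelType["MIUO"].value)
```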

### arXiv

| Title | Date | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|
[What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?](https://arxiv.org/pdf/2307.02469.pdf) | 2023.07.05 | `MIUO` | - | - |
[SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs](https://arxiv.org/pdf/2306.17842.pdf) | 2023.07.03 | `MIUO` | - | - |
[LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding](https://arxiv.org/pdf/2306.17107.pdf) | 2023.06.29 | `MIUO` | [PyTorch(Author)](https://github.com/SALT-NLP/LLaVAR) | ![Github stars](https://img.shields.io/github/stars/SALT-NLP/LLaVAR.svg) |
[AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn](https://arxiv.org/pdf/2306.08640.pdf) | 2023.06.28 | `MIUO` | [PyTorch(Author)](https://github.com/showlab/assistgpt) | ![Github stars](https://img.shields.io/github/stars/showlab/assistgpt.svg) |
[KOSMOS-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/pdf/2306.14824.pdf) | 2023.06.27 | `MIUO` | [PyTorch(Author)](https://github.com/microsoft/unilm/tree/master/kosmos-2) | ![Github stars](https://img.shields.io/github/stars/microsoft/unilm.svg) |
[Aligning Large Multi-Modal Model with Robust Instruction Tuning](https://arxiv.org/pdf/2306.14565.pdf) | 2023.06.26 | `MIUO` | - | - |
[Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration](https://arxiv.org/pdf/2306.09093.pdf) | 2023.06.15 | `MIUO` | [PyTorch(Author)](https://github.com/lyuchenyang/Macaw-LLM) | ![Github stars](https://img.shields.io/github/stars/lyuchenyang/Macaw-LLM.svg) |
[Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation](https://arxiv.org/pdf/2303.05983.pdf) | 2023.06.14 | `MIMO` | [PyTorch(Author)](https://github.com/matrix-alpha/Accountable-Textual-Visual-Chat) | ![Github stars](https://img.shields.io/github/stars/matrix-alpha/Accountable-Textual-Visual-Chat.svg) |
[Grounding Language Models to Images for Multimodal Inputs and Outputs](https://arxiv.org/abs/2301.13823) | 2023.06.13 | `MIMO` | [PyTorch(Author)](https://github.com/kohjingyu/fromage) | ![Github stars](https://img.shields.io/github/stars/kohjingyu/fromage.svg) |
[MultiModal-GPT: A Vision and Language Model for Dialogue with Humans](https://arxiv.org/pdf/2305.04790.pdf) | 2023.06.13 | `MIUO` | [PyTorch(Author)](https://github.com/open-mmlab/Multimodal-GPT) | ![Github stars](https://img.shields.io/github/stars/open-mmlab/Multimodal-GPT.svg) |
[MIMIC-IT: Multi-Modal In-Context Instruction Tuning](https://arxiv.org/pdf/2306.05425.pdf) | 2023.06.08 | `MIUO` | [PyTorch(Author)](https://github.com/Luodian/otter) | ![Github stars](https://img.shields.io/github/stars/Luodian/otter.svg) |
[Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models](https://arxiv.org/abs/2306.05424) | 2023.06.08 | `MIUO` | [PyTorch(Author)](https://github.com/mbzuai-oryx/Video-ChatGPT) | ![Github stars](https://img.shields.io/github/stars/mbzuai-oryx/Video-ChatGPT.svg) |
[GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction](https://arxiv.org/abs/2305.18752) | 2023.05.30 | `MIMO` | [PyTorch(Author)](https://github.com/StevenGrove/GPT4Tools) | ![Github stars](https://img.shields.io/github/stars/StevenGrove/GPT4Tools.svg) |
[Controllable Text-to-Image Generation with GPT-4](https://arxiv.org/abs/2305.18583) | 2023.05.29 | `MIUO` | [PyTorch(Author)](https://github.com/tianjunz/Control-GPT) | ![Github stars](https://img.shields.io/github/stars/tianjunz/Control-GPT.svg) |
Mindstorms in Natural Language-Based Societies of Mind | 2023.05.26 | `MIMO` | - | - |
[Generating Images with Multimodal Language Models](https://arxiv.org/pdf/2305.17216.pdf) | 2023.05.26 | `MIMO` | [PyTorch(Author)](https://github.com/kohjingyu/gill) | ![Github stars](https://img.shields.io/github/stars/kohjingyu/gill.svg) |
[HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face](https://arxiv.org/pdf/2303.17580.pdf) | 2023.05.25 | `MIMO` | [PyTorch(Author)](https://github.com/microsoft/JARVIS) | ![Github stars](https://img.shields.io/github/stars/microsoft/JARVIS.svg) |
[ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst](https://arxiv.org/abs/2305.16103) | 2023.05.25 | `MIUO` | [PyTorch(Author)](https://github.com/joez17/ChatBridge) | ![Github stars](https://img.shields.io/github/stars/joez17/ChatBridge.svg) |
[EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought](https://arxiv.org/abs/2305.15021) | 2023.05.24 | `MIUO` | [PyTorch(Author)](https://github.com/EmbodiedGPT/EmbodiedGPT_Pytorch) | ![Github stars](https://img.shields.io/github/stars/EmbodiedGPT/EmbodiedGPT_Pytorch.svg) |
[LMEye: An Interactive Perception Network for Large Language Models](https://arxiv.org/pdf/2305.03701.pdf) | 2023.05.19 | `MIUO` | [PyTorch(Author)](https://github.com/YunxinLi/LingCloud) | ![Github stars](https://img.shields.io/github/stars/YunxinLi/LingCloud.svg) |
[Visual Instruction Tuning](https://arxiv.org/abs/2304.08485) | 2023.05.17 | `MIUO` | [PyTorch(Author)](https://github.com/haotian-liu/LLaVA) | ![Github stars](https://img.shields.io/github/stars/haotian-liu/LLaVA.svg) |
[VideoChat: Chat-Centric Video Understanding](https://arxiv.org/abs/2305.06355) | 2023.05.10 | `MIUO` | [PyTorch(Author)](https://github.com/OpenGVLab/Ask-Anything) | ![Github stars](https://img.shields.io/github/stars/OpenGVLab/Ask-Anything.svg) |
[Caption Anything: Interactive Image Description with Diverse Multimodal Controls](https://arxiv.org/abs/2305.02677) | 2023.05.08 | `MIUO` | [PyTorch(Author)](https://github.com/ttengwang/Caption-Anything) | ![Github stars](https://img.shields.io/github/stars/ttengwang/Caption-Anything.svg) |
[MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models](https://arxiv.org/abs/2304.10592) | 2023.04.20 | `MIUO` | [PyTorch(Author)](https://github.com/Vision-CAIR/MiniGPT-4) | ![Github stars](https://img.shields.io/github/stars/Vision-CAIR/MiniGPT-4.svg) |
[TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs](https://arxiv.org/pdf/2303.16434.pdf) | 2023.03.29 | `MIUO` | [PyTorch(Author)](https://github.com/microsoft/TaskMatrix/tree/main/TaskMatrix.AI) | ![Github stars](https://img.shields.io/github/stars/microsoft/TaskMatrix.svg) |
[PandaGPT: One Model To Instruction-Follow Them All](https://arxiv.org/abs/2305.16355) | 2023.05.25 | `MIUO` | [PyTorch(Author)](https://github.com/yxuansu/PandaGPT) | ![Github stars](https://img.shields.io/github/stars/yxuansu/PandaGPT.svg) |
[GPT-4 Technical Report](https://arxiv.org/abs/2303.08774) | 2023.03.27 | `MIMO` | - | - |
[MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action](https://arxiv.org/abs/2303.11381) | 2023.03.20 | `MIUO` | [PyTorch(Author)](https://github.com/microsoft/MM-REACT) | ![Github stars](https://img.shields.io/github/stars/microsoft/MM-REACT.svg) |
[Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models](https://arxiv.org/pdf/2303.04671.pdf) | 2023.03.08 | `MIMO` | [PyTorch(Author)](https://github.com/microsoft/TaskMatrix) | ![Github stars](https://img.shields.io/github/stars/microsoft/TaskMatrix.svg) |

### 2023

| Title | Venue | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|
[InstructPix2Pix: Learning to Follow Image Editing Instructions](https://arxiv.org/abs/2211.09800) | CVPR-Highlight | `MIUO` | [PyTorch(Author)](https://github.com/timothybrooks/instruct-pix2pix) | ![Github stars](https://img.shields.io/github/stars/timothybrooks/instruct-pix2pix.svg) |
[BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/pdf/2301.12597.pdf) | ICML | `V2L` | [PyTorch(Author)](https://github.com/salesforce/LAVIS/tree/main/projects/blip2) | ![Github stars](https://img.shields.io/github/stars/salesforce/LAVIS.svg) |

### 2022

| Title | Venue | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|
[Flamingo: a Visual Language Model for Few-Shot Learning](https://arxiv.org/abs/2204.14198) | NeurIPS | `MIUO` | - | - |

### 2021

| Title | Venue | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|

### 2020

| Title | Venue | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|

### Previous Venues

| Title | Venue | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|

## Awesome Surveys
- [Awesome-Multimodal-Large-Language-Models](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models) ![Github stars](https://img.shields.io/github/stars/BradyFU/Awesome-Multimodal-Large-Language-Models.svg)
- [awesome-multimodal-ml](https://github.com/pliang279/awesome-multimodal-ml) ![Github stars](https://img.shields.io/github/stars/pliang279/awesome-multimodal-ml.svg)
- [Awesome-Multimodal-Research](https://github.com/Eurus-Holmes/Awesome-Multimodal-Research) ![Github stars](https://img.shields.io/github/stars/Eurus-Holmes/Awesome-Multimodal-Research.svg)
- [Awesome-Text-to-Image](https://github.com/Yutong-Zhou-cv/Awesome-Text-to-Image) ![Github stars](https://img.shields.io/github/stars/Yutong-Zhou-cv/Awesome-Text-to-Image.svg)
- [Awesome-Multimodal-LLM](https://github.com/HenryHZY/Awesome-Multimodal-LLM) ![Github stars](https://img.shields.io/github/stars/HenryHZY/Awesome-Multimodal-LLM.svg)
- [awesome-llm-and-aigc](https://github.com/sjinzh/awesome-llm-and-aigc) ![Github stars](https://img.shields.io/github/stars/sjinzh/awesome-llm-and-aigc.svg)
- [Awesome-Multimodal-Chatbot](https://github.com/zjr2000/Awesome-Multimodal-Chatbot) ![Github stars](https://img.shields.io/github/stars/zjr2000/Awesome-Multimodal-Chatbot.svg)
- [Awesome-Multimodal-LLM](https://github.com/vincentlux/Awesome-Multimodal-LLM) ![Github stars](https://img.shields.io/github/stars/vincentlux/Awesome-Multimodal-LLM.svg)
- [awesome-free-chatgpt](https://github.com/LiLittleCat/awesome-free-chatgpt) ![Github stars](https://img.shields.io/github/stars/LiLittleCat/awesome-free-chatgpt.svg)
- [awesome-generative-ai](https://github.com/steven2358/awesome-generative-ai) ![Github stars](https://img.shields.io/github/stars/steven2358/awesome-generative-ai.svg)
- [Awesome-LLM](https://github.com/Hannibal046/Awesome-LLM) ![Github stars](https://img.shields.io/github/stars/Hannibal046/Awesome-LLM.svg)
- [awesome-vision-and-language](https://github.com/sangminwoo/awesome-vision-and-language) ![Github stars](https://img.shields.io/github/stars/sangminwoo/awesome-vision-and-language.svg)
- [awesome-vision-language-pretraining-papers](https://github.com/yuewang-cuhk/awesome-vision-language-pretraining-papers) ![Github stars](https://img.shields.io/github/stars/yuewang-cuhk/awesome-vision-language-pretraining-papers.svg)
- [awesome-chatgpt-dataset](https://github.com/voidful/awesome-chatgpt-dataset) ![Github stars](https://img.shields.io/github/stars/voidful/awesome-chatgpt-dataset.svg)

## Awesome Blogs

## Awesome Multimodal Datasets