Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/zzw-zwzhang/awesome-of-multimodal-dialogue-models
A curated list of multimodal dialogue models resources.
List: awesome-of-multimodal-dialogue-models
- Host: GitHub
- URL: https://github.com/zzw-zwzhang/awesome-of-multimodal-dialogue-models
- Owner: zzw-zwzhang
- Created: 2023-06-12T11:07:15.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-07-06T10:24:54.000Z (over 1 year ago)
- Last Synced: 2025-02-04T06:01:37.228Z (17 days ago)
- Size: 102 KB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- ultimate-awesome - awesome-of-multimodal-dialogue-models - A curated list of multimodal dialogue models resources. (Other Lists / Julia Lists)
README
# Awesome-of-Multimodal-Dialogue-Models [![Awesome](https://awesome.re/badge.svg)](https://github.com/sindresorhus/awesome)

A curated list of multimodal dialogue models and related resources.
Please feel free to open a pull request or an issue to add papers.
### :high_brightness: Updated 2023-07-07
---
## Table of Contents
- [Type of Multimodal Dialogue Models](#type-of-multimodal-dialogue-models)
- [arXiv](#arxiv)
- [2023 Venues](#2023)
- [2022 Venues](#2022)
- [2021 Venues](#2021)
- [2020 Venues](#2020)
- [Previous Venues](#previous-venues)
- [Awesome Surveys](#awesome-surveys)
- [Awesome Blogs](#awesome-blogs)
- [Multimodal Datasets](#awesome-multimodal-datasets)
### Type of Multimodal Dialogue Models
| Type | `UIUO` | `MIUO` | `MIMO` | `L2V` | `V2L` | `Other` |
|:----------- |:------------------------------------:|:--------------------------------------:|:----------------------------------------:|:----------------------:|:---------------------:|:---------------:|
| Explanation | Unimodal Input & Unimodal Output | Multimodal Input & Unimodal Output | Multimodal Input & Multimodal Output | Language to Vision | Vision to Language | other types |

### arXiv
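As a quick reference for the `Type` codes used in the paper tables below, the taxonomy above can be sketched as a small lookup. This is a minimal illustrative sketch; `DialogueModelType` and `explain` are hypothetical names, not part of any listed model's API.

```python
from enum import Enum


class DialogueModelType(Enum):
    """The input/output taxonomy used in this list's `Type` column."""
    UIUO = "Unimodal Input & Unimodal Output"
    MIUO = "Multimodal Input & Unimodal Output"
    MIMO = "Multimodal Input & Multimodal Output"
    L2V = "Language to Vision"
    V2L = "Vision to Language"
    OTHER = "other types"


def explain(code: str) -> str:
    """Return the human-readable explanation for a type code, e.g. 'MIUO'."""
    return DialogueModelType[code.upper()].value


print(explain("MIMO"))  # Multimodal Input & Multimodal Output
```

For example, LLaVA takes an image plus text and produces text only, so it is tagged `MIUO`, while Visual ChatGPT can also emit images and is tagged `MIMO`.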
| Title | Date | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|
[What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?](https://arxiv.org/pdf/2307.02469.pdf) | 2023.07.05 | `MIUO` | [PyTorch(Author)]() |  |
[SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs](https://arxiv.org/pdf/2306.17842.pdf) | 2023.07.03 | `MIUO` | [PyTorch(Author)]() |  |
[LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding](https://arxiv.org/pdf/2306.17107.pdf) | 2023.06.29 | `MIUO` | [PyTorch(Author)](https://github.com/SALT-NLP/LLaVAR) |  |
[AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn](https://arxiv.org/pdf/2306.08640.pdf) | 2023.06.28 | `MIUO` | [PyTorch(Author)](https://github.com/showlab/assistgpt) |  |
[KOSMOS-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/pdf/2306.14824.pdf) | 2023.06.27 | `MIUO` | [PyTorch(Author)](https://github.com/microsoft/unilm/tree/master/kosmos-2) |  |
[Aligning Large Multi-Modal Model with Robust Instruction Tuning](https://arxiv.org/pdf/2306.14565.pdf) | 2023.06.26 | `MIUO` | [PyTorch(Author)]() |  |
[Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration](https://arxiv.org/pdf/2306.09093.pdf) | 2023.06.15 | `MIUO` | [PyTorch(Author)](https://github.com/lyuchenyang/Macaw-LLM) |  |
[Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation](https://arxiv.org/pdf/2303.05983.pdf) | 2023.06.14 | `MIMO` | [PyTorch(Author)](https://github.com/matrix-alpha/Accountable-Textual-Visual-Chat) |  |
[Grounding Language Models to Images for Multimodal Inputs and Outputs](https://arxiv.org/abs/2301.13823) | 2023.06.13 | `MIMO` | [PyTorch(Author)](https://github.com/kohjingyu/fromage) |  |
[MultiModal-GPT: A Vision and Language Model for Dialogue with Humans](https://arxiv.org/pdf/2305.04790.pdf) | 2023.06.13 | `MIUO` | [PyTorch(Author)](https://github.com/open-mmlab/Multimodal-GPT) |  |
[MIMIC-IT: Multi-Modal In-Context Instruction Tuning](https://arxiv.org/pdf/2306.05425.pdf) | 2023.06.08 | `MIUO` | [PyTorch(Author)](https://github.com/Luodian/otter) |  |
[Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models](https://arxiv.org/abs/2306.05424) | 2023.06.08 | `MIUO` | [PyTorch(Author)](https://github.com/mbzuai-oryx/Video-ChatGPT) |  |
[GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction](https://arxiv.org/abs/2305.18752) | 2023.05.30 | `MIMO` | [PyTorch(Author)](https://github.com/StevenGrove/GPT4Tools) |  |
[Controllable Text-to-Image Generation with GPT-4](https://arxiv.org/abs/2305.18583) | 2023.05.29 | `MIUO` | [PyTorch(Author)](https://github.com/tianjunz/Control-GPT) |  |
[Mindstorms in Natural Language-Based Societies of Mind]() | 2023.05.26 | `MIMO` | [PyTorch(Author)]() |  |
[Generating Images with Multimodal Language Models](https://arxiv.org/pdf/2305.17216.pdf) | 2023.05.26 | `MIMO` | [PyTorch(Author)](https://github.com/kohjingyu/gill) |  |
[HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face](https://arxiv.org/pdf/2303.17580.pdf) | 2023.05.25 | `MIMO` | [PyTorch(Author)](https://github.com/microsoft/JARVIS) |  |
[ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst](https://arxiv.org/abs/2305.16103) | 2023.05.25 | `MIUO` | [PyTorch(Author)](https://github.com/joez17/ChatBridge) |  |
[EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought](https://arxiv.org/abs/2305.15021) | 2023.05.24 | `MIUO` | [PyTorch(Author)](https://github.com/EmbodiedGPT/EmbodiedGPT_Pytorch) |  |
[LMEye: An Interactive Perception Network for Large Language Models](https://arxiv.org/pdf/2305.03701.pdf) | 2023.05.19 | `MIUO` | [PyTorch(Author)](https://github.com/YunxinLi/LingCloud) |  |
[Visual Instruction Tuning](https://arxiv.org/abs/2304.08485) | 2023.05.17 | `MIUO` | [PyTorch(Author)](https://github.com/haotian-liu/LLaVA) |  |
[VideoChat: Chat-Centric Video Understanding](https://arxiv.org/abs/2305.06355) | 2023.05.10 | `MIUO` | [PyTorch(Author)](https://github.com/OpenGVLab/Ask-Anything) |  |
[Caption Anything: Interactive Image Description with Diverse Multimodal Controls](https://arxiv.org/abs/2305.02677) | 2023.05.08 | `MIUO` | [PyTorch(Author)](https://github.com/ttengwang/Caption-Anything) |  |
[MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models](https://arxiv.org/abs/2304.10592) | 2023.04.20 | `MIUO` | [PyTorch(Author)](https://github.com/Vision-CAIR/MiniGPT-4) |  |
[TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs](https://arxiv.org/pdf/2303.16434.pdf) | 2023.03.29 | `MIUO` | [PyTorch(Author)](https://github.com/microsoft/TaskMatrix/tree/main/TaskMatrix.AI) |  |
[GPT-4 Technical Report](https://arxiv.org/abs/2303.08774) | 2023.03.27 | `MIMO` | [PyTorch(Author)]() |  |
[PandaGPT: One Model To Instruction-Follow Them All](https://arxiv.org/abs/2305.16355) | 2023.03.25 | `MIUO` | [PyTorch(Author)](https://github.com/yxuansu/PandaGPT) |  |
[MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action](https://arxiv.org/abs/2303.11381) | 2023.03.20 | `MIUO` | [PyTorch(Author)](https://github.com/microsoft/MM-REACT) |  |
[Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models](https://arxiv.org/pdf/2303.04671.pdf) | 2023.03.08 | `MIMO` | [PyTorch(Author)](https://github.com/microsoft/TaskMatrix) |  |

### 2023
| Title | Venue | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|
[InstructPix2Pix: Learning to Follow Image Editing Instructions](https://arxiv.org/abs/2211.09800) | CVPR-Highlight | `MIUO` | [PyTorch(Author)](https://github.com/timothybrooks/instruct-pix2pix) |  |
[BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/pdf/2301.12597.pdf) | ICML | `V2L` | [PyTorch(Author)](https://github.com/salesforce/LAVIS/tree/main/projects/blip2) |  |

### 2022
| Title | Venue | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|
[Flamingo: a Visual Language Model for Few-Shot Learning](https://arxiv.org/abs/2204.14198) | NeurIPS | `MIUO` | [PyTorch(Author)]() |  |

### 2021
| Title | Venue | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|

### 2020
| Title | Venue | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|

### Previous Venues
| Title | Venue | Type | Code | Star |
|:-------- |:--------:|:--------:|:--------:|:--------:|

## Awesome Surveys
- [Awesome-Multimodal-Large-Language-Models](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models) 
- [awesome-multimodal-ml](https://github.com/pliang279/awesome-multimodal-ml) 
- [Awesome-Multimodal-Research](https://github.com/Eurus-Holmes/Awesome-Multimodal-Research) 
- [Awesome-Text-to-Image](https://github.com/Yutong-Zhou-cv/Awesome-Text-to-Image) 
- [Awesome-Multimodal-LLM](https://github.com/HenryHZY/Awesome-Multimodal-LLM) 
- [awesome-llm-and-aigc](https://github.com/sjinzh/awesome-llm-and-aigc) 
- [Awesome-Multimodal-Chatbot](https://github.com/zjr2000/Awesome-Multimodal-Chatbot) 
- [Awesome-Multimodal-LLM](https://github.com/vincentlux/Awesome-Multimodal-LLM) 
- [awesome-free-chatgpt](https://github.com/LiLittleCat/awesome-free-chatgpt) 
- [awesome-generative-ai](https://github.com/steven2358/awesome-generative-ai) 
- [Awesome-LLM](https://github.com/Hannibal046/Awesome-LLM) 
- [awesome-vision-and-language](https://github.com/sangminwoo/awesome-vision-and-language) 
- [awesome-vision-language-pretraining-papers](https://github.com/yuewang-cuhk/awesome-vision-language-pretraining-papers) 
- [awesome-chatgpt-dataset](https://github.com/voidful/awesome-chatgpt-dataset) 

## Awesome Blogs

## Awesome Multimodal Datasets