Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-of-multimodal-dialogue-models
A curated list of multimodal dialogue models resources.
https://github.com/zzw-zwzhang/awesome-of-multimodal-dialogue-models
arXiv
- SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
- Controllable Text-to-Image Generation with GPT-4 | ![Github stars](https://img.shields.io/github/stars/tianjunz/Control-GPT.svg)
- ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
- EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
- Visual Instruction Tuning | ![Github stars](https://img.shields.io/github/stars/haotian-liu/LLaVA.svg)
- VideoChat: Chat-Centric Video Understanding | ![Github stars](https://img.shields.io/github/stars/OpenGVLab/Ask-Anything.svg)
- Grounding Language Models to Images for Multimodal Inputs and Outputs
- Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | ![Github stars](https://img.shields.io/github/stars/mbzuai-oryx/Video-ChatGPT.svg)
- GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
- Caption Anything: Interactive Image Description with Diverse Multimodal Controls | ![Github stars](https://img.shields.io/github/stars/ttengwang/Caption-Anything.svg)
- MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | ![Github stars](https://img.shields.io/github/stars/Vision-CAIR/MiniGPT-4.svg)
- MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action | ![Github stars](https://img.shields.io/github/stars/microsoft/MM-REACT.svg)
- GPT-4 Technical Report
- PandaGPT: One Model To Instruction-Follow Them All
- What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
- LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding | ![Github stars](https://img.shields.io/github/stars/SALT-NLP/LLaVAR.svg)
- AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
- KOSMOS-2: Grounding Multimodal Large Language Models to the World | ![Github stars](https://img.shields.io/github/stars/microsoft/unilm.svg)
- Aligning Large Multi-Modal Model with Robust Instruction Tuning
- Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration | ![Github stars](https://img.shields.io/github/stars/lyuchenyang/Macaw-LLM.svg)
- Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation | ![Github stars](https://img.shields.io/github/stars/matrix-alpha/Accountable-Textual-Visual-Chat.svg)
- MultiModal-GPT: A Vision and Language Model for Dialogue with Humans | ![Github stars](https://img.shields.io/github/stars/open-mmlab/Multimodal-GPT.svg)
- MIMIC-IT: Multi-Modal In-Context Instruction Tuning
- Generating Images with Multimodal Language Models
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
- LMEye: An Interactive Perception Network for Large Language Models
- TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
- Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
2023
- InstructPix2Pix: Learning to Follow Image Editing Instructions | Highlight | `MIUO` | [PyTorch(Author)](https://github.com/timothybrooks/instruct-pix2pix) | ![Github stars](https://img.shields.io/github/stars/timothybrooks/instruct-pix2pix.svg)
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
2022
- Flamingo: a Visual Language Model for Few-Shot Learning | ![Github stars](https://img.shields.io/github/stars/zzw-zwzhang/Awesome-of-Multimodal-Dialogue-Models.svg)
Awesome Surveys
Previous Venues
- Awesome-Multimodal-Large-Language-Models
- awesome-multimodal-ml
- Awesome-Multimodal-Research
- Awesome-Text-to-Image
- Awesome-Multimodal-LLM
- Awesome-Multimodal-Chatbot
- Awesome-Multimodal-LLM
- awesome-free-chatgpt
- awesome-generative-ai
- Awesome-LLM
- awesome-vision-and-language
- awesome-vision-language-pretraining-papers
- awesome-chatgpt-dataset
Keywords
- awesome (6)
- awesome-list (4)
- multimodal-learning (3)
- large-language-models (3)
- multimodal (3)
- machine-learning (2)
- instruction-following (2)
- instruction-tuning (2)
- vision-and-language (2)
- multimodal-deep-learning (2)
- natural-language-processing (2)
- computer-vision (2)
- multimodal-large-language-models (2)
- chatgpt (2)
- reading-list (1)
- reinforcement-learning (1)
- representation-learning (1)
- robotics (1)
- speech-processing (1)
- multimodal-research (1)
- awseome-list (1)
- generative-adversarial-network (1)
- healthcare (1)
- deep-learning (1)
- visual-instruction-tuning (1)
- visual-in-context-learning (1)
- visual-chain-of-thought (1)
- multimodal-instruction-tuning (1)
- multimodal-in-context-learning (1)
- multimodal-chain-of-thought (1)
- multi-modality (1)
- large-vision-language-models (1)
- large-vision-language-model (1)
- in-context-learning (1)
- instructions (1)
- gpt4 (1)
- dataset (1)
- vl-ptms (1)
- pretraining (1)
- bert (1)
- llm (1)
- generative-art (1)
- generative-ai (1)
- artificial-intelligence (1)
- ai (1)
- freechatgpt (1)
- free (1)
- chat (1)
- vision-language-model (1)
- paper-list (1)