Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

awesome-of-multimodal-dialogue-models

A curated list of resources on multimodal dialogue models.
https://github.com/zzw-zwzhang/awesome-of-multimodal-dialogue-models

Table of Contents
- arXiv
  - SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
  - Controllable Text-to-Image Generation with GPT-4 ([code](https://github.com/tianjunz/Control-GPT))
  - ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
  - EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
  - Visual Instruction Tuning ([code](https://github.com/haotian-liu/LLaVA))
  - VideoChat: Chat-Centric Video Understanding ([code](https://github.com/OpenGVLab/Ask-Anything))
  - Grounding Language Models to Images for Multimodal Inputs and Outputs
  - Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models ([code](https://github.com/mbzuai-oryx/Video-ChatGPT))
  - GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
  - Caption Anything: Interactive Image Description with Diverse Multimodal Controls ([code](https://github.com/ttengwang/Caption-Anything))
  - MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models ([code](https://github.com/Vision-CAIR/MiniGPT-4))
  - MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action ([code](https://github.com/microsoft/MM-REACT))
  - GPT-4 Technical Report
  - PandaGPT: One Model To Instruction-Follow Them All
  - What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
  - LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding ([code](https://github.com/SALT-NLP/LLaVAR))
  - AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
  - KOSMOS-2: Grounding Multimodal Large Language Models to the World ([code](https://github.com/microsoft/unilm))
  - Aligning Large Multi-Modal Model with Robust Instruction Tuning
  - Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration ([code](https://github.com/lyuchenyang/Macaw-LLM))
  - Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation ([code](https://github.com/matrix-alpha/Accountable-Textual-Visual-Chat))
  - MultiModal-GPT: A Vision and Language Model for Dialogue with Humans ([code](https://github.com/open-mmlab/Multimodal-GPT))
  - MIMIC-IT: Multi-Modal In-Context Instruction Tuning
  - Generating Images with Multimodal Language Models
  - HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
  - LMEye: An Interactive Perception Network for Large Language Models
  - TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
  - Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
- 2023
  - InstructPix2Pix: Learning to Follow Image Editing Instructions (Highlight, `MIUO`, [PyTorch(Author)](https://github.com/timothybrooks/instruct-pix2pix))
  - BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
- 2022
  - Flamingo: a Visual Language Model for Few-Shot Learning
Awesome Surveys
- Previous Venues
  - Awesome-Multimodal-Large-Language-Models
  - awesome-multimodal-ml
  - Awesome-Multimodal-Research
  - Awesome-Text-to-Image
  - Awesome-Multimodal-LLM
  - Awesome-Multimodal-Chatbot
  - Awesome-Multimodal-LLM
  - awesome-free-chatgpt
  - awesome-generative-ai
  - Awesome-LLM
  - awesome-vision-and-language
  - awesome-vision-language-pretraining-papers
  - awesome-chatgpt-dataset
