Awesome-Multimodal-LLM
Reading list for Multimodal Large Language Models
https://github.com/vincentlux/Awesome-Multimodal-LLM
Table of Contents
- Recent Advances in Vision Foundation Models
- M3IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning
- LLaVA Instruction 150K (record format sketched after this list)
- Youku-mPLUG 10M
- MULTIINSTRUCT: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
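The instruction-tuning corpora above are typically distributed as JSON files that pair an image with a multi-turn conversation. The minimal sketch below assumes the commonly published LLaVA-Instruct-150K layout (fields `image`, `conversations`, `from`, `value`); treat the schema and the local filename as assumptions and check the dataset card before relying on them.

```python
import json

# Minimal sketch: iterating over a visual-instruction-tuning file in the
# LLaVA-Instruct-150K style. The field names ("image", "conversations",
# "from", "value") and the local path are assumptions, not guaranteed.
with open("llava_instruct_150k.json") as f:  # hypothetical local copy
    records = json.load(f)

for rec in records[:3]:
    image_file = rec["image"]                # e.g. a COCO image filename
    for turn in rec["conversations"]:
        speaker = turn["from"]               # "human" or "gpt"
        text = turn["value"]                 # "<image>" marks where the image goes
        print(f"{speaker}: {text[:80]}")
    print("image:", image_file)
```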
Survey Papers
- A Survey on Multimodal Large Language Models
- Vision-Language Models for Vision Tasks: A Survey
Core Areas
Multimodal Understanding
- PandaGPT: One Model To Instruction-Follow Them All
- MIMIC-IT: Multi-Modal In-Context Instruction Tuning
- LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
- MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models
- mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
- InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (usage sketch after this list)
- Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
- MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
- LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
- Language Is Not All You Need: Aligning Perception with Language Models
- ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
- X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
- Visual Instruction Tuning (LLaVA)
- Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
- PaLI: A Jointly-Scaled Multilingual Language-Image Model
- Grounding Language Models to Images for Multimodal Inputs and Outputs
- OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
- Flamingo: a Visual Language Model for Few-Shot Learning
- Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
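Several entries in this section (BLIP-2, InstructBLIP, LLaVA) have checkpoints on the Hugging Face Hub. As a minimal sketch, zero-shot visual question answering with BLIP-2 looks roughly like the following, assuming transformers >= 4.27, the Salesforce/blip2-opt-2.7b checkpoint, and enough memory for a 2.7B-parameter model.

```python
import torch
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Minimal sketch of zero-shot VQA with BLIP-2 (frozen image encoder + LLM).
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=dtype
).to(device)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example COCO image
image = Image.open(requests.get(url, stream=True).raw)

# BLIP-2 checkpoints follow a "Question: ... Answer:" prompting convention.
prompt = "Question: how many cats are in the picture? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, dtype)

out = model.generate(**inputs, max_new_tokens=20)
print(processor.batch_decode(out, skip_special_tokens=True)[0].strip())
```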
Vision-Centric Understanding
- LISA: Reasoning Segmentation via Large Language Model
- Contextual Object Detection with Multimodal Large Language Models
- KOSMOS-2: Grounding Multimodal Large Language Models to the World
- Fast Segment Anything
- Multi-Modal Classifiers for Open-Vocabulary Object Detection
- Instruction-ViT: Multi-Modal Prompts for Instruction Learning in ViT
- Images Speak in Images: A Generalist Painter for In-Context Visual Learning
- Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
- SegGPT: Segmenting Everything In Context
- VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
- Segment Anything (usage sketch after this list)
- Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching
- Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
- Personalize Segment Anything Model with One Shot
- A Generalist Framework for Panoptic Segmentation of Images and Videos
- A Unified Sequence Interface for Vision Tasks
- Pix2seq: A Language Modeling Framework for Object Detection
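Many of these vision-centric models expose promptable inference APIs. As a minimal sketch, point-prompted segmentation with the Hugging Face port of Segment Anything looks roughly like this, assuming the facebook/sam-vit-base checkpoint and a recent transformers release.

```python
import torch
import requests
from PIL import Image
from transformers import SamModel, SamProcessor

# Minimal sketch of point-prompted segmentation with Segment Anything (SAM).
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
model = SamModel.from_pretrained("facebook/sam-vit-base").to(device)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# One (x, y) point prompt; SAM proposes several candidate masks for it.
input_points = [[[450, 200]]]
inputs = processor(image, input_points=input_points, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)

# Resize the low-resolution mask logits back to the original image size.
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
print("mask tensor shape:", masks[0].shape, "IoU scores:", outputs.iou_scores.cpu())
```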
Embodied-Centric Understanding
- Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
- VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
- MotionGPT: Human Motion as a Foreign Language
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model (planner-over-skills pattern sketched after this list)
- PaLM-E: An Embodied Multimodal Language Model
- Generative Agents: Interactive Simulacra of Human Behavior
- Vision-Language Models as Success Detectors
- TidyBot: Personalized Robot Assistance with Large Language Models
- LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action
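A pattern most of these embodied works share is a language model that decomposes a natural-language instruction into calls against a small library of low-level skills, which the robot stack then executes. The sketch below only illustrates that planner-over-skills structure; it is not code from any listed system, and every name in it (query_llm, SKILLS, pick, place) is a hypothetical placeholder.

```python
from typing import Callable, Dict, List

# Purely illustrative sketch of the "LLM plans, skill primitives execute" pattern
# behind systems like Instruct2Act and TidyBot. All names here are hypothetical.

def pick(obj: str) -> None:
    print(f"[robot] picking up {obj}")             # stand-in for a grasping skill

def place(obj: str, location: str) -> None:
    print(f"[robot] placing {obj} on {location}")  # stand-in for a placement skill

SKILLS: Dict[str, Callable[..., None]] = {"pick": pick, "place": place}

def query_llm(instruction: str) -> List[dict]:
    # Hypothetical: a real system would prompt an LLM to emit a skill sequence
    # (e.g. as JSON). Hard-coded here so the sketch runs without a model.
    return [
        {"skill": "pick", "args": ["red mug"]},
        {"skill": "place", "args": ["red mug", "dish rack"]},
    ]

def execute(instruction: str) -> None:
    for step in query_llm(instruction):
        SKILLS[step["skill"]](*step["args"])

execute("put the red mug in the dish rack")
```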
Domain-Specific Models
Multimodal Evaluation