Awesome_Multimodel_LLM
Awesome_Multimodel is a curated GitHub repository that collects resources for Multimodal Large Language Models (MLLMs). It covers datasets, instruction tuning, in-context learning, visual reasoning, foundation models, and more, and is updated with the latest advancements.
https://github.com/Atomic-man007/Awesome_Multimodel_LLM
Practical Guide for NLP Tasks
Efficiency
Generation tasks
Knowledge-intensive tasks
Traditional NLU tasks
Abilities with Scaling
Specific tasks
Real-World Tasks
Prompting libraries & tools
- LlamaIndex
- YiVal - Open-source GenAI-Ops tool for tuning and evaluating prompts, configurations, and model parameters using customizable datasets, evaluation methods, and improvement strategies.
- Semantic Kernel
- Prompttools - Open-source Python tools for testing and evaluating models, vector DBs, and prompts.
- Promptify
- Weights & Biases
- OpenAI Evals - Open-source library for evaluating task performance of language models and prompts.
- ModelFusion - A TypeScript library for building apps with LLMs and other ML models (speech-to-text, text-to-speech, image generation).
- Flappy - Production-Ready LLM Agent SDK for Every Developer.
- FLAML (A Fast Library for Automated Machine Learning & Tuning)
- Guardrails.ai
- PromptPerfect
- Arthur Shield
- GPTRouter - An open-source LLM API gateway that offers a universal API for 30+ LLM, vision, and image models, with smart fallbacks based on uptime and latency, automatic retries, and streaming, so you stay operational even when OpenAI is down (see the routing sketch after this list).
- Outlines - Domain-specific language to simplify prompting and constrain generation.
- Chainlit
- LangChain
- Guidance
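Many of the gateway tools above (GPTRouter in particular) reduce to the same pattern: try providers in priority order, retry transient failures with backoff, and fall through to the next model. A minimal sketch of that pattern, with a hypothetical `call_model` standing in for real vendor SDKs:

```python
import time

def call_model(provider: str, prompt: str) -> str:
    # Hypothetical stand-in; a real gateway would invoke each vendor's API here.
    raise TimeoutError(f"{provider} unavailable")

def route_with_fallback(prompt: str, providers: list[str], retries: int = 2) -> str:
    """Try providers in priority order, retrying transient failures with backoff."""
    for provider in providers:
        for attempt in range(retries):
            try:
                return call_model(provider, prompt)
            except TimeoutError:
                time.sleep(0.5 * 2 ** attempt)  # exponential backoff before retrying
    raise RuntimeError("all providers failed")

# route_with_fallback("Hello", ["openai", "anthropic", "mistral"])
```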
Trending LLM Projects
- llm-course - Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
- promptbase - All things prompt engineering.
- Devika
- anything-llm - A private ChatGPT to chat with anything!
- phi-2 - a 2.7 billion-parameter language model that demonstrates outstanding reasoning and language understanding capabilities, showcasing state-of-the-art performance among base language models with less than 13 billion parameters.
- ollama - Get up and running with Llama 2 and other large language models locally (a minimal API call is sketched below).
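Since ollama serves models behind a local REST endpoint by default, a single HTTP call is enough to generate text. A minimal sketch, assuming `ollama serve` is running on its default port and a `llama2` model has already been pulled:

```python
import requests

# Assumes a local Ollama server (default port 11434) and `ollama pull llama2`.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```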
Milestone Papers
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
- Improving Language Understanding by Generative Pre-Training
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Language Models are Unsupervised Multitask Learners
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- Scaling Laws for Neural Language Models
- Language models are few-shot learners
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- Evaluating Large Language Models Trained on Code
- Finetuned Language Models are Zero-Shot Learners
- WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing
- Solving Quantitative Reasoning Problems with Language Models
- Emergent Abilities of Large Language Models
- Language Models are General-Purpose Interfaces
- GLM-130B: An Open Bilingual Pre-trained Model
- Galactica: A Large Language Model for Science
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- LLaMA: Open and Efficient Foundation Language Models
- Language Is Not All You Need: Aligning Perception with Language Models
- GPT-4 Technical Report
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
- RWKV: Reinventing RNNs for the Transformer Era
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
- On the Opportunities and Risks of Foundation Models
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- OPT: Open Pre-trained Transformer Language Models
- Holistic Evaluation of Language Models
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
- Scaling Instruction-Finetuned Language Models
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- LaMDA: Language Models for Dialog Applications
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
- Training language models to follow instructions with human feedback
- PaLM: Scaling Language Modeling with Pathways
- Improving alignment of dialogue agents via targeted human judgements
- PaLM 2 Technical Report
- Attention Is All You Need
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Datasets of Pre-Training for Alignment
- Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations - Image-Text
- Microsoft COCO: Common Objects in Context - Image-Text
- Im2Text: Describing Images Using 1 Million Captioned Photographs - Image-Text
- Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning - Image-Text
- LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs - Image-Text
- Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models - Image-Text
- AI Challenger: A Large-scale Dataset for Going Deeper in Image Understanding - Image-Text
- Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark - Image-Text
- Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks - Video-Text
- MSR-VTT: A Large Video Description Dataset for Bridging Video and Language - Video-Text
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval - Video-Text
- WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research - Audio-Text
- AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline - Audio-Text
- AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale - Audio-Text
Tutorials about LLM
Open Source LLM
- Flan-Alpaca - Instruction Tuning from Humans and Machines.
- Baize - Baize is an open-source chat model trained with [LoRA](https://github.com/microsoft/LoRA). It uses 100k dialogs generated by letting ChatGPT chat with itself.
- Cabrita - A Portuguese fine-tuned instruction LLaMA.
- Llama-X - Open Academic Research on Improving LLaMA to SOTA LLM.
- Chinese-Vicuna - A Chinese Instruction-following LLaMA-based Model.
- GPTQ-for-LLaMA - 4-bit quantization of [LLaMA](https://arxiv.org/abs/2302.13971) using [GPTQ](https://arxiv.org/abs/2210.17323) (a simplified quantization sketch follows this list).
- GPT4All - Demo, data, and code to train open-source assistant-style large language model based on GPT-J and LLaMa.
- BELLE - Be Everyone's Large Language model Engine
- WizardLM|WizardCoder - Family of instruction-following LLMs powered by Evol-Instruct: WizardLM, WizardCoder.
- CaMA - a Chinese-English Bilingual LLaMA Model.
- BayLing - an English/Chinese LLM equipped with advanced language alignment, showing superior capability in English/Chinese generation, instruction following and multi-turn interaction.
- UltraLM - Large-scale, Informative, and Diverse Multi-round Chat Models.
- Guanaco - QLoRA tuned LLaMA
- GLM - GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language understanding and generation tasks.
- ChatGLM2-6B - An open-source bilingual (Chinese-English) chat LLM.
- RWKV - Parallelizable RNN with Transformer-level LLM Performance.
- ChatRWKV - ChatRWKV is like ChatGPT but powered by the RWKV (100% RNN) language model.
- GPT-Neo - An implementation of model & data parallel [GPT3](https://arxiv.org/abs/2005.14165)-like models using the [mesh-tensorflow](https://github.com/tensorflow/mesh) library.
- Pythia - Interpreting Autoregressive Transformers Across Time and Scale
- OpenFlamingo - an open-source reproduction of DeepMind's Flamingo model.
- h2oGPT
- Open-Assistant - A project meant to give everyone access to a great chat-based large language model.
- XGen - Salesforce open-source LLMs with 8k sequence length.
- Alpaca - A model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. [Alpaca.cpp](https://github.com/antimatter15/alpaca.cpp) [Alpaca-LoRA](https://github.com/tloen/alpaca-lora)
- Vicuna - An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality.
- Koala - A Dialogue Model for Academic Research
- StackLLaMA - A hands-on guide to train LLaMA with RLHF.
- Orca - Microsoft's fine-tuned LLaMA model that reportedly matches GPT-3.5, fine-tuned on 5M examples of data generated by ChatGPT and GPT-4.
- BLOOM - BigScience Large Open-science Open-access Multilingual Language Model [BLOOM-LoRA](https://github.com/linhduongtuan/BLOOM-LORA)
- BLOOMZ&mT0 - a family of models capable of following human instructions in dozens of languages zero-shot.
- T5 - Text-to-Text Transfer Transformer
- OPT - Open Pre-trained Transformer Language Models.
- YaLM - a GPT-like neural network for generating and processing text. It can be used freely by developers and researchers from all over the world.
- Dolly - A cheap-to-build LLM that exhibits a surprising degree of the instruction-following capability seen in ChatGPT.
- Dolly 2.0 - the first open source, instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use.
- GALACTICA - The GALACTICA models are trained on a large-scale scientific corpus.
- GALPACA - GALACTICA 30B fine-tuned on the Alpaca dataset.
- Palmyra - Palmyra Base was primarily pre-trained with English text.
- Camel - a state-of-the-art instruction-following large language model designed to deliver exceptional performance and versatility.
- StarCoder - Hugging Face LLM for Code
- MPT-7B - Open LLM for commercial use by MosaicML
- Cerebras-GPT - A Family of Open, Compute-efficient, Large Language Models.
- LLaMA2 - The successor to LLaMA, released as 7-, 13-, and 70-billion-parameter large language models. [LLaMA2](https://github.com/facebookresearch/llama) [HF - TheBloke/Llama-2-13B-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-GPTQ)
- Falcon - A foundational large language model (LLM) with 40 billion parameters, trained by TII on one trillion tokens.
- baichuan-7B - An open-source, commercially usable large-scale pretrained language model developed by Baichuan Intelligence.
- HuggingChat - Powered by Open Assistant's latest model – the best open source chat model right now and @huggingface Inference API.
- ChatGLM-6B - An open-source bilingual (Chinese-English) dialogue language model based on the [General Language Model (GLM)](https://github.com/THUDM/GLM) architecture, with 6.2 billion parameters.
- LLaMA - A foundational, 65-billion-parameter large language model. [LLaMA.cpp](https://github.com/ggerganov/llama.cpp) [Lit-LLaMA](https://github.com/Lightning-AI/lit-llama)
- PanGu-α - PanGu-α is a 200B-parameter autoregressive pretrained Chinese language model developed by Huawei Noah's Ark Lab, MindSpore Team and Peng Cheng Laboratory.
- UL2 - a unified framework for pretraining models that are universally effective across datasets and setups.
- Aquila - The Wudao Aquila language model is the first open-source large language model with bilingual Chinese-English knowledge that supports a commercial license and meets Chinese data-compliance requirements.
- StableLM - Stability AI Language Models.
- T0 - Multitask Prompted Training Enables Zero-Shot Task Generalization
- Phoenix
- MOSS - An open-source dialogue language model that supports both Chinese and English and a variety of plugins.
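To make the 4-bit idea behind GPTQ-for-LLaMA concrete: quantization stores each weight in 16 levels plus a per-row scale, trading a little accuracy for a roughly 4x memory reduction. The sketch below is plain round-to-nearest, not GPTQ itself (which additionally minimizes layer-wise reconstruction error):

```python
import torch

def quantize_rtn_4bit(w: torch.Tensor):
    """Per-row absmax round-to-nearest quantization to the signed 4-bit range [-8, 7]."""
    scale = w.abs().amax(dim=1, keepdim=True) / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return q.to(torch.int8), scale  # 4-bit values stored in int8, plus fp scales

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4096, 4096)          # one toy weight matrix
q, s = quantize_rtn_4bit(w)
err = (w - dequantize(q, s)).abs().mean()
print(f"mean abs reconstruction error: {err:.4f}")
```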
LLM Training Frameworks
- Megatron-LM - Ongoing research training transformer models at scale (a toy model-parallel sketch follows this list).
- Colossal-AI - Making large AI models cheaper, faster, and more accessible.
- BMTrain - Efficient Training for Big Models.
- Mesh Tensorflow - Mesh TensorFlow: Model Parallelism Made Easier.
- maxtext - A simple, performant and scalable Jax LLM!
- GPT-NeoX - An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
- FairScale - FairScale is a PyTorch extension library for high performance and large scale training.
- Alpa - Alpa is a system for training and serving large-scale neural networks.
- DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
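The core trick these frameworks exploit can be seen in miniature: Megatron-style tensor parallelism splits a weight matrix column-wise across devices, each device computes its slice, and the slices are concatenated. A single-process toy sketch (real frameworks shard across GPUs and overlap the communication):

```python
import torch

# Toy column-parallel linear layer: each "device" holds half the output columns.
x = torch.randn(2, 8)                # batch of activations
w = torch.randn(8, 16)               # full weight, conceptually too big for one device
w_shards = torch.chunk(w, 2, dim=1)  # column split across 2 simulated devices

# Each shard computes its slice independently; an all-gather concatenates results.
partial = [x @ shard for shard in w_shards]
y = torch.cat(partial, dim=1)

assert torch.allclose(y, x @ w, atol=1e-5)
```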
Tools for deploying LLM
- SkyPilot - Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU availability, and managed execution -- all with a simple interface.
- vLLM - A high-throughput and memory-efficient inference and serving engine for LLMs
- Text Generation Inference - A Rust, Python and gRPC server for text generation inference. Used in production at [HuggingFace](https://huggingface.co/) to power LLMs api-inference widgets.
- wechat-chatgpt - Use ChatGPT on WeChat via wechaty.
- Agenta - Easily build, version, evaluate and deploy your LLM-powered apps.
- Haystack - an open-source NLP framework that allows you to use LLMs and transformer-based models from Hugging Face, OpenAI and Cohere to interact with your own data.
- Sidekick - Data integration platform for LLMs.
- Embedchain - Framework to create ChatGPT like bots over your dataset.
- FastChat - A distributed multi-model LLM serving system with web UI and OpenAI-compatible RESTful APIs (see the client sketch below).
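Several of these servers, vLLM and FastChat included, expose an OpenAI-compatible REST API, so one client snippet covers them. A sketch assuming a vLLM server has already been launched on localhost:8000 with the model named below:

```python
import requests

# Assumes e.g. `python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m`.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",  # whatever model the server was started with
        "prompt": "San Francisco is a",
        "max_tokens": 32,
        "temperature": 0.7,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["text"])
```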
Courses about LLM
- Stanford
- Princeton
- OpenBMB
- Stanford Webinar
- Aston Zhang
- 陳縕儂
- 李沐
- DeepLearning.AI
Datasets of Multimodal Instruction Tuning
- GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction - Tool-related instruction dataset
- ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst - Multimodal instruction tuning dataset covering 16 multimodal tasks
- DetGPT: Detect What You Need via Reasoning - Instruction-tuning dataset with 5000 images and around 30000 query-answer pairs
- VideoChat: Chat-Centric Video Understanding - Video-centric multimodal instruction dataset
- mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality - Dataset for evaluation on multiple capabilities
- MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models - Multimodal aligned dataset for improving model's usability and generation's fluency
- Visual Instruction Tuning - Multimodal instruction-following data generated by GPT
- PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering - Large-scale medical visual question-answering dataset
- X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages - Chinese multimodal instruction dataset
- Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models - 100K high-quality video instruction dataset
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day - A large-scale, broad-coverage biomedical instruction-following dataset
- M<sup>3</sup>IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning - Large-scale, broad-coverage multimodal instruction tuning dataset
- MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning - The first multimodal instruction tuning benchmark dataset
Other useful resources
- OpenAGI - When LLM Meets Domain Experts.
- HuggingGPT - Solving AI Tasks with ChatGPT and its Friends in HuggingFace.
- EasyEdit - An easy-to-use framework to edit large language models.
- chatgpt-shroud - A Chrome extension for OpenAI's ChatGPT, enhancing user privacy by enabling easy hiding and unhiding of chat history. Ideal for privacy during screen shares.
- Open-evals - A framework extending OpenAI's [Evals](https://github.com/openai/evals) to different language models.
- Arize-Phoenix - Open-source tool for ML observability that runs in your notebook environment. Monitor and fine tune LLM, CV and Tabular Models.
- Major LLMs + Data Availability
- 500+ Best AI Tools
- Mistral - Mistral-7B-v0.1 is a small, yet powerful model adaptable to many use-cases including code and 8k sequence length. Apache 2.0 licence.
- Emergent Mind - The latest AI news, curated & explained by GPT-4.
- Mixtral 8x7B - A high-quality sparse mixture-of-experts (SMoE) model with open weights (a toy top-2 routing sketch follows this list).
- AutoGPT - an experimental open-source application showcasing the capabilities of the GPT-4 language model.
- chatgpt-wrapper - ChatGPT Wrapper is an open-source unofficial Python API and CLI that lets you interact with ChatGPT.
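The "sparse mixture of experts" idea behind Mixtral can be illustrated with a toy top-2 router: each token is sent to only two expert MLPs, and their outputs are combined with the gate weights. A schematic sketch, not Mixtral's actual code; the dimensions, the dense `Linear` router, and the per-token Python loop are all simplifications:

```python
import torch
import torch.nn.functional as F

n_experts, d, k = 8, 16, 2
experts = [torch.nn.Linear(d, d) for _ in range(n_experts)]
router = torch.nn.Linear(d, n_experts)

def moe_forward(x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d)
    gate_logits = router(x)
    weights, idx = torch.topk(F.softmax(gate_logits, dim=-1), k)  # top-2 experts per token
    weights = weights / weights.sum(dim=-1, keepdim=True)         # renormalize the pair
    out = torch.zeros_like(x)
    for t in range(x.size(0)):        # per-token dispatch (real kernels batch this)
        for j in range(k):
            out[t] += weights[t, j] * experts[idx[t, j]](x[t])
    return out

print(moe_forward(torch.randn(4, d)).shape)  # torch.Size([4, 16])
```

Only k of the n expert MLPs run per token, which is why such models have many more parameters than their per-token compute cost suggests.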
Datasets of Multimodal Chain-of-Thought
- EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought - Large-scale embodied planning dataset
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering - Large-scale multi-choice dataset, featuring multimodal science questions and diverse domains
- Let’s Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction - Inference-time dataset that can be used to evaluate VideoCOT
RLHF Datasets
- HH-RLHF
- PromptSource
- Stable Alignment - Alignment Learning in Social Games
- Stanford Human Preferences Dataset (SHP)
- Structured Knowledge Grounding (SKG) Resources Collections
- rlhf-reward-datasets
- webgpt_comparisons
- summarize_from_feedback
- The Flan Collection
- Dahoas/synthetic-instruct-gptj-pairwise
- LIMA
- Planning In Natural Language Improves LLM Search For Code Generation
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
- Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
- Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
- On Designing Effective RL Reward at Training Time for LLM Reasoning
- Generative Verifiers: Reward Modeling as Next-Token Prediction
- Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
- Improve Mathematical Reasoning in Language Models by Automated Process Supervision
- Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
- PROCESSBENCH: Identifying Process Errors in Mathematical Reasoning
- AFlow: Automating Agentic Workflow Generation
- Interpretable Contrastive Monte Carlo Tree Search Reasoning
- Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
- Mixture-of-Agents Enhances Large Language Model Capabilities
- Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models
- Advancing LLM Reasoning Generalists with Preference Trees
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
- AlphaMath Almost Zero: Process Supervision Without Process
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
- MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
- When is Tree Search Useful for LLM Planning? It Depends on the Discriminator
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
- Chain-of-Thought Reasoning Without Prompting
- Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
- Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
- ReFT: Reasoning with Reinforced Fine-Tuning
- VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
- Stream of Search (SoS): Learning to Search in Language
- GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
- Evaluation of OpenAI o1: Opportunities and Challenges of AGI
- Evaluating LLMs at Detecting Errors in LLM Responses
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability
- Not All LLM Reasoners Are Created Equal
- LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench
- A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
- Thinking LLMs: General Instruction Following with Thought Generation
- Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning Through Trap Problems
- V-STaR: Training Verifiers for Self-Taught Reasoners
- CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks
- RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
2021
2017
- Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
2023
- Training Chain-of-Thought via Latent-Variable Inference
- Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training
- OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning
- Reasoning with Language Model is Planning with World Model
- Certified reasoning with language models
- Large Language Models Cannot Self-Correct Reasoning Yet
- Don’t throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding
2022
Multimodal In-Context Learning
- **Multimodal Few-Shot Learning with Frozen Language Models** | NeurIPS | 2021-06-25 | - | - |
- [**Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models**](https://arxiv.org/pdf/2304.09842.pdf) | arXiv | 2023-04-19 | [Github](https://github.com/lupantech/chameleon-llm) | [Demo](https://chameleon-llm.github.io/) |
Multimodal Chain-of-Thought
- **Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings** | arXiv | 2023-05-03 | [Coming soon](https://github.com/dannyrose30/VCOT) | - |
- **Chain of Thought Prompt Tuning in Vision Language Models** | arXiv | 2023-04-16 | Coming soon | - |
- **Let’s Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction** | arXiv | 2023-05-23 | - | - |
- **Introducing Visual Perception Token into Multimodal Large Language Model** | arXiv | 2025-02-24 | [Github](https://github.com/yu-rp/VisualPerceptionToken) | - |
- **Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering** | NeurIPS | 2022-09-20 | [Github](https://github.com/lupantech/ScienceQA) | - |
Foundation Models
- **PaLM-E: An Embodied Multimodal Language Model** | arXiv | 2023-03-06 | - | [Demo](https://palm-e.github.io/#demo) |
- **GPT-4 Technical Report** | arXiv | 2023-03-15 | - | - |
- **Language Is Not All You Need: Aligning Perception with Language Models** | arXiv | 2023-02-27 | [Github](https://github.com/microsoft/unilm) | - |
Others
- **Can Large Pre-trained Models Help Vision Models on Perception Tasks?** | arXiv | 2023-06-01 | Coming soon | - |
- **Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs** | arXiv | 2023-11-24 | [Github](https://github.com/jonathan-roberts1/charting-new-territories) | - |
- **Contextual Object Detection with Multimodal Large Language Models** | arXiv | 2023-05-29 | [Github](https://github.com/yuhangzang/ContextDET) | [Demo](https://huggingface.co/spaces/yuhangzang/ContextDet-Demo) |
- **Generating Images with Multimodal Language Models** | arXiv | 2023-05-26 | [Github](https://github.com/kohjingyu/gill) | - |
LLM Leaderboards
Datasets of In-Context Learning
- MIMIC-IT: Multi-Modal In-Context Instruction Tuning - Multimodal in-context instruction dataset
Multimodal Instruction Tuning
- **M<sup>3</sup>IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning** | arXiv | 2023-06-07 | - | - |
- **MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning** | arXiv | 2022-12-21 | - | - |
- [**mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality**](https://arxiv.org/pdf/2304.14178.pdf) | arXiv | 2023-04-27 | [Github](https://github.com/X-PLUG/mPLUG-Owl) | [Demo](https://huggingface.co/spaces/MAGAer13/mPLUG-Owl) |
Practical Guides for Prompting (Helpful)
Pretraining data
Star History
[![Star History Chart](https://api.star-history.com/svg?repos=Atomic-man007/Awesome_Multimodel_LLM&type=Date)](https://star-history.com/#Atomic-man007/Awesome_Multimodel_LLM)
High-quality generation
Deep understanding
Raising the length limit of Transformers
Memory retrieval
Compressing memories with vectors or data structures