Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Awesome_Multimodel_LLM
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLMs). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundation models, and more. Stay updated with the latest advancements.
https://github.com/Atomic-man007/Awesome_Multimodel_LLM
Last synced: 2 days ago
JSON representation
-
Trending LLM Projects
- llm-course - Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
- promptbase - All things prompt engineering.
- Devika - An agentic AI software engineer (open-source alternative to Devin).
- anything-llm - A private ChatGPT to chat with anything!
- phi-2 - A 2.7-billion-parameter language model with state-of-the-art reasoning and language understanding among base models under 13 billion parameters.
- ollama - Get up and running with Llama 2 and other large language models locally.
-
Tutorials about LLM
-
Milestone Papers
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
- Attention Is All You Need
- Improving Language Understanding by Generative Pre-Training
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Language Models are Unsupervised Multitask Learners
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- Scaling Laws for Neural Language Models
- Language models are few-shot learners
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- Evaluating Large Language Models Trained on Code
- On the Opportunities and Risks of Foundation Models
- Finetuned Language Models are Zero-Shot Learners
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
- WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- LaMDA: Language Models for Dialog Applications
- Solving Quantitative Reasoning Problems with Language Models
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
- Training language models to follow instructions with human feedback
- PaLM: Scaling Language Modeling with Pathways
- OPT: Open Pre-trained Transformer Language Models
- Emergent Abilities of Large Language Models
- Language Models are General-Purpose Interfaces
- Improving alignment of dialogue agents via targeted human judgements
- Scaling Instruction-Finetuned Language Models
- GLM-130B: An Open Bilingual Pre-trained Model
- Holistic Evaluation of Language Models
- Galactica: A Large Language Model for Science
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- LLaMA: Open and Efficient Foundation Language Models
- Language Is Not All You Need: Aligning Perception with Language Models
- GPT-4 Technical Report
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
- PaLM 2 Technical Report
- RWKV: Reinventing RNNs for the Transformer Era
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
-
Datasets of Pre-Training for Alignment
- Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations | Image-Text
- Microsoft COCO: Common Objects in Context | Image-Text
- Im2Text: Describing Images Using 1 Million Captioned Photographs | Image-Text
- Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning | Image-Text
- LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs | Image-Text
- Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | Image-Text
- AI Challenger: A Large-scale Dataset for Going Deeper in Image Understanding | Image-Text
- Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | Image-Text
- Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks | Video-Text
- MSR-VTT: A Large Video Description Dataset for Bridging Video and Language | Video-Text
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Video-Text
- WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research | Audio-Text
- AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline | Audio-Text
- AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale | Audio-Text
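For a sense of what these alignment corpora look like in practice, here is a minimal sketch of iterating image-caption pairs from MS-COCO with torchvision; the local paths are placeholders for wherever the images and annotation file were downloaded (pycocotools required):

```python
# Iterate (image, captions) pairs from MS-COCO -- the kind of image-text
# data used for alignment pre-training. Paths below are placeholders.
from torchvision.datasets import CocoCaptions
from torchvision import transforms

dataset = CocoCaptions(
    root="path/to/coco/train2017",                          # placeholder path
    annFile="path/to/annotations/captions_train2017.json",  # placeholder path
    transform=transforms.ToTensor(),
)

image, captions = dataset[0]  # one image tensor, ~5 reference captions
print(image.shape, captions[0])
```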
-
Open Source LLM
- Flan-Alpaca - Instruction Tuning from Humans and Machines.
- Baize - Baize is an open-source chat model trained with [LoRA](https://github.com/microsoft/LoRA). It uses 100k dialogs generated by letting ChatGPT chat with itself.
- Cabrita - A Portuguese instruction-finetuned LLaMA.
- Llama-X - Open Academic Research on Improving LLaMA to SOTA LLM.
- Chinese-Vicuna - A Chinese Instruction-following LLaMA-based Model.
- GPTQ-for-LLaMA - 4 bits quantization of [LLaMA](https://arxiv.org/abs/2302.13971) using [GPTQ](https://arxiv.org/abs/2210.17323).
- GPT4All - Demo, data, and code to train open-source assistant-style large language models based on GPT-J and LLaMA.
- BELLE - Be Everyone's Large Language model Engine
- Phoenix
- WizardLM|WizardCoder - Family of instruction-following LLMs powered by Evol-Instruct: WizardLM, WizardCoder.
- CaMA - a Chinese-English Bilingual LLaMA Model.
- BayLing - an English/Chinese LLM equipped with advanced language alignment, showing superior capability in English/Chinese generation, instruction following and multi-turn interaction.
- UltraLM - Large-scale, Informative, and Diverse Multi-round Chat Models.
- Guanaco - QLoRA tuned LLaMA
- GLM - GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language understanding and generation tasks.
- ChatGLM2-6B - An open bilingual (Chinese-English) chat LLM.
- RWKV - Parallelizable RNN with Transformer-level LLM Performance.
- ChatRWKV - ChatRWKV is like ChatGPT but powered by the RWKV (100% RNN) language model.
- GPT-Neo - An implementation of model & data parallel [GPT3](https://arxiv.org/abs/2005.14165)-like models using the [mesh-tensorflow](https://github.com/tensorflow/mesh) library.
- Pythia - Interpreting Autoregressive Transformers Across Time and Scale
- OpenFlamingo - an open-source reproduction of DeepMind's Flamingo model.
- h2oGPT
- Open-Assistant - a project meant to give everyone access to a great chat based large language model.
- XGen - Salesforce open-source LLMs with 8k sequence length.
- LLaMA2 - The second generation of LLaMA, released in 7-, 13-, and 70-billion-parameter variants. [LLaMA2](https://github.com/facebookresearch/llama) [HF - TheBloke/Llama-2-13B-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-GPTQ)
- Alpaca - A model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. [Alpaca.cpp](https://github.com/antimatter15/alpaca.cpp) [Alpaca-LoRA](https://github.com/tloen/alpaca-lora)
- Vicuna - An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality.
- Koala - A Dialogue Model for Academic Research
- StackLLaMA - A hands-on guide to train LLaMA with RLHF.
- Orca - Microsoft's fine-tuned LLaMA model that reportedly matches GPT-3.5, trained on 5M examples of data generated by ChatGPT and GPT-4.
- BLOOM - BigScience Large Open-science Open-access Multilingual Language Model [BLOOM-LoRA](https://github.com/linhduongtuan/BLOOM-LORA)
- BLOOMZ&mT0 - a family of models capable of following human instructions in dozens of languages zero-shot.
- T5 - Text-to-Text Transfer Transformer
- OPT - Open Pre-trained Transformer Language Models.
- YaLM - a GPT-like neural network for generating and processing text. It can be used freely by developers and researchers from all over the world.
- Dolly - A cheap-to-build LLM that exhibits a surprising degree of ChatGPT-style instruction-following ability.
- Dolly 2.0 - the first open source, instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use.
- Cerebras-GPT - A Family of Open, Compute-efficient, Large Language Models.
- GALACTICA - The GALACTICA models are trained on a large-scale scientific corpus.
- GALPACA - GALACTICA 30B fine-tuned on the Alpaca dataset.
- Palmyra - Palmyra Base was primarily pre-trained with English text.
- Camel - a state-of-the-art instruction-following large language model designed to deliver exceptional performance and versatility.
- PanGu-α - PanGu-α is a 200B-parameter autoregressive pretrained Chinese language model developed by Huawei Noah's Ark Lab, MindSpore Team and Peng Cheng Laboratory.
- StarCoder - Hugging Face LLM for Code
- MPT-7B - Open LLM for commercial use by MosaicML
- Aquila - Wudao·Aquila is the first open-source large language model with Chinese-English bilingual knowledge that supports a commercial-use license and meets Chinese data-compliance requirements.
- T0 - Multitask Prompted Training Enables Zero-Shot Task Generalization
- MOSS - An open-source conversational language model that supports Chinese-English bilingual dialogue and a variety of plugins.
- Falcon - A foundational large language model with 40 billion parameters, trained by TII on one trillion tokens.
- LLaMA - A foundational, 65-billion-parameter large language model. [LLaMA.cpp](https://github.com/ggerganov/llama.cpp) [Lit-LLaMA](https://github.com/Lightning-AI/lit-llama)
- HuggingChat - Chat interface powered by Open Assistant's latest model and the Hugging Face Inference API.
- baichuan-7B - An open-source, commercially usable large-scale pretrained language model developed by Baichuan Intelligent Technology.
- StableLM - Stability AI Language Models.
- ChatGLM-6B - An open-source bilingual (Chinese-English) dialogue language model based on the [General Language Model (GLM)](https://github.com/THUDM/GLM) architecture, with 6.2 billion parameters.
- GPT-J - A 6 billion parameter, autoregressive text generation model trained on [The Pile](https://pile.eleuther.ai/).
- UL2 - a unified framework for pretraining models that are universally effective across datasets and setups.
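Most of the models above publish checkpoints on the Hugging Face Hub, so a quick way to try one is the standard transformers loading pattern. A minimal sketch, assuming the GPT-J checkpoint listed above and the accelerate package for automatic device placement:

```python
# Load an open checkpoint and generate a short completion. The model ID is
# an assumption -- any causal-LM checkpoint from the list above works.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6b"  # placeholder: swap in your chosen model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The three laws of robotics are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```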
-
LLM Training Frameworks
- DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
- Megatron-DeepSpeed - DeepSpeed version of NVIDIA's Megatron-LM that adds additional support for several features such as MoE model training, Curriculum Learning, 3D Parallelism, and others.
- Megatron-LM - Ongoing research training transformer models at scale.
- Colossal-AI - Making large AI models cheaper, faster, and more accessible.
- BMTrain - Efficient Training for Big Models.
- Mesh Tensorflow - Mesh TensorFlow: Model Parallelism Made Easier.
- maxtext - A simple, performant and scalable Jax LLM!
- GPT-NeoX - An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
- FairScale - FairScale is a PyTorch extension library for high performance and large scale training.
- Alpa - Alpa is a system for training and serving large-scale neural networks.
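The frameworks above mostly share one workflow: wrap the model and optimizer in an engine that shards state across devices. A minimal sketch of that pattern with DeepSpeed ZeRO stage 2; the toy model and config values are illustrative assumptions, and real runs are typically launched with the deepspeed CLI:

```python
# Wrap a PyTorch model with DeepSpeed so optimizer states and gradients
# are sharded across data-parallel workers (ZeRO stage 2).
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real transformer

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},  # shard optimizer states + gradients
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
# Training then uses engine.backward(loss) and engine.step() in place of
# the usual loss.backward() / optimizer.step().
```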
-
Tools for deploying LLM
- SkyPilot - Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU availability, and managed execution -- all with a simple interface.
- vLLM - A high-throughput and memory-efficient inference and serving engine for LLMs
- Text Generation Inference - A Rust, Python and gRPC server for text generation inference. Used in production at [HuggingFace](https://huggingface.co/) to power LLMs api-inference widgets.
- wechat-chatgpt - Use ChatGPT on WeChat via wechaty.
- Agenta - Easily build, version, evaluate and deploy your LLM-powered apps.
- Haystack - an open-source NLP framework that allows you to use LLMs and transformer-based models from Hugging Face, OpenAI and Cohere to interact with your own data.
- Sidekick - Data integration platform for LLMs.
- FastChat - A distributed multi-model LLM serving system with web UI and OpenAI-compatible RESTful APIs.
- Embedchain - Framework to create ChatGPT-like bots over your dataset.
- promptfoo - Test your prompts. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality.
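As an example of what these serving engines look like from Python, here is a minimal offline-batching sketch with vLLM; the model ID is a placeholder for any supported checkpoint:

```python
# Batched offline inference with vLLM's Python API.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder: small model for illustration
params = SamplingParams(temperature=0.8, max_tokens=64)

prompts = ["The capital of France is", "Explain KV caching in one sentence:"]
for output in llm.generate(prompts, params):
    print(output.prompt, "->", output.outputs[0].text)
```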
-
Courses about LLM
- Stanford
- Princeton
- OpenBMB
- Stanford
- Stanford
- Stanford Webinar
- 李沐
- 陳縕儂
- 李沐
- 李沐
- Aston Zhang
- DeepLearning.AI
-
Datasets of Multimodal Instruction Tuning
- GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction | Tool-related instruction datasets
- Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | 100K high-quality video instruction dataset
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | A large-scale, broad-coverage biomedical instruction-following dataset
- ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst | Multimodal instruction tuning dataset covering 16 multimodal tasks
- DetGPT: Detect What You Need via Reasoning | Instruction-tuning dataset with 5,000 images and around 30,000 query-answer pairs
- PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering | Large-scale medical visual question-answering dataset
- VideoChat: Chat-Centric Video Understanding | Video-centric multimodal instruction dataset
- mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality | Dataset for evaluation on multiple capabilities
- MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Multimodal aligned dataset for improving the model's usability and generation fluency
- Visual Instruction Tuning | Multimodal instruction-following data generated by GPT (LLaVA-Instruct-150K)
- X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages | Chinese multimodal instruction dataset
- MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning | The first multimodal instruction tuning benchmark dataset
- M³IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning | Large-scale, broad-coverage multimodal instruction tuning dataset
-
Other useful resources
- OpenAGI - When LLM Meets Domain Experts.
- HuggingGPT - Solving AI Tasks with ChatGPT and its Friends in HuggingFace.
- EasyEdit - An easy-to-use framework to edit large language models.
- chatgpt-shroud - A Chrome extension for OpenAI's ChatGPT, enhancing user privacy by enabling easy hiding and unhiding of chat history. Ideal for privacy during screen shares.
- Open-evals - A framework extending OpenAI's [Evals](https://github.com/openai/evals) for different language models.
- Arize-Phoenix - Open-source tool for ML observability that runs in your notebook environment. Monitor and fine-tune LLM, CV, and tabular models.
- Major LLMs + Data Availability
- 500+ Best AI Tools
- Mixtral 8x7B - a high-quality sparse mixture of experts model (SMoE) with open weights.
- AutoGPT - an experimental open-source application showcasing the capabilities of the GPT-4 language model.
- chatgpt-wrapper - ChatGPT Wrapper is an open-source unofficial Python API and CLI that lets you interact with ChatGPT.
- Mistral - Mistral-7B-v0.1 is a small, yet powerful model adaptable to many use-cases including code and 8k sequence length. Apache 2.0 licence.
-
Prompting libraries & tools
- YiVal - Open-source GenAI-Ops tool for tuning and evaluating prompts, configurations, and model parameters using customizable datasets, evaluation methods, and improvement strategies.
- Semantic Kernel
- Prompttools - Open-source Python tools for testing and evaluating models, vector DBs, and prompts.
- Promptify
- Weights & Biases
- OpenAI Evals - Open-source library for evaluating task performance of language models and prompts.
- ModelFusion - A TypeScript library for building apps with LLMs and other ML models (speech-to-text, text-to-speech, image generation).
- Flappy - Ready LLM Agent SDK for Every Developer.
- FLAML (A Fast Library for Automated Machine Learning & Tuning)
- Guardrails.ai
- PromptPerfect
- Arthur Shield
- GPTRouter - GPTRouter is an open-source LLM API gateway that offers a universal API for 30+ LLMs, vision, and image models, with smart fallbacks based on uptime and latency, automatic retries, and streaming. It stays operational even when OpenAI is down.
- Scale Spellbook
- LangChain
- LlamaIndex
- Guidance
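Most of the prompt-testing tools above (promptfoo, Prompttools, OpenAI Evals) automate the same core loop: render several prompt variants over shared test cases and score the outputs. A library-agnostic sketch of that loop; `call_model` is a hypothetical stand-in for whatever provider you use:

```python
# Compare prompt variants over the same test cases and report accuracy.
from typing import Callable

def call_model(prompt: str) -> str:
    raise NotImplementedError  # hypothetical: wire up your LLM provider here

def evaluate(variants: dict[str, str], cases: list[dict],
             model: Callable[[str], str]) -> dict[str, float]:
    scores = {}
    for name, template in variants.items():
        hits = 0
        for case in cases:
            answer = model(template.format(**case))
            hits += case["expected"].lower() in answer.lower()
        scores[name] = hits / len(cases)
    return scores

variants = {
    "terse": "Q: {question}\nA:",
    "cot": "Q: {question}\nThink step by step, then answer.\nA:",
}
cases = [{"question": "What is 12 * 9?", "expected": "108"}]
# print(evaluate(variants, cases, call_model))
```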
-
Datasets of Multimodal Chain-of-Thought
- EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought | Large-scale embodied planning dataset
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering | Large-scale multi-choice dataset, featuring multimodal science questions and diverse domains
- Let’s Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction | Inference-time dataset that can be used to evaluate VideoCOT
-
Practical Guide for NLP Tasks
-
Generation tasks
-
Knowledge-intensive tasks
-
Efficiency
-
Traditional NLU tasks
-
Abilities with Scaling
-
Specific tasks
-
Real-World "Tasks"
-
-
RLHF Datasets
-
- HH-RLHF
- PromptSource
- Stable Alignment - Alignment Learning in Social Games
- Stanford Human Preferences Dataset (SHP)
- Structured Knowledge Grounding (SKG) Resources Collections
- rlhf-reward-datasets
- webgpt_comparisons
- summarize_from_feedback
- The Flan Collection
- Dahoas/synthetic-instruct-gptj-pairwise
- LIMA
- PROCESSBENCH: Identifying Process Errors in Mathematical Reasoning
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
- Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
- Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
- On Designing Effective RL Reward at Training Time for LLM Reasoning
- Generative Verifiers: Reward Modeling as Next-Token Prediction
- Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
- Improve Mathematical Reasoning in Language Models by Automated Process Supervision
- Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
- Planning In Natural Language Improves LLM Search For Code Generation
- AFlow: Automating Agentic Workflow Generation
- Interpretable Contrastive Monte Carlo Tree Search Reasoning
- Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
- Mixture-of-Agents Enhances Large Language Model Capabilities
- Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models
- Advancing LLM Reasoning Generalists with Preference Trees
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
- AlphaMath Almost Zero: Process Supervision Without Process
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
- MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
- When is Tree Search Useful for LLM Planning? It Depends on the Discriminator
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
- Chain-of-Thought Reasoning Without Prompting
- Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
- Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
- ReFT: Reasoning with Reinforced Fine-Tuning
- VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
- Stream of Search (SoS): Learning to Search in Language
- GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
- Evaluation of OpenAI o1: Opportunities and Challenges of AGI
- Evaluating LLMs at Detecting Errors in LLM Responses
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability
- Not All LLM Reasoners Are Created Equal
- LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench
- A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
- Thinking LLMs: General Instruction Following with Thought Generation
- Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning Through Trap Problems
- V-STaR: Training Verifiers for Self-Taught Reasoners
- CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks
- RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
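Several of the preference datasets listed at the top of this section (HH-RLHF, SHP, rlhf-reward-datasets) ship as chosen/rejected pairs, which is the format reward-model training code typically consumes. A minimal sketch of inspecting HH-RLHF with the Hugging Face datasets library:

```python
# Peek at one preference pair from Anthropic's HH-RLHF dataset: each row
# holds a preferred (`chosen`) and dispreferred (`rejected`) dialogue.
from datasets import load_dataset

ds = load_dataset("Anthropic/hh-rlhf", split="train")
example = ds[0]
print(example["chosen"][:200])    # preferred dialogue continuation
print(example["rejected"][:200])  # dispreferred continuation
```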
-
2023
- Training Chain-of-Thought via Latent-Variable Inference
- Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training
- OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning
- Reasoning with Language Model is Planning with World Model
- Don’t throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding
- Certified reasoning with language models
- Large Language Models Cannot Self-Correct Reasoning Yet
-
2022
-
-
Multimodal In-Context Learning
- **Multimodal Few-Shot Learning with Frozen Language Models** (2021-06-25)
-
Multimodal Chain-of-Thought
- **Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings** (2023-05-03) | [Code coming soon](https://github.com/dannyrose30/VCOT)
- **Chain of Thought Prompt Tuning in Vision Language Models** (2023-04-16) | Code coming soon
- **Let’s Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction** (2023-05-23)
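Several of the papers above share a two-stage pattern: elicit a rationale first, then condition the final answer on it. An illustrative sketch of that pattern; `generate` is a hypothetical stand-in for any (multimodal) model call:

```python
# Two-stage chain-of-thought: produce a rationale, then answer with it.
def generate(prompt: str) -> str:
    raise NotImplementedError  # hypothetical model call (text or text+image)

def answer_with_rationale(question: str) -> str:
    rationale = generate(f"{question}\nLet's think step by step.")
    return generate(f"{question}\nReasoning: {rationale}\nTherefore, the answer is:")
```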
-
Foundation Models
- **GPT-4 Technical Report** (2023-03-15)
- **PaLM-E: An Embodied Multimodal Language Model** (2023-03-06) | [Demo](https://palm-e.github.io/#demo)
-
Others
- **Can Large Pre-trained Models Help Vision Models on Perception Tasks?** (2023-06-01) | Code coming soon
- **Charting New Territories** (2023-11-24) | [Github](https://github.com/jonathan-roberts1/charting-new-territories)
-
Datasets of In-Context Learning
- MIMIC-IT: Multi-Modal In-Context Instruction Tuning | Multimodal in-context instruction dataset
-
Memory retrieval
-
Multimodal Instruction Tuning
-
Practical Guides for Prompting (Helpful)
-
Pretraining data
-
Star History
-
- [Star History Chart](https://star-history.com/#Atomic-man007/Awesome_Multimodel_LLM)
-
-
High-quality generation
-
Deep understanding
-
Raising the length limit of Transformers
-
Compressing memories with vectors or data structures
Categories
- Datasets of Pre-Training for Alignment (90)
- Practical Guide for NLP Tasks (78)
- RLHF Datasets (68)
- Milestone Papers (66)
- Open Source LLM (59)
- Tutorials about LLM (23)
- Datasets of Multimodal Instruction Tuning (23)
- Prompting libraries & tools (17)
- Other useful resources (12)
- Raising the length limit of Transformers (12)
- Courses about LLM (12)
- Tools for deploying LLM (11)
- LLM Training Frameworks (11)
- Pretraining data (11)
- Memory retrieval (7)
- Trending LLM Projects (6)
- Multimodal Chain-of-Thought (5)
- High-quality generation (4)
- Datasets of Multimodal Chain-of-Thought (4)
- Foundation Models (4)
- Compressing memories with vectors or data structures (3)
- Others (3)
- Datasets of In-Context Learning (2)
- Multimodal In-Context Learning (2)
- Deep understanding (2)
- Star History (2)
- Multimodal Instruction Tuning (2)
- Practical Guides for Prompting (Helpful) (2)
Keywords
- llm (17)
- large-language-models (14)
- chatgpt (14)
- deep-learning (12)
- language-model (11)
- llama (9)
- gpt-3 (8)
- machine-learning (8)
- gpt (8)
- pytorch (7)
- ai (7)
- transformers (7)
- openai (6)
- prompt-engineering (5)
- python (5)
- chinese (5)
- artificial-intelligence (4)
- chatbot (4)
- inference (4)
- nlp (4)
- lora (3)
- natural-language-processing (3)
- instruction-tuning (3)
- generative-ai (3)
- knowlm (3)
- gpu (3)
- embeddings (3)
- transformer (3)
- rag (3)
- gpt4 (3)
- llmops (2)
- bilingual (2)
- deepspeed (2)
- english (2)
- instructie (2)
- gpt-2 (2)
- instruction-following (2)
- rnn (2)
- mistral (2)
- instructions (2)
- tpu (2)
- model-parallelism (2)
- models (2)
- pipeline-parallelism (2)
- pre-trained-language-models (2)
- data-parallelism (2)
- rwkv (2)
- pre-trained-model (2)
- pre-training (2)
- reasoning (2)