Awesome_Multimodel_LLM
Awesome_Multimodel is a curated GitHub repository that collects resources for Multimodal Large Language Models (MLLMs). It covers datasets, instruction tuning, in-context learning, visual reasoning, foundation models, and more, and is updated with the latest advancements.
https://github.com/Atomic-man007/Awesome_Multimodel_LLM
Practical Guide for NLP Tasks
Efficiency
Generation tasks
Knowledge-intensive tasks
Traditional NLU tasks
Abilities with Scaling
Specific tasks
Real-World Tasks
Prompting libraries & tools
- LlamaIndex
- YiVal - Open-source GenAI-Ops tool for tuning and evaluating prompts, configurations, and model parameters using customizable datasets, evaluation methods, and improvement strategies.
- Semantic Kernel
- Prompttools - Open-source Python tools for testing and evaluating models, vector DBs, and prompts.
- Promptify
- Weights & Biases
- OpenAI Evals - Open-source library for evaluating task performance of language models and prompts.
- ModelFusion - A TypeScript library for building apps with LLMs and other ML models (speech-to-text, text-to-speech, image generation).
- Flappy - Production-Ready LLM Agent SDK for Every Developer.
- FLAML (A Fast Library for Automated Machine Learning & Tuning)
- Guardrails.ai
- PromptPerfect
- Arthur Shield
- GPTRouter - An open-source LLM API gateway that offers a universal API for 30+ LLM, vision, and image models, with smart fallbacks based on uptime and latency, automatic retries, and streaming, so you stay operational even when OpenAI is down (see the routing sketch after this list).
- Outlines - Domain-specific language to simplify prompting and constrain generation.
- Chainlit
- LangChain
- Guidance
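Many of the gateway tools above (GPTRouter in particular) reduce to the same pattern: try providers in priority order, retry transient failures with backoff, and fall through to the next model. A minimal sketch of that pattern, with a hypothetical `call_model` standing in for real vendor SDKs:

```python
import time

def call_model(provider: str, prompt: str) -> str:
    # Hypothetical stand-in; a real gateway would invoke each vendor's API here.
    raise TimeoutError(f"{provider} unavailable")

def route_with_fallback(prompt: str, providers: list[str], retries: int = 2) -> str:
    """Try providers in priority order, retrying transient failures with backoff."""
    for provider in providers:
        for attempt in range(retries):
            try:
                return call_model(provider, prompt)
            except TimeoutError:
                time.sleep(0.5 * 2 ** attempt)  # exponential backoff before retrying
    raise RuntimeError("all providers failed")

# route_with_fallback("Hello", ["openai", "anthropic", "mistral"])
```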
Trending LLM Projects
- llm-course - Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
- promptbase - All things prompt engineering.
- Devika
- anything-llm - A private ChatGPT to chat with anything!
- phi-2 - a 2.7 billion-parameter language model that demonstrates outstanding reasoning and language understanding capabilities, showcasing state-of-the-art performance among base language models with less than 13 billion parameters.
- ollama - Get up and running with Llama 2 and other large language models locally (a minimal API call is sketched below).
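Since ollama serves models behind a local REST endpoint by default, a single HTTP call is enough to generate text. A minimal sketch, assuming `ollama serve` is running on its default port and a `llama2` model has already been pulled:

```python
import requests

# Assumes a local Ollama server (default port 11434) and `ollama pull llama2`.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```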
Milestone Papers
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
- Improving Language Understanding by Generative Pre-Training
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Language Models are Unsupervised Multitask Learners
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- Scaling Laws for Neural Language Models
- Language models are few-shot learners
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- Evaluating Large Language Models Trained on Code
- Finetuned Language Models are Zero-Shot Learners
- WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing
- Solving Quantitative Reasoning Problems with Language Models
- Emergent Abilities of Large Language Models
- Language Models are General-Purpose Interfaces
- GLM-130B: An Open Bilingual Pre-trained Model
- Galactica: A Large Language Model for Science
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- LLaMA: Open and Efficient Foundation Language Models
- Language Is Not All You Need: Aligning Perception with Language Models
- GPT-4 Technical Report
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
- RWKV: Reinventing RNNs for the Transformer Era
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
- On the Opportunities and Risks of Foundation Models
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- OPT: Open Pre-trained Transformer Language Models
- Holistic Evaluation of Language Models
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
- Scaling Instruction-Finetuned Language Models
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- LaMDA: Language Models for Dialog Applications
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
- Training language models to follow instructions with human feedback
- PaLM: Scaling Language Modeling with Pathways
- Improving alignment of dialogue agents via targeted human judgements
- PaLM 2 Technical Report
- Attention Is All You Need
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Datasets of Pre-Training for Alignment
- Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations - Image-Text
- Microsoft COCO: Common Objects in Context - Image-Text
- Im2Text: Describing Images Using 1 Million Captioned Photographs - Image-Text
- Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning - Image-Text
- LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs - Image-Text
- Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models - Image-Text
- AI Challenger: A Large-scale Dataset for Going Deeper in Image Understanding - Image-Text
- Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark - Image-Text
- Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks - Video-Text
- MSR-VTT: A Large Video Description Dataset for Bridging Video and Language - Video-Text
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval - Video-Text
- WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research - Audio-Text
- AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline - Audio-Text
- AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale - Audio-Text
Tutorials about LLM
Open Source LLM
- Flan-Alpaca - Instruction Tuning from Humans and Machines.
- Baize - Baize is an open-source chat model trained with [LoRA](https://github.com/microsoft/LoRA). It uses 100k dialogs generated by letting ChatGPT chat with itself.
- Cabrita - A Portuguese fine-tuned instruction LLaMA.
- Llama-X - Open Academic Research on Improving LLaMA to SOTA LLM.
- Chinese-Vicuna - A Chinese Instruction-following LLaMA-based Model.
- GPTQ-for-LLaMA - 4-bit quantization of [LLaMA](https://arxiv.org/abs/2302.13971) using [GPTQ](https://arxiv.org/abs/2210.17323) (a simplified quantization sketch follows this list).
- GPT4All - Demo, data, and code to train open-source assistant-style large language model based on GPT-J and LLaMa.
- BELLE - Be Everyone's Large Language model Engine
- WizardLM|WizardCoder - Family of instruction-following LLMs powered by Evol-Instruct: WizardLM, WizardCoder.
- CaMA - a Chinese-English Bilingual LLaMA Model.
- BayLing - an English/Chinese LLM equipped with advanced language alignment, showing superior capability in English/Chinese generation, instruction following and multi-turn interaction.
- UltraLM - Large-scale, Informative, and Diverse Multi-round Chat Models.
- Guanaco - QLoRA tuned LLaMA
- GLM - GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language understanding and generation tasks.
- ChatGLM2-6B - An open-source bilingual (Chinese-English) chat LLM.
- RWKV - Parallelizable RNN with Transformer-level LLM Performance.
- ChatRWKV - ChatRWKV is like ChatGPT but powered by the RWKV (100% RNN) language model.
- GPT-Neo - An implementation of model & data parallel [GPT3](https://arxiv.org/abs/2005.14165)-like models using the [mesh-tensorflow](https://github.com/tensorflow/mesh) library.
- Pythia - Interpreting Autoregressive Transformers Across Time and Scale
- OpenFlamingo - an open-source reproduction of DeepMind's Flamingo model.
- h2oGPT
- Open-Assistant - A project meant to give everyone access to a great chat-based large language model.
- XGen - Salesforce open-source LLMs with 8k sequence length.
- Alpaca - A model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. [Alpaca.cpp](https://github.com/antimatter15/alpaca.cpp) [Alpaca-LoRA](https://github.com/tloen/alpaca-lora)
- Vicuna - An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality.
- Koala - A Dialogue Model for Academic Research
- StackLLaMA - A hands-on guide to train LLaMA with RLHF.
- Orca - Microsoft's fine-tuned LLaMA model that reportedly matches GPT-3.5, fine-tuned on 5M examples of data generated by ChatGPT and GPT-4.
- BLOOM - BigScience Large Open-science Open-access Multilingual Language Model [BLOOM-LoRA](https://github.com/linhduongtuan/BLOOM-LORA)
- BLOOMZ&mT0 - a family of models capable of following human instructions in dozens of languages zero-shot.
- T5 - Text-to-Text Transfer Transformer
- OPT - Open Pre-trained Transformer Language Models.
- YaLM - a GPT-like neural network for generating and processing text. It can be used freely by developers and researchers from all over the world.
- Dolly - A cheap-to-build LLM that exhibits a surprising degree of the instruction-following capability seen in ChatGPT.
- Dolly 2.0 - the first open source, instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use.
- GALACTICA - The GALACTICA models are trained on a large-scale scientific corpus.
- GALPACA - GALACTICA 30B fine-tuned on the Alpaca dataset.
- Palmyra - Palmyra Base was primarily pre-trained with English text.
- Camel - a state-of-the-art instruction-following large language model designed to deliver exceptional performance and versatility.
- StarCoder - Hugging Face LLM for Code
- MPT-7B - Open LLM for commercial use by MosaicML
- Cerebras-GPT - A Family of Open, Compute-efficient, Large Language Models.
- LLaMA2 - The successor to LLaMA, released as 7-, 13-, and 70-billion-parameter large language models. [LLaMA2](https://github.com/facebookresearch/llama) [HF - TheBloke/Llama-2-13B-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-GPTQ)
- Falcon - A foundational large language model (LLM) with 40 billion parameters, trained by TII on one trillion tokens.
- baichuan-7B - An open-source, commercially usable large-scale pretrained language model developed by Baichuan Intelligence.
- HuggingChat - Powered by Open Assistant's latest model – the best open source chat model right now and @huggingface Inference API.
- ChatGLM-6B - An open-source bilingual (Chinese-English) dialogue language model based on the [General Language Model (GLM)](https://github.com/THUDM/GLM) architecture, with 6.2 billion parameters.
- LLaMA - A foundational, 65-billion-parameter large language model. [LLaMA.cpp](https://github.com/ggerganov/llama.cpp) [Lit-LLaMA](https://github.com/Lightning-AI/lit-llama)
- PanGu-α - PanGu-α is a 200B-parameter autoregressive pretrained Chinese language model developed by Huawei Noah's Ark Lab, MindSpore Team and Peng Cheng Laboratory.
- UL2 - a unified framework for pretraining models that are universally effective across datasets and setups.
- Aquila - The Wudao Aquila language model is the first open-source large language model with bilingual Chinese-English knowledge that supports a commercial license and meets Chinese data-compliance requirements.
- StableLM - Stability AI Language Models.
- T0 - Multitask Prompted Training Enables Zero-Shot Task Generalization
- Phoenix
- MOSS - An open-source dialogue language model that supports both Chinese and English and a variety of plugins.
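To make the 4-bit idea behind GPTQ-for-LLaMA concrete: quantization stores each weight in 16 levels plus a per-row scale, trading a little accuracy for a roughly 4x memory reduction. The sketch below is plain round-to-nearest, not GPTQ itself (which additionally minimizes layer-wise reconstruction error):

```python
import torch

def quantize_rtn_4bit(w: torch.Tensor):
    """Per-row absmax round-to-nearest quantization to the signed 4-bit range [-8, 7]."""
    scale = w.abs().amax(dim=1, keepdim=True) / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return q.to(torch.int8), scale  # 4-bit values stored in int8, plus fp scales

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4096, 4096)          # one toy weight matrix
q, s = quantize_rtn_4bit(w)
err = (w - dequantize(q, s)).abs().mean()
print(f"mean abs reconstruction error: {err:.4f}")
```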
LLM Training Frameworks
- Megatron-LM - Ongoing research training transformer models at scale (a toy model-parallel sketch follows this list).
- Colossal-AI - Making large AI models cheaper, faster, and more accessible.
- BMTrain - Efficient Training for Big Models.
- Mesh Tensorflow - Mesh TensorFlow: Model Parallelism Made Easier.
- maxtext - A simple, performant and scalable Jax LLM!
- GPT-NeoX - An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
- FairScale - FairScale is a PyTorch extension library for high performance and large scale training.
- Alpa - Alpa is a system for training and serving large-scale neural networks.
- DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
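The core trick these frameworks exploit can be seen in miniature: Megatron-style tensor parallelism splits a weight matrix column-wise across devices, each device computes its slice, and the slices are concatenated. A single-process toy sketch (real frameworks shard across GPUs and overlap the communication):

```python
import torch

# Toy column-parallel linear layer: each "device" holds half the output columns.
x = torch.randn(2, 8)                # batch of activations
w = torch.randn(8, 16)               # full weight, conceptually too big for one device
w_shards = torch.chunk(w, 2, dim=1)  # column split across 2 simulated devices

# Each shard computes its slice independently; an all-gather concatenates results.
partial = [x @ shard for shard in w_shards]
y = torch.cat(partial, dim=1)

assert torch.allclose(y, x @ w, atol=1e-5)
```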
Tools for deploying LLM
- SkyPilot - Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU availability, and managed execution -- all with a simple interface.
- vLLM - A high-throughput and memory-efficient inference and serving engine for LLMs
- Text Generation Inference - A Rust, Python and gRPC server for text generation inference. Used in production at [HuggingFace](https://huggingface.co/) to power LLMs api-inference widgets.
- wechat-chatgpt - Use ChatGPT on WeChat via wechaty.
- Agenta - Easily build, version, evaluate and deploy your LLM-powered apps.
- Haystack - an open-source NLP framework that allows you to use LLMs and transformer-based models from Hugging Face, OpenAI and Cohere to interact with your own data.
- Sidekick - Data integration platform for LLMs.
- Embedchain - Framework to create ChatGPT like bots over your dataset.
- FastChat - A distributed multi-model LLM serving system with web UI and OpenAI-compatible RESTful APIs (see the client sketch below).
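Several of these servers, vLLM and FastChat included, expose an OpenAI-compatible REST API, so one client snippet covers them. A sketch assuming a vLLM server has already been launched on localhost:8000 with the model named below:

```python
import requests

# Assumes e.g. `python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m`.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",  # whatever model the server was started with
        "prompt": "San Francisco is a",
        "max_tokens": 32,
        "temperature": 0.7,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["text"])
```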
Courses about LLM
- Stanford
- Princeton
- OpenBMB
- Stanford Webinar
- Aston Zhang
- 陳縕儂
- 李沐
- DeepLearning.AI
Datasets of Multimodal Instruction Tuning
- GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction - Tool-related instruction dataset
- ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst - Multimodal instruction tuning dataset covering 16 multimodal tasks
- DetGPT: Detect What You Need via Reasoning - Instruction-tuning dataset with 5000 images and around 30000 query-answer pairs
- VideoChat: Chat-Centric Video Understanding - Video-centric multimodal instruction dataset
- mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality - Dataset for evaluation on multiple capabilities
- MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models - Multimodal aligned dataset for improving model's usability and generation's fluency
- Visual Instruction Tuning - Multimodal instruction-following data generated by GPT
- PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering - Large-scale medical visual question-answering dataset
- X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages - Chinese multimodal instruction dataset
- Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models - 100K high-quality video instruction dataset
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day - A large-scale, broad-coverage biomedical instruction-following dataset
- M<sup>3</sup>IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning - Large-scale, broad-coverage multimodal instruction tuning dataset
- MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning - The first multimodal instruction tuning benchmark dataset
Other useful resources
- OpenAGI - When LLM Meets Domain Experts.
- HuggingGPT - Solving AI Tasks with ChatGPT and its Friends in HuggingFace.
- EasyEdit - An easy-to-use framework to edit large language models.
- chatgpt-shroud - A Chrome extension for OpenAI's ChatGPT, enhancing user privacy by enabling easy hiding and unhiding of chat history. Ideal for privacy during screen shares.
- Open-evals - A framework extending OpenAI's [Evals](https://github.com/openai/evals) to different language models.
- Arize-Phoenix - Open-source tool for ML observability that runs in your notebook environment. Monitor and fine tune LLM, CV and Tabular Models.
- Major LLMs + Data Availability
- 500+ Best AI Tools
- Mistral - Mistral-7B-v0.1 is a small, yet powerful model adaptable to many use-cases including code and 8k sequence length. Apache 2.0 licence.
- Emergent Mind - The latest AI news, curated & explained by GPT-4.
- Mixtral 8x7B - A high-quality sparse mixture-of-experts (SMoE) model with open weights (a toy top-2 routing sketch follows this list).
- AutoGPT - an experimental open-source application showcasing the capabilities of the GPT-4 language model.
- chatgpt-wrapper - ChatGPT Wrapper is an open-source unofficial Python API and CLI that lets you interact with ChatGPT.
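The "sparse mixture of experts" idea behind Mixtral can be illustrated with a toy top-2 router: each token is sent to only two expert MLPs, and their outputs are combined with the gate weights. A schematic sketch, not Mixtral's actual code; the dimensions, the dense `Linear` router, and the per-token Python loop are all simplifications:

```python
import torch
import torch.nn.functional as F

n_experts, d, k = 8, 16, 2
experts = [torch.nn.Linear(d, d) for _ in range(n_experts)]
router = torch.nn.Linear(d, n_experts)

def moe_forward(x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d)
    gate_logits = router(x)
    weights, idx = torch.topk(F.softmax(gate_logits, dim=-1), k)  # top-2 experts per token
    weights = weights / weights.sum(dim=-1, keepdim=True)         # renormalize the pair
    out = torch.zeros_like(x)
    for t in range(x.size(0)):        # per-token dispatch (real kernels batch this)
        for j in range(k):
            out[t] += weights[t, j] * experts[idx[t, j]](x[t])
    return out

print(moe_forward(torch.randn(4, d)).shape)  # torch.Size([4, 16])
```

Only k of the n expert MLPs run per token, which is why such models have many more parameters than their per-token compute cost suggests.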
Datasets of Multimodal Chain-of-Thought
- EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought - Large-scale embodied planning dataset
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering - Large-scale multi-choice dataset, featuring multimodal science questions and diverse domains
- Let’s Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction - Inference-time dataset that can be used to evaluate VideoCOT
RLHF Datasets
- HH-RLHF
- PromptSource
- Stable Alignment - Alignment Learning in Social Games
- Stanford Human Preferences Dataset (SHP)
- Structured Knowledge Grounding (SKG) Resources Collections
- rlhf-reward-datasets
- webgpt_comparisons
- summarize_from_feedback
- The Flan Collection
- Dahoas/synthetic-instruct-gptj-pairwise
- LIMA
- Planning In Natural Language Improves LLM Search For Code Generation
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
- Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
- Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
- On Designing Effective RL Reward at Training Time for LLM Reasoning
- Generative Verifiers: Reward Modeling as Next-Token Prediction
- Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
- Improve Mathematical Reasoning in Language Models by Automated Process Supervision
- Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
- PROCESSBENCH: Identifying Process Errors in Mathematical Reasoning
- AFlow: Automating Agentic Workflow Generation
- Interpretable Contrastive Monte Carlo Tree Search Reasoning
- Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
- Mixture-of-Agents Enhances Large Language Model Capabilities
- Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models
- Advancing LLM Reasoning Generalists with Preference Trees
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
- AlphaMath Almost Zero: Process Supervision Without Process
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
- MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
- When is Tree Search Useful for LLM Planning? It Depends on the Discriminator
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
- Chain-of-Thought Reasoning Without Prompting
- Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
- Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
- ReFT: Reasoning with Reinforced Fine-Tuning
- VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
- Stream of Search (SoS): Learning to Search in Language
- GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
- Evaluation of OpenAI o1: Opportunities and Challenges of AGI
- Evaluating LLMs at Detecting Errors in LLM Responses
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability
- Not All LLM Reasoners Are Created Equal
- LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench
- A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
- Thinking LLMs: General Instruction Following with Thought Generation
- Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning Through Trap Problems
- V-STaR: Training Verifiers for Self-Taught Reasoners
- CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks
- RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
2021
2017
- Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
2023
- Training Chain-of-Thought via Latent-Variable Inference
- Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training
- OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning
- Reasoning with Language Model is Planning with World Model
- Certified reasoning with language models
- Large Language Models Cannot Self-Correct Reasoning Yet
- Don’t throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding
2022
Multimodal In-Context Learning
- **Multimodal Few-Shot Learning with Frozen Language Models** | NeurIPS | 2021-06-25 | - | - |
- [**Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models**](https://arxiv.org/pdf/2304.09842.pdf) | arXiv | 2023-04-19 | [Github](https://github.com/lupantech/chameleon-llm) | [Demo](https://chameleon-llm.github.io/) |
Multimodal Chain-of-Thought
- **Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings** | arXiv | 2023-05-03 | [Coming soon](https://github.com/dannyrose30/VCOT) | - |
- **Chain of Thought Prompt Tuning in Vision Language Models** | arXiv | 2023-04-16 | Coming soon | - |
- **Let’s Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction** | arXiv | 2023-05-23 | - | - |
- **Introducing Visual Perception Token into Multimodal Large Language Model** | arXiv | 2025-02-24 | [Github](https://github.com/yu-rp/VisualPerceptionToken) | - |
- **Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering** | NeurIPS | 2022-09-20 | [Github](https://github.com/lupantech/ScienceQA) | - |
Foundation Models
- **PaLM-E: An Embodied Multimodal Language Model** | arXiv | 2023-03-06 | - | [Demo](https://palm-e.github.io/#demo) |
- **GPT-4 Technical Report** | arXiv | 2023-03-15 | - | - |
- **Language Is Not All You Need: Aligning Perception with Language Models** | arXiv | 2023-02-27 | [Github](https://github.com/microsoft/unilm) | - |
Others
- **Can Large Pre-trained Models Help Vision Models on Perception Tasks?** | arXiv | 2023-06-01 | Coming soon | - |
- **Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs** | arXiv | 2023-11-24 | [Github](https://github.com/jonathan-roberts1/charting-new-territories) | - |
- **Contextual Object Detection with Multimodal Large Language Models** | arXiv | 2023-05-29 | [Github](https://github.com/yuhangzang/ContextDET) | [Demo](https://huggingface.co/spaces/yuhangzang/ContextDet-Demo) |
- **Generating Images with Multimodal Language Models** | arXiv | 2023-05-26 | [Github](https://github.com/kohjingyu/gill) | - |
LLM Leaderboards
Datasets of In-Context Learning
- MIMIC-IT: Multi-Modal In-Context Instruction Tuning - Multimodal in-context instruction dataset
Multimodal Instruction Tuning
- **M<sup>3</sup>IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning** | arXiv | 2023-06-07 | - | - |
- **MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning** | arXiv | 2022-12-21 | - | - |
- [**mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality**](https://arxiv.org/pdf/2304.14178.pdf) | arXiv | 2023-04-27 | [Github](https://github.com/X-PLUG/mPLUG-Owl) | [Demo](https://huggingface.co/spaces/MAGAer13/mPLUG-Owl) |
Practical Guides for Prompting (Helpful)
Pretraining data
Star History
[![Star History Chart](https://api.star-history.com/svg?repos=Atomic-man007/Awesome_Multimodel_LLM&type=Date)](https://star-history.com/#Atomic-man007/Awesome_Multimodel_LLM)
High-quality generation
Deep understanding
Raising the length limit of Transformers
Memory retrieval
Compressing memories with vectors or data structures