Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Awesome_Multimodel_LLM
Awesome_Multimodel_LLM is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLMs). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundation models, and more. Stay updated with the latest advancements.
https://github.com/Atomic-man007/Awesome_Multimodel_LLM
Last synced: 3 days ago
-
Trending LLM Projects
- llm-course - Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
- promptbase - All things prompt engineering.
- Devika - An agentic AI software engineer.
- anything-llm - A private ChatGPT to chat with anything!
- phi-2 - A 2.7 billion-parameter language model with outstanding reasoning and language understanding, achieving state-of-the-art performance among base models under 13 billion parameters.
- ollama - Get up and running with Llama 2 and other large language models locally (see the local-API sketch after this list).
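Once installed, ollama also exposes a local REST API (default port 11434). A minimal sketch of calling it from Python, assuming the model was pulled first with `ollama pull llama2`:

```python
import requests

# Assumes `ollama serve` is running locally on the default port.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])  # the full generated completion
```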
-
Tutorials about LLM
-
Milestone Papers
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
- Attention Is All You Need (its scaled dot-product attention is sketched at the end of this list)
- Improving Language Understanding by Generative Pre-Training
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Language Models are Unsupervised Multitask Learners
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- Scaling Laws for Neural Language Models
- Language models are few-shot learners
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- Evaluating Large Language Models Trained on Code
- On the Opportunities and Risks of Foundation Models
- Finetuned Language Models are Zero-Shot Learners
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
- WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- LaMDA: Language Models for Dialog Applications
- Solving Quantitative Reasoning Problems with Language Models
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
- Training language models to follow instructions with human feedback
- PaLM: Scaling Language Modeling with Pathways
- OPT: Open Pre-trained Transformer Language Models
- Emergent Abilities of Large Language Models
- Language Models are General-Purpose Interfaces
- Improving alignment of dialogue agents via targeted human judgements
- Scaling Instruction-Finetuned Language Models
- GLM-130B: An Open Bilingual Pre-trained Model
- Holistic Evaluation of Language Models
- Galactica: A Large Language Model for Science
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- LLaMA: Open and Efficient Foundation Language Models
- Language Is Not All You Need: Aligning Perception with Language Models
- GPT-4 Technical Report
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
- PaLM 2 Technical Report
- RWKV: Reinventing RNNs for the Transformer Era
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
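Several milestones above build directly on the scaled dot-product attention of "Attention Is All You Need"; a minimal NumPy sketch of that single operation, softmax(QKᵀ/√d_k)V, with toy shapes:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)   # (queries, keys)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (queries, d_v)

Q = np.random.randn(4, 8)    # 4 query positions, d_k = 8
K = np.random.randn(6, 8)    # 6 key positions
V = np.random.randn(6, 16)   # d_v = 16
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 16)
```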
-
Datasets of Pre-Training for Alignment
- Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
- Microsoft COCO: Common Objects in Context
- Im2Text: Describing Images Using 1 Million Captioned Photographs
- Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning
- LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
- Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
- AI Challenger: A Large-scale Dataset for Going Deeper in Image Understanding
- Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
- Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks
- MSR-VTT: A Large Video Description Dataset for Bridging Video and Language
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
- WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
- AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline
- AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale
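Most of the corpora above reduce to (image, text) pairs. A minimal PyTorch `Dataset` sketch of that common format; the pair list and transform are placeholders, not tied to any specific dataset:

```python
from PIL import Image
from torch.utils.data import Dataset

class ImageTextPairs(Dataset):
    """Minimal (image, caption) dataset in the common pretraining format."""

    def __init__(self, pairs, transform=None):
        self.pairs = pairs          # list of (image_path, caption) tuples
        self.transform = transform  # e.g. a CLIP-style preprocessing pipeline

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        path, caption = self.pairs[idx]
        image = Image.open(path).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, caption
```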
-
Open Source LLM
- Flan-Alpaca - Instruction Tuning from Humans and Machines.
- Baize - Baize is an open-source chat model trained with [LoRA](https://github.com/microsoft/LoRA). It uses 100k dialogs generated by letting ChatGPT chat with itself.
- Cabrita - A Portuguese instruction-finetuned LLaMA.
- Llama-X - Open Academic Research on Improving LLaMA to SOTA LLM.
- Chinese-Vicuna - A Chinese Instruction-following LLaMA-based Model.
- GPTQ-for-LLaMA - 4 bits quantization of [LLaMA](https://arxiv.org/abs/2302.13971) using [GPTQ](https://arxiv.org/abs/2210.17323).
- GPT4All - Demo, data, and code to train open-source assistant-style large language model based on GPT-J and LLaMa.
- BELLE - Be Everyone's Large Language model Engine
- Phoenix
- WizardLM|WizardCoder - Family of instruction-following LLMs powered by Evol-Instruct: WizardLM, WizardCoder.
- CaMA - a Chinese-English Bilingual LLaMA Model.
- BayLing - an English/Chinese LLM equipped with advanced language alignment, showing superior capability in English/Chinese generation, instruction following and multi-turn interaction.
- UltraLM - Large-scale, Informative, and Diverse Multi-round Chat Models.
- Guanaco - QLoRA tuned LLaMA
- GLM - GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language understanding and generation tasks.
- ChatGLM2-6B - An open bilingual (Chinese-English) chat LLM.
- RWKV - Parallelizable RNN with Transformer-level LLM Performance.
- ChatRWKV - ChatRWKV is like ChatGPT but powered by the RWKV (100% RNN) language model.
- GPT-Neo - An implementation of model & data parallel [GPT3](https://arxiv.org/abs/2005.14165)-like models using the [mesh-tensorflow](https://github.com/tensorflow/mesh) library.
- Pythia - Interpreting Autoregressive Transformers Across Time and Scale
- OpenFlamingo - an open-source reproduction of DeepMind's Flamingo model.
- h2oGPT
- Open-Assistant - A project meant to give everyone access to a great chat-based large language model.
- XGen - Salesforce open-source LLMs with 8k sequence length.
- LLaMA2 - The second generation of LLaMA, released in 7-, 13-, and 70-billion-parameter variants (a loading/LoRA sketch follows this list). [LLaMA2](https://github.com/facebookresearch/llama) [HF - TheBloke/Llama-2-13B-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-GPTQ)
- Alpaca - A model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. [Alpaca.cpp](https://github.com/antimatter15/alpaca.cpp) [Alpaca-LoRA](https://github.com/tloen/alpaca-lora)
- Vicuna - An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality.
- Koala - A Dialogue Model for Academic Research
- StackLLaMA - A hands-on guide to train LLaMA with RLHF.
- Orca - Microsoft's finetuned LLaMA model that reportedly matches GPT-3.5, trained on 5M examples of explanation data from ChatGPT and GPT-4.
- BLOOM - BigScience Large Open-science Open-access Multilingual Language Model [BLOOM-LoRA](https://github.com/linhduongtuan/BLOOM-LORA)
- BLOOMZ&mT0 - a family of models capable of following human instructions in dozens of languages zero-shot.
- T5 - Text-to-Text Transfer Transformer
- OPT - Open Pre-trained Transformer Language Models.
- YaLM - a GPT-like neural network for generating and processing text. It can be used freely by developers and researchers from all over the world.
- Dolly - a cheap-to-build LLM that exhibits a surprising degree of the instruction following capabilities exhibited by ChatGPT.
- Dolly 2.0 - the first open source, instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use.
- Cerebras-GPT - A Family of Open, Compute-efficient, Large Language Models.
- GALACTICA - The GALACTICA models are trained on a large-scale scientific corpus.
- GALPACA - GALACTICA 30B fine-tuned on the Alpaca dataset.
- Palmyra - Palmyra Base was primarily pre-trained with English text.
- Camel - a state-of-the-art instruction-following large language model designed to deliver exceptional performance and versatility.
- PanGu-α - PanGu-α is a 200B parameter autoregressive pretrained Chinese language model developed by Huawei Noah's Ark Lab, MindSpore Team and Peng Cheng Laboratory.
- StarCoder - Hugging Face LLM for Code
- MPT-7B - Open LLM for commercial use by MosaicML
- Aquila - An open-source large language model with bilingual Chinese-English knowledge, the first to support commercial licensing and Chinese data-compliance requirements.
- MOSS - An open-source dialogue language model supporting both Chinese and English and a variety of plugins.
- T0 - Multitask Prompted Training Enables Zero-Shot Task Generalization
- Falcon - Falcon LLM is a foundational large language model (LLM) from TII with 40 billion parameters, trained on one trillion tokens.
- LLaMA - A foundational, 65-billion-parameter large language model. [LLaMA.cpp](https://github.com/ggerganov/llama.cpp) [Lit-LLaMA](https://github.com/Lightning-AI/lit-llama)
- HuggingChat - Powered by Open Assistant's latest model – the best open source chat model right now and @huggingface Inference API.
- baichuan-7B - An open-source, commercially usable large-scale pretrained language model developed by Baichuan Intelligence.
- UL2 - a unified framework for pretraining models that are universally effective across datasets and setups.
- ChatGLM-6B - An open-source bilingual (Chinese-English) dialogue language model with 6.2 billion parameters, based on the [General Language Model (GLM)](https://github.com/THUDM/GLM) architecture.
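Most models listed here ship as Hugging Face checkpoints, and several entries (Baize, Guanaco, Alpaca-LoRA) are LoRA finetunes. A minimal sketch of loading a checkpoint and attaching LoRA adapters with `transformers` and `peft`; the model id is a placeholder, and `target_modules` vary by architecture:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "huggyllama/llama-7b"  # placeholder; substitute any model above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Attach low-rank adapters to the attention projections (LLaMA-style names).
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter weights are trainable
```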
-
LLM Training Frameworks
- DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective (a minimal ZeRO sketch follows this list).
- Megatron-DeepSpeed - DeepSpeed version of NVIDIA's Megatron-LM that adds additional support for several features such as MoE model training, Curriculum Learning, 3D Parallelism, and others.
- Megatron-LM - Ongoing research training transformer models at scale.
- Colossal-AI - Making large AI models cheaper, faster, and more accessible.
- BMTrain - Efficient Training for Big Models.
- Mesh Tensorflow - Mesh TensorFlow: Model Parallelism Made Easier.
- maxtext - A simple, performant and scalable Jax LLM!
- GPT-NeoX - An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
- FairScale - FairScale is a PyTorch extension library for high performance and large scale training.
- Alpa - Alpa is a system for training and serving large-scale neural networks.
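As referenced in the DeepSpeed entry, a minimal ZeRO sketch: an illustrative stage-2 config plus the engine wrapper, with a toy stand-in model (config values are placeholders, not tuned settings):

```python
import torch
import deepspeed

# Illustrative config: ZeRO stage 2 partitions optimizer state and gradients.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

model = torch.nn.Linear(512, 512)  # toy stand-in for a real transformer
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

x = torch.randn(8, 512, device=engine.device, dtype=torch.half)
loss = engine(x).float().pow(2).mean()
engine.backward(loss)  # gradient partitioning/communication handled here
engine.step()
```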
-
Tools for deploying LLM
- SkyPilot - Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU availability, and managed execution -- all with a simple interface.
- vLLM - A high-throughput and memory-efficient inference and serving engine for LLMs (see the offline-inference sketch after this list).
- Text Generation Inference - A Rust, Python and gRPC server for text generation inference. Used in production at [HuggingFace](https://huggingface.co/) to power LLMs api-inference widgets.
- wechat-chatgpt - Use ChatGPT on WeChat via wechaty.
- Agenta - Easily build, version, evaluate and deploy your LLM-powered apps.
- Haystack - an open-source NLP framework that allows you to use LLMs and transformer-based models from Hugging Face, OpenAI and Cohere to interact with your own data.
- FastChat - A distributed multi-model LLM serving system with web UI and OpenAI-compatible RESTful APIs.
- Embedchain - Framework to create ChatGPT-like bots over your dataset.
- Sidekick - Data integration platform for LLMs.
- promptfoo - Test your prompts. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality.
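As referenced in the vLLM entry, a minimal offline-inference sketch; the model id is a placeholder for any supported Hugging Face causal LM:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model id
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["What is a multimodal LLM?"], params)
for out in outputs:
    print(out.outputs[0].text)  # continuous batching happens under the hood
```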
-
Courses about LLM
- Stanford
- Princeton
- OpenBMB
- Stanford
- Stanford
- Stanford Webinar
- Mu Li
- Yun-Nung (Vivian) Chen
- Mu Li
- Mu Li
- Aston Zhang
- DeepLearning.AI
-
Datasets of Multimodal Instruction Tuning
- Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | 100K high-quality video instruction dataset
- GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction | Tool-related instruction datasets
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | A large-scale, broad-coverage biomedical instruction-following dataset
- ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst | Multimodal instruction tuning dataset covering 16 multimodal tasks
- DetGPT: Detect What You Need via Reasoning | Instruction-tuning dataset with 5000 images and around 30000 query-answer pairs
- PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering | Large-scale medical visual question-answering dataset
- VideoChat: Chat-Centric Video Understanding | Video-centric multimodal instruction dataset
- mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality | Dataset for evaluation on multiple capabilities
- MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Multimodal aligned dataset for improving model usability and generation fluency
- Visual Instruction Tuning (LLaVA-Instruct-150K) | Multimodal instruction-following data generated by GPT
- X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages | Chinese multimodal instruction dataset
- MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning | The first multimodal instruction tuning benchmark dataset
- M<sup>3</sup>IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning | Large-scale, broad-coverage multimodal instruction tuning dataset
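For a sense of what these files contain, the shape of one record in the LLaVA-Instruct-150K convention (values invented for illustration; the other datasets above use similar but not identical schemas):

```python
# One record from a LLaVA-style instruction-tuning JSON file.
record = {
    "id": "000000215677",                        # hypothetical sample id
    "image": "coco/train2017/000000215677.jpg",  # hypothetical image path
    "conversations": [
        {"from": "human", "value": "<image>\nWhat is unusual about this image?"},
        {"from": "gpt", "value": "The skateboarder is riding along a rail."},
    ],
}
print(record["conversations"][0]["value"])
```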
-
Other useful resources
- OpenAGI - When LLM Meets Domain Experts.
- HuggingGPT - Solving AI Tasks with ChatGPT and its Friends in HuggingFace.
- EasyEdit - An easy-to-use framework to edit large language models.
- chatgpt-shroud - A Chrome extension for OpenAI's ChatGPT, enhancing user privacy by enabling easy hiding and unhiding of chat history. Ideal for privacy during screen shares.
- Open-evals - A framework extending OpenAI's [Evals](https://github.com/openai/evals) for different language models.
- Mistral - Mistral-7B-v0.1 is a small, yet powerful model adaptable to many use-cases including code and 8k sequence length. Apache 2.0 licence.
- Arize-Phoenix - Open-source tool for ML observability that runs in your notebook environment. Monitor and fine-tune LLM, CV, and tabular models.
- Major LLMs + Data Availability
- 500+ Best AI Tools
- AutoGPT - an experimental open-source application showcasing the capabilities of the GPT-4 language model.
- Mixtral 8x7B - a high-quality sparse mixture of experts model (SMoE) with open weights.
- chatgpt-wrapper - ChatGPT Wrapper is an open-source unofficial Python API and CLI that lets you interact with ChatGPT.
-
Prompting libraries & tools
- YiVal - Open-source GenAI-Ops tool for tuning and evaluating prompts, configurations, and model parameters using customizable datasets, evaluation methods, and improvement strategies.
- Semantic Kernel
- Prompttools - Open-source Python tools for testing and evaluating models, vector DBs, and prompts.
- Promptify
- Weights & Biases
- OpenAI Evals - Open-source library for evaluating task performance of language models and prompts (a toy version of such an eval loop is sketched after this list).
- ModelFusion - A TypeScript library for building apps with LLMs and other ML models (speech-to-text, text-to-speech, image generation).
- Flappy - Production-Ready LLM Agent SDK for Every Developer.
- FLAML (A Fast Library for Automated Machine Learning & Tuning)
- Guardrails.ai
- PromptPerfect
- Arthur Shield
- GPTRouter - GPTRouter is an open-source LLM API Gateway that offers a universal API for 30+ LLMs, vision, and image models, with smart fallbacks based on uptime and latency, automatic retries, and streaming. Stay operational even when OpenAI is down.
- Outlines - A domain-specific language to simplify prompting and constrain generation.
- LangChain
- LlamaIndex
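As referenced in the OpenAI Evals entry above, a toy version of what these evaluation tools automate: run prompts against test cases and assert on the outputs (model name and cases are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
cases = [{"text": "2 + 2", "expect": "4"}]  # placeholder test cases

for case in cases:
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": f"Answer tersely: {case['text']}"}],
    )
    answer = completion.choices[0].message.content
    print("PASS" if case["expect"] in answer else "FAIL", repr(answer))
```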
-
Datasets of Multimodal Chain-of-Thought
- EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought | Large-scale embodied planning dataset
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering | Large-scale multi-choice dataset featuring multimodal science questions and diverse domains
- Let's Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction | Dataset for evaluating video chain-of-thought (VideoCOT)
-
Practical Guide for NLP Tasks
-
Generation tasks
-
Knowledge-intensive tasks
-
Efficiency
-
Traditional NLU tasks
-
Abilities with Scaling
-
Specific tasks
-
Real-World "Tasks"
-
RLHF Datasets
- HH-RLHF - Anthropic's human-preference data of chosen/rejected dialogue pairs (see the loading sketch after this list).
- PromptSource
- Stable Alignment - Alignment Learning in Social Games
- Stanford Human Preferences Dataset (SHP)
- Structured Knowledge Grounding (SKG) Resources Collections
- rlhf-reward-datasets
- webgpt_comparisons
- summarize_from_feedback
- The Flan Collection
- Dahoas/synthetic-instruct-gptj-pairwise
- LIMA
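As referenced in the HH-RLHF entry above, a minimal sketch of inspecting its preference pairs with the `datasets` library; each row pairs a `chosen` and a `rejected` dialogue for reward-model training:

```python
from datasets import load_dataset

ds = load_dataset("Anthropic/hh-rlhf", split="train")
example = ds[0]
print(example["chosen"][:200])    # preferred dialogue continuation
print(example["rejected"][:200])  # dispreferred continuation, same prompt
```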
-
Multimodal In-Context Learning
- **Multimodal Few-Shot Learning with Frozen Language Models** | 2021-06-25 | - | - |
-
Multimodal Chain-of-Thought
- **Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings** | 2023-05-03 | [Coming soon](https://github.com/dannyrose30/VCOT) | - |
- **Chain of Thought Prompt Tuning in Vision Language Models** | 2023-04-16 | Coming soon | - |
- **Let's Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction** | 2023-05-23 | - | - |
-
Foundation Models
- **GPT-4 Technical Report** | 2023-03-15 | - | - |
- **PaLM-E: An Embodied Multimodal Language Model** | 2023-03-06 | - | [Demo](https://palm-e.github.io/#demo) |
-
Others
- **Can Large Pre-trained Models Help Vision Models on Perception Tasks?** | 2023-06-01 | Coming soon | - |
- **Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs** | 2023-11-24 | [Github](https://github.com/jonathan-roberts1/charting-new-territories) | - |
-
Pretraining data
-
Datasets of In-Context Learning
- MIMIC-IT: Multi-Modal In-Context Instruction Tuning | Multimodal in-context instruction dataset
-
Memory retrieval
-
Multimodal Instruction Tuning
- **M<sup>3</sup>IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning** | 2023-06-07 | - | - |
- **ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst** | 2023-05-25 | [Github](https://github.com/joez17/ChatBridge) | - |
- [**Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models**](https://arxiv.org/pdf/2305.15023.pdf) | 2023-05-24 | [Github](https://github.com/luogen1996/LaVIN) | Local Demo |
- **DetGPT: Detect What You Need via Reasoning** | 2023-05-23 | [Github](https://github.com/OptimalScale/DetGPT) | [Demo](https://d3c431c0c77b1d9010.gradio.live/) |
- **MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning** | 2022-12-21 | - | - |
-
Practical Guides for Prompting (Helpful)
-
Star History
-
- [Star History Chart](https://star-history.com/#Atomic-man007/Awesome_Multimodel_LLM)
-
High-quality generation
-
Deep understanding
-
Raising the length limit of Transformers
-
Compressing memories with vectors or data structures
-
LLM-Aided Visual Reasoning
- **ViperGPT: Visual Inference via Python Execution for Reasoning** | 2023-03-14 | [Github](https://github.com/cvlab-columbia/viper) | Local Demo |
- [**MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action**](https://arxiv.org/pdf/2303.11381.pdf) | 2023-03-20 | [Github](https://github.com/microsoft/MM-REACT) | [Demo](https://huggingface.co/spaces/microsoft-cognitive-service/mm-react) |
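Both entries follow an LLM-as-controller pattern: the LLM plans calls to vision tools and composes the results. A sketch with stubbed tools; every name here is hypothetical, not the actual ViperGPT or MM-REACT API:

```python
def detect_objects(image):  # stub standing in for a real detector
    return ["dog", "frisbee"]

def caption(image):         # stub standing in for a captioning model
    return "a dog leaping to catch a frisbee"

TOOLS = {"detect_objects": detect_objects, "caption": caption}

def answer(image, question, plan):
    """`plan` is the tool sequence an LLM controller would emit for `question`."""
    observations = [f"{name}: {TOOLS[name](image)}" for name in plan]
    # A real system would feed these observations back to the LLM to
    # compose a final answer; here we simply return them.
    return observations

print(answer("photo.jpg", "What is the dog doing?", ["detect_objects", "caption"]))
```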