Awesome-LLM
Awesome-LLM: a curated list of Large Language Models
https://github.com/Hannibal046/Awesome-LLM
Milestone Papers
- Attention Is All You Need
- Improving Language Understanding by Generative Pre-Training
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Language Models are Unsupervised Multitask Learners
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- Scaling Laws for Neural Language Models
- Language Models are Few-Shot Learners
- Evaluating Large Language Models Trained on Code
- On the Opportunities and Risks of Foundation Models
- Finetuned Language Models are Zero-Shot Learners
- WebGPT: Browser-assisted question-answering with human feedback
- Improving language models by retrieving from trillions of tokens
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Solving Quantitative Reasoning Problems with Language Models
- Training language models to follow instructions with human feedback
- An empirical analysis of compute-optimal large language model training
- OPT: Open Pre-trained Transformer Language Models
- Emergent Abilities of Large Language Models
- Language Models are General-Purpose Interfaces
- GLM-130B: An Open Bilingual Pre-trained Model
- Holistic Evaluation of Language Models
- Galactica: A Large Language Model for Science
- OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- LLaMA: Open and Efficient Foundation Language Models
- Language Is Not All You Need: Aligning Perception with Language Models
- PaLM-E: An Embodied Multimodal Language Model
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
- PaLM 2 Technical Report
- RWKV: Reinventing RNNs for the Transformer Era
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
- Scaling Instruction-Finetuned Language Models
- Improving alignment of dialogue agents via targeted human judgements
- Unifying Language Learning Paradigms
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
- LaMDA: Language Models for Dialog Applications
- Jamba: A Hybrid Transformer-Mamba Language Model
- PaLM: Scaling Language Modeling with Pathways
- GPT-4 Technical Report
- Mistral 7B
- Resurrecting Recurrent Neural Networks for Long Sequences
- Visual Instruction Tuning
- Multitask Prompted Training Enables Zero-Shot Task Generalization
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
- The Llama 3 Herd of Models
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Open LLM
| Model | Access | Paper/Blog | License |
|---|---|---|---|
| Falcon | ckpt | [Homepage](https://falconllm.tii.ae) | [Apache 2.0](https://huggingface.co/tiiuae) |
| UL2 | ckpt | [Paper](https://arxiv.org/pdf/2205.05131v1.pdf) | [Apache 2.0](https://huggingface.co/google/ul2) |
| PanGu-α | ckpt | [Paper](https://arxiv.org/pdf/2104.12369.pdf) | [Apache 2.0](https://github.com/huawei-noah/Pretrained-Language-Model/blob/4624dbadfe00e871789b509fe10232c77086d1de/PanGu-%CE%B1/LICENSE) |
| T5 | ckpt | [Paper](https://jmlr.org/papers/v21/20-074.html) | [Apache 2.0](https://huggingface.co/t5-11b) |
| CPM | api | [Paper](https://arxiv.org/pdf/2012.00413.pdf) | - |
| RWKV-4 | ckpt | [Github](https://github.com/BlinkDL/RWKV-LM) | [Apache 2.0](https://huggingface.co/BlinkDL/rwkv-4-pile-7b) |
| GPT-J | ckpt | [Github](https://github.com/kingoflolz/mesh-transformer-jax) | [Apache 2.0](https://huggingface.co/EleutherAI/gpt-j-6b) |
| Jurassic-1 | api | [Paper](https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf) | - |
| OPT-175B | api | [Paper](https://arxiv.org/pdf/2205.01068.pdf) | [OPT-175B License Agreement](https://github.com/facebookresearch/metaseq/blob/edefd4a00c24197486a3989abe28ca4eb3881e59/projects/OPT/MODEL_LICENSE.md) |
| GPT-3 | api | [Paper](https://arxiv.org/pdf/2005.14165.pdf) | - |

- OPT-1.3|6.7|13|30|66B
- T5
- Mixtral-8x7B
- RecurrentGemma-2B
- DeepSeek-Coder-v2-16|236B-MOE
- Pythia-1|1.4|2.8|6.9|12B
- Qwen2-0.5|1.5|7|57-MOE|72B
- Nemotron-4-340B
- Llama 3-8|70B
- BLOOMZ&mT0
- Gemma-2|7B
- Llama 3.1-8|70|405B
- Codestral-7|22B
- Qwen1.5-0.5B|1.8B|4B|7B|14B|32B|72B|110B|MoE-A2.7B
- Qwen2.5-0.5B|1.5B|3B|7B|14B|32B|72B
- CodeQwen1.5-7B
- Qwen2.5-Coder-1.5B|7B|32B
- Qwen2-Math-1.5B|7B|72B
- Qwen2.5-Math-1.5B|7B|72B
- Qwen2-VL-2B|7B|72B
- Qwen2-Audio-7B
- Llama 2-7|13|70B
- Mistral-7B
- Mixtral-8x22B
- OpenELM-1.1|3B
- Phi1-1.3B
- Phi2-2.7B
- Phi3-3.8|7|14B
- OLMo-7B
- Grok-1-314B-MoE
- Command R-35B
- DeepSeek-Math-7B
- DeepSeek-Coder-1.3|6.7|7|33B
- DeepSeek-VL-1.3|7B
- DeepSeek-MoE-16B
- Qwen-1.8B|7B|14B|72B
- Qwen1.5-1.8|4|7|14|32|72|110B
- CodeQwen-7B
- Qwen-VL-7B
- Yi-34B
- Yi1.5-6|9|34B
- Yi-VL-6B|34B
- Baichuan2-7|13B
- GLM-2|6|10|13|70B
- CogVLM2-19B
- MiniCPM-2B
- OmniLLM-12B
- VisCPM-10B
- CPM-Bee-1|2|5|10B
- RWKV-v4|5|6
- StableCode-3B
- StarCoder-1|3|7B
- StarCoder2-3|7|15B
- MPT-7B
- InternLM2-1.8|7|20B
- InternLM-Math-7B|20B
- InternLM-XComposer2-1.8|7B
- InternVL-2|6|14|26
- StableLM-v2-1.6|12B
- Gemma2-9|27B
- DBRX-132B-MoE
Instruction finetuned LLM

| Model | Access | Paper/Blog | License |
|---|---|---|---|
| InstructGPT | api | [Paper](https://arxiv.org/pdf/2203.02155.pdf) | - |
| Galactica | ckpt | [Paper](https://arxiv.org/pdf/2211.09085.pdf) | [CC-BY-NC-4.0](https://github.com/paperswithcode/galai/blob/3a724f562af1a0c8ff97a096c5fbebe579e2160f/LICENSE-MODEL.md) |
| Flan-UL2 | ckpt | [Blog](https://www.yitay.net/blog/flan-ul2-20b) | [Apache 2.0](https://huggingface.co/google/flan-ul2) |
| Flan-T5 | ckpt | [Paper](https://arxiv.org/pdf/2210.11416.pdf) | [Apache 2.0](https://github.com/google-research/t5x/blob/776279bdacd8c5a2d3e8ce0f2e7064bd98e98b47/LICENSE) |
| T0 | ckpt | [Paper](https://arxiv.org/pdf/2110.08207.pdf) | [Apache 2.0](https://huggingface.co/bigscience/T0) |
| Alpaca | demo | [Github](https://github.com/tatsu-lab/stanford_alpaca) | [CC BY NC 4.0](https://github.com/tatsu-lab/stanford_alpaca/blob/main/WEIGHT_DIFF_LICENSE) |
| Orca | ckpt | [Paper](https://arxiv.org/pdf/2306.02707) | [Non-commercial bespoke license](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md) |
RLHF LLM

| Model | Access | Release | Paper/Blog |
|---|---|---|---|
| Claude | demo | 2023-03 | [Blog](https://www.anthropic.com/index/introducing-claude) |
| ChatGPT | demo | 2022-11 | [Blog](https://openai.com/blog/chatgpt/) |

- survey paper

Open LLM
- Mistral - Mistral-7B-v0.1 is a small yet powerful model adaptable to many use-cases, including code, and supports an 8k sequence length. Apache 2.0 license.
- [LLaMA 2](https://ai.meta.com/llama/) - A foundational large language model. [LLaMA.cpp](https://github.com/ggerganov/llama.cpp) [Lit-LLaMA](https://github.com/Lightning-AI/lit-llama)
- Alpaca - A model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. [Alpaca.cpp](https://github.com/antimatter15/alpaca.cpp) [Alpaca-LoRA](https://github.com/tloen/alpaca-lora)
- Vicuna - An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality.
- YaLM - a GPT-like neural network for generating and processing text. It can be used freely by developers and researchers from all over the world.
- Koala - A Dialogue Model for Academic Research
- StackLLaMA - A hands-on guide to train LLaMA with RLHF.
- Orca - Microsoft's fine-tuned LLaMA model that reportedly matches GPT-3.5, trained on 5M samples of explanation data generated by ChatGPT and GPT-4.
- StableLM - Stability AI Language Models.
- Dolly - a cheap-to-build LLM that exhibits a surprising degree of the instruction following capabilities exhibited by ChatGPT.
- Dolly 2.0 - the first open source, instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use.
- Cerebras-GPT - A Family of Open, Compute-efficient, Large Language Models.
- GALACTICA - The GALACTICA models are trained on a large-scale scientific corpus.
- GALPACA - GALACTICA 30B fine-tuned on the Alpaca dataset.
- Palmyra - Palmyra Base was primarily pre-trained with English text.
- Camel - a state-of-the-art instruction-following large language model designed to deliver exceptional performance and versatility.
- PanGu-α - PanGu-α is a 200B-parameter autoregressive pretrained Chinese language model developed by Huawei Noah's Ark Lab, MindSpore Team and Peng Cheng Laboratory.
- StarCoder - Hugging Face LLM for Code
- Baichuan - A series of large language models developed by Baichuan Intelligent Technology.
- phi-1 - a new large language model for code, with significantly smaller size than competing models.
- phi-1.5 - a 1.3 billion parameter model trained on a dataset of 30 billion tokens, which achieves common sense reasoning benchmark results comparable to models ten times its size that were trained on datasets more than ten times larger.
- HuggingChat - Powered by Open Assistant's latest model – the best open source chat model right now and @huggingface Inference API.
- Falcon - Falcon LLM is a foundational large language model (LLM) with 40 billion parameters, trained on one trillion tokens and released by TII (Technology Innovation Institute).
- Aquila (悟道·天鹰) - an open-source large language model with bilingual Chinese-English knowledge, the first to support commercial licensing while meeting Chinese data-compliance requirements.
- phi-2 - a 2.7 billion-parameter language model that demonstrates outstanding reasoning and language understanding capabilities, showcasing state-of-the-art performance among base language models with less than 13 billion parameters.
- Jamba - A Hybrid Transformer-Mamba MoE model, with 52B params, first production grade mamba based LLM, 256K context support.
- DeepSeek-LLM - 7B and 67B base models from DeepSeek; [67B-base](https://modelscope.cn/models/deepseek-ai/deepseek-llm-67b-base/summary).
- RWKV-5 (trending demo) - RWKV-5 trained on 100+ world languages (70% English, 15% multilingual, 15% code).
- Qwen series - The large language model series proposed by Alibaba Cloud, including [7B](https://huggingface.co/Qwen/Qwen-7B) and [72B](https://huggingface.co/Qwen/Qwen-72B) checkpoints plus various quantized and Chat variants. [Chat Demo](https://huggingface.co/spaces/Qwen/Qwen-72B-Chat-Demo)
- XVERSE series - Multilingual large language models independently developed by Shenzhen-based XVERSE Technology Inc., including [7B](https://github.com/xverse-ai/XVERSE-7B), [13B](https://github.com/xverse-ai/XVERSE-13B), and [65B](https://github.com/xverse-ai/XVERSE-65B) variants.
- RAFT - RAFT: A new way to teach LLMs to be better at RAG ([paper](https://arxiv.org/abs/2403.10131)).
- BLOOM - BigScience Large Open-science Open-access Multilingual Language Model [BLOOM-LoRA](https://github.com/linhduongtuan/BLOOM-LORA)
- Command-R series - Two multilingual large language models intended for retrieval augmented generation (RAG) and conversational use, at [35](https://huggingface.co/CohereForAI/c4ai-command-r-v01) and [104](https://huggingface.co/CohereForAI/c4ai-command-r-plus) billion parameters. 128k context support.
LLM Leaderboard
- Chatbot Arena Leaderboard - a benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner.
- Open Ko-LLM Leaderboard - objectively evaluates the performance of Korean large language models (LLMs).
- Yet Another LLM Leaderboard - a leaderboard made with LLM AutoEval using the Nous benchmark suite.
- AlpacaEval - An Automatic Evaluator for Instruction-following Language Models.
- OpenCompass 2.0 LLM Leaderboard - OpenCompass is an LLM evaluation platform supporting a wide range of models (InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) over 100+ datasets.
- Open LLM Leaderboard - aims to track, rank, and evaluate LLMs and chatbots as they are released.
- ACLUE - an evaluation benchmark focused on ancient Chinese language comprehension.
- BeHonest - A pioneering benchmark specifically designed to assess honesty in LLMs comprehensively.
- Chinese Large Model Leaderboard - an expert-driven benchmark for Chinese LLMs.
- CompassRank - CompassRank is dedicated to exploring the most advanced language and visual models, offering a comprehensive, objective, and neutral evaluation reference for the industry and research.
- CompMix - a benchmark evaluating QA methods that operate over a mixture of heterogeneous input sources (KB, text, tables, infoboxes).
- DreamBench++ - a benchmark for evaluating the performance of large language models (LLMs) in various tasks related to both textual and visual imagination.
- FELM - a meta-benchmark that evaluates how well factuality evaluators assess the outputs of large language models (LLMs).
- InfiBench - a benchmark designed to evaluate large language models (LLMs) specifically in their ability to answer real-world coding-related questions.
- MMedBench - a benchmark that evaluates large language models' ability to answer medical questions across multiple languages.
- LawBench - a benchmark designed to evaluate large language models in the legal domain.
- LLMEval - focuses on understanding how LLMs perform in various scenarios and analyzing results from an interpretability perspective.
- M3CoT - a benchmark that evaluates large language models on a variety of multimodal reasoning tasks, including language, natural and social sciences, physical and social commonsense, temporal reasoning, algebra, and geometry.
- MathEval - a comprehensive benchmarking platform designed to evaluate large models' mathematical abilities across 20 fields and nearly 30,000 math problems.
- MMToM-QA - a multimodal question-answering benchmark designed to evaluate AI models' cognitive ability to understand human beliefs and goals.
- VisualWebArena - a benchmark designed to assess the performance of multimodal web agents on realistic visually grounded tasks.
- We-Math - a benchmark that evaluates large multimodal models (LMMs) on their ability to perform human-like mathematical reasoning.
- WHOOPS! - a benchmark dataset testing AI's ability to reason about visual commonsense through images that defy normal expectations.
- OlympicArena - a benchmark for evaluating AI models across multiple academic disciplines like math, physics, chemistry, biology, and more.
- SuperLim - a Swedish language understanding benchmark that evaluates natural language processing (NLP) models on various tasks such as argumentation analysis, semantic similarity, and textual entailment.
- TAT-DQA - a large-scale Document Visual Question Answering (VQA) dataset designed for complex document understanding, particularly in financial reports.
- TAT-QA - a large-scale question-answering benchmark focused on real-world financial data, integrating both tabular and textual information.
- SciBench - benchmark designed to evaluate large language models (LLMs) on solving complex, college-level scientific problems from domains like chemistry, physics, and mathematics.
- SuperBench - a benchmark platform designed for evaluating large language models (LLMs) on a range of tasks, particularly focusing on their performance in different aspects such as natural language understanding, reasoning, and generalization.
- Berkeley Function-Calling Leaderboard - evaluates LLM's ability to call external functions/tools.
- MixEval - a ground-truth-based dynamic benchmark derived from off-the-shelf benchmark mixtures, which evaluates LLMs with a highly capable model ranking (i.e., 0.96 correlation with Chatbot Arena) while running locally and quickly (6% the time and cost of running MMLU).
LLM Training Frameworks
- FairScale - FairScale is a PyTorch extension library for high performance and large scale training.
- DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective (see the ZeRO sketch after this list).
- Megatron-DeepSpeed - DeepSpeed version of NVIDIA's Megatron-LM that adds additional support for several features such as MoE model training, Curriculum Learning, 3D Parallelism, and others.
- torchtune - A Native-PyTorch Library for LLM Fine-tuning.
- torchtitan - A native PyTorch Library for large model training.
- Megatron-LM - Ongoing research training transformer models at scale.
- Colossal-AI - Making large AI models cheaper, faster, and more accessible.
- BMTrain - Efficient Training for Big Models.
- Mesh Tensorflow - Mesh TensorFlow: Model Parallelism Made Easier.
- maxtext - A simple, performant and scalable Jax LLM!
- GPT-NeoX - An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
- Alpa - Alpa is a system for training and serving large-scale neural networks.
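Most of the frameworks above share the same basic shape: wrap a vanilla PyTorch model in an engine that shards training state across devices. A minimal sketch of that pattern with DeepSpeed's ZeRO stage 2 follows; the tiny model, dummy loss, and config values are placeholder assumptions, and a real run launches via the `deepspeed` CLI on one or more GPUs.

```python
# Minimal DeepSpeed ZeRO sketch; model and loss are stand-ins for a real LLM.
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # placeholder model

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},  # shard optimizer state + gradients
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
}

# deepspeed.initialize wraps the model in an engine that handles sharding,
# mixed precision, and gradient accumulation internally.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

batch = torch.randn(8, 1024, device=engine.device, dtype=torch.half)
loss = engine(batch).float().pow(2).mean()  # dummy loss for illustration
engine.backward(loss)  # engine owns loss scaling and gradient reduction
engine.step()
```

Stage 2 shards optimizer state and gradients across data-parallel ranks; stage 3 additionally shards the parameters themselves.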
LLM Deployment
- Haystack - an open-source NLP framework that allows you to use LLMs and transformer-based models from Hugging Face, OpenAI and Cohere to interact with your own data.
- llm-inference-solutions
- vLLM - A high-throughput and memory-efficient inference and serving engine for LLMs (see the sketch after this list).
- exllama - A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
- llama.cpp - LLM inference in C/C++.
- ollama - Get up and running with Llama 3, Mistral, Gemma, and other large language models.
- Langfuse - Open Source LLM Engineering Platform 🪢 Tracing, Evaluations, Prompt Management, and Playground.
- FastChat - A distributed multi-model LLM serving system with web UI and OpenAI-compatible RESTful APIs.
- mistral.rs - Blazingly fast LLM inference.
- MindSQL - A Python package for text-to-SQL with self-hosting capabilities and RESTful APIs compatible with proprietary as well as open-source LLMs.
- SkyPilot - Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU availability, and managed execution -- all with a simple interface.
- QA-Pilot - An interactive chat project that leverages Ollama/OpenAI/MistralAI LLMs for rapid understanding and navigation of GitHub code repository or compressed file resources.
- Shell-Pilot - Interact with LLMs using Ollama models (or OpenAI, Mistral AI) via pure shell scripts on your Linux (or macOS) system, enhancing intelligent system management without any dependencies.
- Floom
- Swiss Army Llama - Comprehensive set of tools for working with local LLMs for various tasks.
- magentic - Seamlessly integrate LLMs as Python functions
- wechat-chatgpt - Use ChatGPT On Wechat via wechaty
- Agenta - Easily build, version, evaluate and deploy your LLM-powered apps.
- Serge - a chat interface crafted with llama.cpp for running Alpaca models. No API keys, entirely self-hosted!
- Langroid - Harness LLMs with Multi-Agent Programming
- IntelliServer - simplifies the evaluation of LLMs by providing a unified microservice to access and test multiple AI models.
- OpenLLM - Fine-tune, serve, deploy, and monitor any open-source LLMs in production. Used in production at [BentoML](https://bentoml.com/) for LLMs-based applications.
- DeepSpeed-Mii - MII provides low-latency and high-throughput inference, similar to vLLM, powered by DeepSpeed.
- Text-Embeddings-Inference - Inference for text-embeddings in Rust, HFOIL Licence.
- Infinity - Inference for text-embeddings in Python
- TensorRT-LLM - NVIDIA framework for LLM inference.
- FasterTransformer - NVIDIA framework for LLM inference (transitioned to TensorRT-LLM).
- Flash-Attention - A method designed to enhance the efficiency of Transformer models.
- Langchain-Chatchat - Formerly langchain-ChatGLM, local knowledge based LLM (like ChatGLM) QA app with langchain.
- Search with Lepton - Build your own conversational search engine using less than 500 lines of code by [LeptonAI](https://github.com/leptonai).
- Robocorp - Create, deploy and operate Actions using Python anywhere to enhance your AI agents and assistants. Batteries included with an extensive set of libraries, helpers and logging.
- LMDeploy - A high-throughput and low-latency inference and serving framework for LLMs and VLs
- LLocalSearch - Locally running websearch using LLM chains
- AI Gateway - an LLM gateway with support for caching, fallbacks, retries, timeouts, and load balancing, and can be edge-deployed for minimum latency.
- talkd.ai dialog - Simple API for deploying any RAG or LLM that you want, with plugin support.
- Wllama - WebAssembly binding for llama.cpp - Enabling in-browser LLM inference
- Embedchain - Framework to create ChatGPT like bots over your dataset.
- Sidekick - Data integration platform for LLMs.
- promptfoo - Test your prompts. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality.
- GPUStack - An open-source GPU cluster manager for running LLMs
- SGLang - SGLang is a fast serving framework for large language models and vision language models.
- LiteChain - Lightweight alternative to LangChain for composing LLMs
- Opik - Confidently evaluate, test, and ship LLM applications with a suite of observability tools to calibrate language model outputs across your dev and production lifecycle.
- TGI - a toolkit for deploying and serving Large Language Models (LLMs).
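To make the serving engines above concrete, here is a minimal offline-batching sketch with vLLM; the checkpoint name is just an example, and the same engine can also be exposed through vLLM's OpenAI-compatible server entrypoint.

```python
# Minimal sketch of offline batched inference with vLLM; any
# HF-compatible checkpoint works in place of the example model.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = [
    "Explain KV-cache paging in one sentence.",
    "What is speculative decoding?",
]

# generate() batches the prompts and schedules them with PagedAttention.
for output in llm.generate(prompts, params):
    print(output.prompt, "->", output.outputs[0].text)
```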
Deploying Tools
- Tune Studio - Playground for devs to finetune & deploy LLMs
Prompting libraries & tools
- LangChain
- Guidance
- Chainlit
- Guardrails.ai
- Outlines - a domain-specific language to simplify prompting and constrain generation (see the sketch after this list).
- Scale Spellbook
- Weights & Biases
- LlamaIndex
- LMQL
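To make the constrained-generation idea concrete, below is a small sketch using Outlines; the module paths reflect recent 0.x releases and may shift between versions, and the model choice is arbitrary.

```python
# Minimal sketch of constrained generation with Outlines (API names are
# assumptions against recent 0.x releases).
import outlines

model = outlines.models.transformers("microsoft/phi-2")

# Constrain decoding so the only possible outputs are the two labels.
classify = outlines.generate.choice(model, ["Positive", "Negative"])

label = classify("Review: 'Great battery life, terrible screen.' Sentiment:")
print(label)  # exactly "Positive" or "Negative", no parsing needed
```

Because decoding is masked to the allowed strings, the output needs no post-hoc parsing or validation.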
LLM Applications
- FLAML (A Fast Library for Automated Machine Learning & Tuning)
- PromptPerfect
- Arthur Shield
- GPTRouter - GPTRouter is an open source LLM API Gateway that offers a universal API for 30+ LLMs, vision, and image models, with smart fallbacks based on uptime and latency, automatic retries, and streaming. Stays operational even when OpenAI is down.
- AdalFlow - AdalFlow: The PyTorch library for LLM applications.
- dspy - DSPy: The framework for programming—not prompting—foundation models (see the sketch after this list).
- YiVal - Open-source GenAI-Ops tool for tuning and evaluating prompts, configurations, and model parameters using customizable datasets, evaluation methods, and improvement strategies.
- Evidently - Open-source framework to evaluate, test and monitor ML and LLM-powered systems.
- Semantic Kernel
- Prompttools - Open-source Python tools for testing and evaluating models, vector DBs, and prompts.
- Promptify
- OpenAI Evals - Open-source library for evaluating task performance of language models and prompts.
- Flappy - Production-Ready LLM Agent SDK for Every Developer.
- QAnything - A local knowledge base question-answering system designed to support a wide range of file formats and databases.
- llm-ui - A React library for building LLM UIs.
- Dify - An open-source LLM app development platform with an intuitive interface that streamlines AI workflows, model management, and production deployment.
- LazyLLM - An open-source LLM app for building multi-agent LLMs applications in an easy and lazy way, supports model deployment and fine-tuning.
- MemFree - Open Source Hybrid AI Search Engine that instantly gets accurate answers from the internet, bookmarks, notes, and docs. Supports one-click deployment.
- OneKE - A bilingual Chinese-English knowledge extraction model built with knowledge graphs and natural language processing technologies.
- unslothai - A framework that specializes in efficient fine-tuning. Its GitHub page provides ready-to-use fine-tuning templates for various LLMs, so you can easily train on your own data for free on Google Colab.
- Wallaroo.AI - Deploy, manage, and optimize any model at scale across any environment, from cloud to edge. Lets you go from Python notebook to inference in minutes.
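Several of the application frameworks above, DSPy in particular, treat prompts as compiled artifacts rather than handwritten strings. A minimal sketch of that style follows; the model identifier and configuration calls are assumptions against a recent DSPy release.

```python
# Minimal sketch of DSPy's "programming, not prompting" style.
import dspy

lm = dspy.LM("openai/gpt-4o-mini")  # any supported backend
dspy.configure(lm=lm)

# "question -> answer" is a declarative signature, not a prompt string;
# DSPy compiles it into an actual prompt behind the scenes.
qa = dspy.Predict("question -> answer")

result = qa(question="What does ZeRO stage 2 shard?")
print(result.answer)
```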
Tutorials

Great thoughts about LLM
- Why did all of the public reproduction of GPT-3 fail?
- A Stage Review of Instruction Tuning
- LLM Powered Autonomous Agents
- Why you should work on AI AGENTS!
- Google "We Have No Moat, And Neither Does OpenAI"
- AI competition statement
- Prompt Engineering
- Large Language Model Training in 2023
- The Next Generation Of Large Language Models
- Noam Chomsky: The False Promise of ChatGPT
- Is ChatGPT 175 Billion Parameters? Technical Analysis
- How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources
- Scaling, emergence, and reasoning in large language models
- Open Pretrained Transformers
LLM Tutorials and Courses
- ChatGPT Prompt Engineering
- Princeton: Understanding Large Language Models
- CS25-Transformers United
- llm-course - Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
- CS324 - Large Language Models
- femtoGPT - Pure Rust implementation of a minimal Generative Pretrained Transformer.
- UWaterloo CS 886 - Recent Advances on Foundation Models.
- State of GPT
- Let's build GPT: from scratch, in code, spelled out.
- Neurips2022-Foundational Robustness of Foundation Models
- ICML2022-Welcome to the "Big Model" Era: Techniques and Systems to Train and Serve Bigger Models
- minbpe - Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Courses
- OpenBMB
- Stanford
- Stanford Webinar
- Aston Zhang
- MIT
- 李沐
- 陳縕儂
- DeepLearning.AI (multiple courses)
- Arize (multiple courses)
Books
- Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs - comes with a [GitHub repository](https://github.com/benman1/generative_ai_with_langchain) that showcases much of the functionality.
Opinions
- 我的大模型世界观 (My Worldview of Large Models) [2023-04-23][陆奇]
- Towards ChatGPT and Beyond [2023-02-20][知乎][欧泽彬]
- 追赶ChatGPT的难点与平替 (The Difficulties of Catching Up with ChatGPT, and Its Substitutes) [2023-02-19][李rumor]
- ChatGPT发展历程、原理、技术架构详解和产业未来 (ChatGPT: Development History, Principles, Technical Architecture, and the Industry's Future) [2023-02-15][知乎][陈巍谈芯]
- 对ChatGPT的二十点看法 (Twenty Observations on ChatGPT) [2023-02-13][知乎][熊德意]
- ChatGPT-所见、所闻、所感 (ChatGPT: What I Have Seen, Heard, and Felt) [2023-02-11][知乎][刘聪NLP]
- 对话旷视研究院张祥雨|ChatGPT的科研价值可能更大 (A Conversation with Megvii Research's Zhang Xiangyu: ChatGPT's Research Value May Be Even Greater) [2023-02-16][知乎][旷视科技]
- 关于ChatGPT八个技术问题的猜想 (Conjectures on Eight Technical Questions about ChatGPT) [2023-02-15][知乎][张家俊]
- Large Language Models: A New Moore's Law [2021-10-26][Huggingface]
Miscellaneous
- Arize-Phoenix - Open-source tool for ML observability that runs in your notebook environment. Monitor and fine tune LLM, CV and Tabular Models.
- Major LLMs + Data Availability
- 500+ Best AI Tools
- Open-evals - A framework extending OpenAI's [Evals](https://github.com/openai/evals) to different language models.
- OpenAGI - When LLM Meets Domain Experts.
- EasyEdit - An easy-to-use framework to edit large language models.
- chatgpt-shroud - A Chrome extension for OpenAI's ChatGPT, enhancing user privacy by enabling easy hiding and unhiding of chat history. Ideal for privacy during screen shares.
- AutoGPT - an experimental open-source application showcasing the capabilities of the GPT-4 language model.
- chatgpt-wrapper - ChatGPT Wrapper is an open-source unofficial Python API and CLI that lets you interact with ChatGPT.
Other Useful Resources
- Emergent Mind - The latest AI news, curated & explained by GPT-4.
- ShareGPT - Share your wildest ChatGPT conversations with one click.
- MTEB - Massive Text Embedding Benchmark Leaderboard
- Cohere Summarize Beta - Introducing Cohere Summarize Beta: A New Endpoint for Text Summarization
Trending LLM Projects
- LibreChat - All-In-One AI Conversations with LibreChat.
- Open-Sora - Democratizing Efficient Video Production for All.
- LLM101n - Let's build a Storyteller.
- miqu-1-70b - A leaked 70B model from Mistral AI.
- Sora - Sora is an AI model that can create realistic and imaginative scenes from text instructions.
- Deep-Live-Cam - real time face swap and one-click video deepfake with only a single image (uncensored).
- MiniCPM-V 2.6 - A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
- GPT-SoVITS - 1 min voice data can also be used to train a good TTS model! (few shot voice cloning).
- GPT-4o - OpenAI's new flagship model that can reason across audio, vision, and text in real time.
Other Papers
- Instruction-Tuning-Papers - A trend starting from `Natural-Instructions` (ACL 2022), `FLAN` (ICLR 2022) and `T0` (ICLR 2022).
- Chain-of-Thought Hub - Measuring LLMs' Reasoning Performance
- Awesome-LLM-Inference - A curated list of Awesome LLM Inference Paper with codes.
- awesome-chatgpt-prompts-zh - A Chinese collection of prompt examples to be used with the ChatGPT model.
- Awesome ChatGPT - Curated list of resources for ChatGPT and GPT-3 from OpenAI.
- Chain-of-Thoughts Papers - A trend starting from "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models".
- Awesome Deliberative Prompting - How to ask LLMs to produce reliable reasoning and make reason-responsive decisions.
- Awesome-LLM-hallucination - LLM hallucination paper list.
- awesome-hallucination-detection - List of papers on hallucination detection in LLMs.
- LLMsPracticalGuide - A curated list of practical guide resources of LLMs
- Awesome ChatGPT Prompts - A collection of prompt examples to be used with the ChatGPT model.
- Awesome GPT - A curated list of awesome projects and resources related to GPT, ChatGPT, OpenAI, LLM, and more.
- Awesome GPT-3 - a collection of demos and articles about the [OpenAI GPT-3 API](https://openai.com/blog/openai-api/).
- RWKV-howto - possibly useful materials and tutorial for learning RWKV.
- Awesome LLM Security - A curation of awesome tools, documents and projects about LLM Security.
- Awesome-Align-LLM-Human - A collection of papers and resources about aligning large language models (LLMs) with humans.
- Awesome-Code-LLM - An awesome and curated list of best code-LLM for research.
- Awesome-LLM-Compression - Awesome LLM compression research papers and tools.
- Awesome-LLM-Systems - Awesome LLM systems research papers.
- awesome-llm-webapps - A collection of open source, actively maintained web apps for LLM applications.
- awesome-japanese-llm - 日本語LLMまとめ - Overview of Japanese LLMs.
- Awesome-LLM-Healthcare - The paper list of the review on LLMs in medicine.
- Awesome-LLM-3D - A curated list of Multi-modal Large Language Model in 3D world, including 3D understanding, reasoning, generation, and embodied agents.
- LLM4Opt - Applying Large language models (LLMs) for diverse optimization tasks (Opt) is an emerging research area. This is a collection of references and papers of LLM4Opt.
- Awesome-Chinese-LLM - A curated list of open-source Chinese LLMs, focusing on smaller models that can be privately deployed at low training cost, covering base models, domain-specific fine-tunes and applications, datasets, and tutorials.
- awesome-language-model-analysis - This paper list focuses on the theoretical or empirical analysis of language models, e.g., the learning dynamics, expressive capacity, interpretability, generalization, and other interesting topics.
- LLMDatahub - a curated collection of datasets specifically designed for chatbot training, including links, size, language, usage, and a brief description of each dataset.
- Awesome LLM Human Preference Datasets - a collection of human preference datasets for LLM instruction tuning, RLHF and evaluation.
- ModelEditingPapers - A paper & resource list on model editing for large language models.
LLM Evaluation
- lm-evaluation-harness - A framework for few-shot evaluation of language models (see the sketch after this list).
- MixEval - A reliable click-and-go evaluation suite compatible with both open-source and proprietary models, supporting MixEval and other benchmarks.
- lighteval - a lightweight LLM evaluation suite that Hugging Face has been using internally.
- OLMO-eval - a repository for evaluating open language models.
- instruct-eval - This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
- simple-evals - Eval tools by OpenAI.
- Giskard - Testing & evaluation library for LLM applications, in particular RAG applications.
- Ragas - a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines.
- LangSmith - a unified platform from LangChain framework for: evaluation, collaboration HITL (Human In The Loop), logging and monitoring LLM applications.
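For a concrete starting point with the harness listed above, a minimal run through the lm-evaluation-harness Python API might look like the sketch below; the checkpoint and task are arbitrary examples, and the `lm_eval` CLI exposes the same options.

```python
# Minimal sketch of the lm-evaluation-harness Python API; swap in any
# HF checkpoint and any registered task.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # HuggingFace backend
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["hellaswag"],
    num_fewshot=0,
)

# results["results"] maps each task name to its metric dict.
print(results["results"]["hellaswag"])
```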
LLM Books
- Build a Large Language Model (From Scratch) - A guide to building your own working LLM.
- BUILD GPT: HOW AI WORKS - explains how to code a Generative Pre-trained Transformer, or GPT, from scratch.
- Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs - comes with a [GitHub repository](https://github.com/benman1/generative_ai_with_langchain) that showcases much of the functionality.
LLM Data
- IBM data-prep-kit - Open-Source Toolkit for Efficient Unstructured Data Processing with Pre-built Modules and Local to Cluster Scalability.