Awesome-LLM

Awesome-LLM: a curated list of Large Language Models
https://github.com/Hannibal046/Awesome-LLM


Milestone Papers
- Attention Is All You Need
- Improving Language Understanding by Generative Pre-Training
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Language Models are Unsupervised Multitask Learners
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- Scaling Laws for Neural Language Models
- Language models are few-shot learners
- Evaluating Large Language Models Trained on Code
- On the Opportunities and Risks of Foundation Models
- Finetuned Language Models are Zero-Shot Learners
- WebGPT: Browser-assisted question-answering with human feedback
- Improving language models by retrieving from trillions of tokens
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Solving Quantitative Reasoning Problems with Language Models
- Training language models to follow instructions with human feedback
- An empirical analysis of compute-optimal large language model training
- OPT: Open Pre-trained Transformer Language Models
- Emergent Abilities of Large Language Models
- Language Models are General-Purpose Interfaces
- GLM-130B: An Open Bilingual Pre-trained Model
- Holistic Evaluation of Language Models
- Galactica: A Large Language Model for Science
- OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- LLaMA: Open and Efficient Foundation Language Models
- Language Is Not All You Need: Aligning Perception with Language Models
- PaLM-E: An Embodied Multimodal Language Model
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
- PaLM 2 Technical Report
- RWKV: Reinventing RNNs for the Transformer Era
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- LaMDA: Language Models for Dialog Applications
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
- PaLM: Scaling Language Modeling with Pathways
- Improving alignment of dialogue agents via targeted human judgements
- Scaling Instruction-Finetuned Language Models
- Qwen2.5 Technical Report
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
- Unifying Language Learning Paradigms
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- Jamba: A Hybrid Transformer-Mamba Language Model
- GPT-4 Technical Report
- Mistral 7B
- Resurrecting Recurrent Neural Networks for Long Sequences
- Visual Instruction Tuning
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
- The Llama 3 Herd of Models
- Multitask Prompted Training Enables Zero-Shot Task Generalization
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
- OLMo: Accelerating the Science of Language Models
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
- OLMoE: Open Mixture-of-Experts Language Models
- DeepSeek-V3 Technical Report
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- Training Compute-Optimal Large Language Models
Open LLM
  - ckpt - 05 | [Homepage](https://falconllm.tii.ae) | [Apache 2.0](https://huggingface.co/tiiuae) |
  - ckpt - 05 | [Paper](https://arxiv.org/pdf/2205.05131v1.pdf) | [Apache 2.0](https://huggingface.co/google/ul2) |
  - ckpt - 04 | [Paper](https://arxiv.org/pdf/2104.12369.pdf) | [Apache 2.0](https://github.com/huawei-noah/Pretrained-Language-Model/blob/4624dbadfe00e871789b509fe10232c77086d1de/PanGu-%CE%B1/LICENSE) |
  - ckpt - 10 | [Paper](https://jmlr.org/papers/v21/20-074.html) | [Apache 2.0](https://huggingface.co/t5-11b) |
  - api - 10 | [Paper](https://arxiv.org/pdf/2012.00413.pdf) | - |
  - ckpt - 09 | [Github](https://github.com/BlinkDL/RWKV-LM) | [Apache 2.0](https://huggingface.co/BlinkDL/rwkv-4-pile-7b) |
  - ckpt - 09 | [Github](https://github.com/kingoflolz/mesh-transformer-jax) | [Apache 2.0](https://huggingface.co/EleutherAI/gpt-j-6b) |
  - OPT-1.3|6.7|13|30|66B
  - T5
  - Open Ko-LLM Leaderboard - The Open Ko-LLM Leaderboard objectively evaluates the performance of Korean Large Language Model (LLM).
  - api - 08 | [Paper](https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf) | - |
  - api - 05 | [Paper](https://arxiv.org/pdf/2205.01068.pdf) | [OPT-175B License Agreement](https://github.com/facebookresearch/metaseq/blob/edefd4a00c24197486a3989abe28ca4eb3881e59/projects/OPT/MODEL_LICENSE.md) |
  - api - 05 | [Paper](https://arxiv.org/pdf/2005.14165.pdf) | - |
  - Mixtral-8x7B
  - Yet Another LLM Leaderboard - Leaderboard made with LLM AutoEval using Nous benchmark suite.
  - RecurrentGemma-2B
  - DeepSeek-Coder-v2-16|236B-MOE
  - Pythia-1|1.4|2.8|6.9|12B
  - Qwen2-0.5|1.5|7|57-MOE|72B
  - Nemotron-4-340B
  - Llama 3-8|70B
  - BLOOMZ&mT0
  - Gemma-2|7B
  - Codestral-7|22B
  - Qwen1.5-0.5B|1.8B|4B|7B|14B|32B|72B|110B|MoE-A2.7B
  - Qwen2.5-0.5B|1.5B|3B|7B|14B|32B|72B
  - CodeQwen1.5-7B
  - Qwen2.5-Coder-1.5B|7B|32B
  - Qwen2-Math-1.5B|7B|72B
  - Qwen2.5-Math-1.5B|7B|72B
  - Qwen2-VL-2B|7B|72B
  - Qwen2-Audio-7B
  - Mistral-7B
  - Mixtral-8x22B
  - OpenELM-1.1|3B
  - Phi1-1.3B
  - Phi2-2.7B
  - Phi3-3.8|7|14B
  - OLMo-7B
  - Grok-1-314B-MoE
  - Command R-35B
  - DeepSeek-Math-7B
  - DeepSeek-Coder-1.3|6.7|7|33B
  - DeepSeek-VL-1.3|7B
  - DeepSeek-MoE-16B
  - Qwen-1.8B|7B|14B|72B
  - Qwen1.5-1.8|4|7|14|32|72|110B
  - CodeQwen-7B
  - Qwen-VL-7B
  - Yi-34B
  - Yi1.5-6|9|34B
  - Yi-VL-6B|34B
  - Baichuan2-7|13B
  - GLM-2|6|10|13|70B
  - CogVLM2-19B
  - MiniCPM-2B
  - OmniLLM-12B
  - VisCPM-10B
  - CPM-Bee-1|2|5|10B
  - RWKV-v4|5|6
  - StableCode-3B
  - StarCoder-1|3|7B
  - StarCoder2-3|7|15B
  - MPT-7B
  - InternLM2-1.8|7|20B
  - InternLM-Math-7B|20B
  - InternLM-XComposer2-1.8|7B
  - StableLM-v2-1.6|12B
  - Gemma2-9|27B
  - DBRX-132B-MoE
- Instruction finetuned LLM
  - api - 03 | [Paper](https://arxiv.org/pdf/2203.02155.pdf) | - |
  - ckpt - 11 | [Paper](https://arxiv.org/pdf/2211.09085.pdf) | [CC-BY-NC-4.0](https://github.com/paperswithcode/galai/blob/3a724f562af1a0c8ff97a096c5fbebe579e2160f/LICENSE-MODEL.md) |
  - ckpt - 03 | [Blog](https://www.yitay.net/blog/flan-ul2-20b) | [Apache 2.0](https://huggingface.co/google/flan-ul2) |
  - ckpt - 10 | [Paper](https://arxiv.org/pdf/2210.11416.pdf) | [Apache 2.0](https://github.com/google-research/t5x/blob/776279bdacd8c5a2d3e8ce0f2e7064bd98e98b47/LICENSE) |
  - ckpt - 10 | [Paper](https://arxiv.org/pdf/2110.08207.pdf) | [Apache 2.0](https://huggingface.co/bigscience/T0) |
  - demo - 03 | [Github](https://github.com/tatsu-lab/stanford_alpaca) | [CC BY NC 4.0](https://github.com/tatsu-lab/stanford_alpaca/blob/main/WEIGHT_DIFF_LICENSE) |
  - ckpt - 06 | [Paper](https://arxiv.org/pdf/2306.02707) | [Non-commercial bespoke license](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md) |
- RLHF LLM
  - demo - 03 | [Blog](https://www.anthropic.com/index/introducing-claude) |
  - survey paper
  - demo - 2022-11 | [Blog](https://openai.com/blog/chatgpt/) |
Open LLM
  - LLaMA & [LLaMA-2](https://ai.meta.com/llama/) - A foundational large language model. [LLaMA.cpp](https://github.com/ggerganov/llama.cpp) [Lit-LLaMA](https://github.com/Lightning-AI/lit-llama)
  - Alpaca - A model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. [Alpaca.cpp](https://github.com/antimatter15/alpaca.cpp) [Alpaca-LoRA](https://github.com/tloen/alpaca-lora)
  - Vicuna - An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality.
  - YaLM - a GPT-like neural network for generating and processing text. It can be used freely by developers and researchers from all over the world.
  - Koala - A Dialogue Model for Academic Research
  - StackLLaMA - A hands-on guide to train LLaMA with RLHF.
  - Orca - Microsoft's fine-tuned LLaMA model that reportedly matches GPT-3.5, fine-tuned on 5M examples with outputs from ChatGPT and GPT-4.
  - StableLM - Stability AI Language Models.
  - Dolly - a cheap-to-build LLM that exhibits a surprising degree of the instruction following capabilities exhibited by ChatGPT.
  - Dolly 2.0 - the first open source, instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use.
  - Cerebras-GPT - A Family of Open, Compute-efficient, Large Language Models.
  - GALACTICA - The GALACTICA models are trained on a large-scale scientific corpus.
  - GALPACA - GALACTICA 30B fine-tuned on the Alpaca dataset.
  - Palmyra - Palmyra Base was primarily pre-trained with English text.
  - Camel - a state-of-the-art instruction-following large language model designed to deliver exceptional performance and versatility.
  - PanGu-α - PanGu-α is a 200B parameter autoregressive pretrained Chinese language model developed by Huawei Noah's Ark Lab, MindSpore Team and Peng Cheng Laboratory.
  - StarCoder - Hugging Face LLM for Code
  - Baichuan - A series of large language models developed by Baichuan Intelligent Technology.
  - phi-1 - a new large language model for code, with significantly smaller size than competing models.
  - phi-1.5 - a 1.3 billion parameter model trained on a dataset of 30 billion tokens, which achieves common sense reasoning benchmark results comparable to models ten times its size that were trained on datasets more than ten times larger.
  - HuggingChat - Powered by Open Assistant's latest model – the best open source chat model right now and @huggingface Inference API.
  - Falcon - A foundational large language model (LLM) from TII with 40 billion parameters, trained on one trillion tokens.
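  - Aquila - The Wudao Aquila language model is the first open-source language model that has bilingual Chinese-English knowledge, supports a commercial license, and meets China's domestic data compliance requirements.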
  - phi-2 - a 2.7 billion-parameter language model that demonstrates outstanding reasoning and language understanding capabilities, showcasing state-of-the-art performance among base language models with less than 13 billion parameters.
  - Jamba - A Hybrid Transformer-Mamba MoE model, with 52B params, first production grade mamba based LLM, 256K context support.
  - DeepSeek LLM - 7B-base, [67B-base](https://modelscope.cn/models/deepseek-ai/deepseek-llm-67b-base/summary)
  - RWKV-5 (Trending Demo) - Trained on 100+ world languages (70% English, 15% multilingual, 15% code).
  - Qwen series - The Tongyi Qianwen (Qwen) large language model series developed by Alibaba Cloud, including [7B](https://huggingface.co/Qwen/Qwen-7B), [72B](https://huggingface.co/Qwen/Qwen-72B), and various quantized and Chat versions. [Chat Demo](https://huggingface.co/spaces/Qwen/Qwen-72B-Chat-Demo)
  - XVERSE series - Multilingual large language models developed in-house by XVERSE Technology Inc. (Shenzhen), including [7B](https://github.com/xverse-ai/XVERSE-7B), [13B](https://github.com/xverse-ai/XVERSE-13B), and [65B](https://github.com/xverse-ai/XVERSE-65B).
  - RAFT - RAFT: A new way to teach LLMs to be better at RAG ([paper](https://arxiv.org/abs/2403.10131)).
  - BLOOM - BigScience Large Open-science Open-access Multilingual Language Model [BLOOM-LoRA](https://github.com/linhduongtuan/BLOOM-LORA)
  - Command-R series - Two multilingual large language models intended for retrieval augmented generation (RAG) and conversational use, at [35](https://huggingface.co/CohereForAI/c4ai-command-r-v01) and [104](https://huggingface.co/CohereForAI/c4ai-command-r-plus) billion parameters. 128k context support.
LLM Leaderboard
- Chatbot Arena Leaderboard - a benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner.
- LiveBench - A Challenging, Contamination-Free LLM Benchmark.
- AlpacaEval - An Automatic Evaluator for Instruction-following Language Models.
- OpenCompass 2.0 LLM Leaderboard - OpenCompass is an LLM evaluation platform, supporting a wide range of models (InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
- Open LLM Leaderboard - aims to track, rank, and evaluate LLMs and chatbots as they are released.
- ACLUE - an evaluation benchmark focused on ancient Chinese language comprehension.
- BeHonest - A pioneering benchmark specifically designed to assess honesty in LLMs comprehensively.
- Chinese Large Model Leaderboard - an expert-driven benchmark for Chinese LLMs.
- CompassRank - CompassRank is dedicated to exploring the most advanced language and visual models, offering a comprehensive, objective, and neutral evaluation reference for the industry and research.
- CompMix - a benchmark evaluating QA methods that operate over a mixture of heterogeneous input sources (KB, text, tables, infoboxes).
- DreamBench++ - a benchmark for evaluating the performance of large language models (LLMs) in various tasks related to both textual and visual imagination.
- FELM - a meta-benchmark that evaluates how well factuality evaluators assess the outputs of large language models (LLMs).
- InfiBench - a benchmark designed to evaluate large language models (LLMs) specifically in their ability to answer real-world coding-related questions.
- MMedBench - a benchmark that evaluates large language models' ability to answer medical questions across multiple languages.
- LawBench - a benchmark designed to evaluate large language models in the legal domain.
- LLMEval - focuses on understanding how LLMs perform in various scenarios and analyzing results from an interpretability perspective.
- M3CoT - a benchmark that evaluates large language models on a variety of multimodal reasoning tasks, including language, natural and social sciences, physical and social commonsense, temporal reasoning, algebra, and geometry.
- MathEval - a comprehensive benchmarking platform designed to evaluate large models' mathematical abilities across 20 fields and nearly 30,000 math problems.
- MMToM-QA - a multimodal question-answering benchmark designed to evaluate AI models' cognitive ability to understand human beliefs and goals.
- VisualWebArena - a benchmark designed to assess the performance of multimodal web agents on realistic visually grounded tasks.
- We-Math - a benchmark that evaluates large multimodal models (LMMs) on their ability to perform human-like mathematical reasoning.
- WHOOPS! - a benchmark dataset testing AI's ability to reason about visual commonsense through images that defy normal expectations.
- OlympicArena - a benchmark for evaluating AI models across multiple academic disciplines like math, physics, chemistry, biology, and more.
- SuperLim - a Swedish language understanding benchmark that evaluates natural language processing (NLP) models on various tasks such as argumentation analysis, semantic similarity, and textual entailment.
- TAT-DQA - a large-scale Document Visual Question Answering (VQA) dataset designed for complex document understanding, particularly in financial reports.
- TAT-QA - a large-scale question-answering benchmark focused on real-world financial data, integrating both tabular and textual information.
- SciBench - a benchmark designed to evaluate large language models (LLMs) on solving complex, college-level scientific problems from domains like chemistry, physics, and mathematics.
- SuperBench - a benchmark platform designed for evaluating large language models (LLMs) on a range of tasks, particularly focusing on their performance in different aspects such as natural language understanding, reasoning, and generalization.
- Berkeley Function-Calling Leaderboard - evaluates LLMs' ability to call external functions/tools.
- MixEval - a ground-truth-based dynamic benchmark derived from off-the-shelf benchmark mixtures, which evaluates LLMs with a highly capable model ranking (i.e., 0.96 correlation with Chatbot Arena) while running locally and quickly (6% the time and cost of running MMLU).
LLM Training Frameworks
  - FairScale - FairScale is a PyTorch extension library for high performance and large scale training.
  - torchtune - A Native-PyTorch Library for LLM Fine-tuning.
  - torchtitan - A native PyTorch Library for large model training.
  - Megatron-LM - Ongoing research training transformer models at scale.
  - Colossal-AI - Making large AI models cheaper, faster, and more accessible.
  - BMTrain - Efficient Training for Big Models.
  - Mesh Tensorflow - Mesh TensorFlow: Model Parallelism Made Easier.
  - maxtext - A simple, performant and scalable Jax LLM!
  - GPT-NeoX - An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
  - Alpa - Alpa is a system for training and serving large-scale neural networks.
LLM Deployment
- Haystack - an open-source NLP framework that allows you to use LLMs and transformer-based models from Hugging Face, OpenAI and Cohere to interact with your own data.
- llm-inference-solutions
- vLLM - A high-throughput and memory-efficient inference and serving engine for LLMs.
- exllama - A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
- llama.cpp - LLM inference in C/C++.
- ollama - Get up and running with Llama 3, Mistral, Gemma, and other large language models.
- Langfuse - Open Source LLM Engineering Platform 🪢 Tracing, Evaluations, Prompt Management, and Playground.
- FastChat - A distributed multi-model LLM serving system with web UI and OpenAI-compatible RESTful APIs.
- mistral.rs - Blazingly fast LLM inference.
- MindSQL - A Python package for Text-to-SQL with self-hosting functionalities and RESTful APIs compatible with proprietary as well as open source LLMs.
- SkyPilot - Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU availability, and managed execution -- all with a simple interface.
- QA-Pilot - An interactive chat project that leverages Ollama/OpenAI/MistralAI LLMs for rapid understanding and navigation of GitHub code repository or compressed file resources.
- Shell-Pilot - Interact with LLMs using Ollama models (or OpenAI, MistralAI) via pure shell scripts on your Linux (or macOS) system, enhancing intelligent system management without any dependencies.
- Floom
- Swiss Army Llama - Comprehensive set of tools for working with local LLMs for various tasks.
- magentic - Seamlessly integrate LLMs as Python functions
- wechat-chatgpt - Use ChatGPT On Wechat via wechaty
- Agenta - Easily build, version, evaluate and deploy your LLM-powered apps.
- Serge - a chat interface crafted with llama.cpp for running Alpaca models. No API keys, entirely self-hosted!
- Langroid - Harness LLMs with Multi-Agent Programming
- IntelliServer - simplifies the evaluation of LLMs by providing a unified microservice to access and test multiple AI models.
- OpenLLM - Fine-tune, serve, deploy, and monitor any open-source LLMs in production. Used in production at [BentoML](https://bentoml.com/) for LLMs-based applications.
- Text-Embeddings-Inference - Inference for text-embeddings in Rust, HFOIL Licence.
- Infinity - Inference for text-embeddings in Python
- TensorRT-LLM - NVIDIA Framework for LLM Inference
- FasterTransformer - NVIDIA Framework for LLM Inference (Transitioned to TensorRT-LLM)
- Flash-Attention - A method designed to enhance the efficiency of Transformer models
- Langchain-Chatchat - Formerly langchain-ChatGLM, local knowledge based LLM (like ChatGLM) QA app with langchain.
- Search with Lepton - Build your own conversational search engine using less than 500 lines of code by [LeptonAI](https://github.com/leptonai).
- Robocorp - Create, deploy and operate Actions using Python anywhere to enhance your AI agents and assistants. Batteries included with an extensive set of libraries, helpers and logging.
- LMDeploy - A high-throughput and low-latency inference and serving framework for LLMs and VLMs
- LLocalSearch - Locally running websearch using LLM chains
- AI Gateway - Production-ready, with support for caching, fallbacks, retries, timeouts, load balancing, and edge deployment for minimum latency.
- talkd.ai dialog - Simple API for deploying any RAG or LLM you want, with plugin support.
- Wllama - WebAssembly binding for llama.cpp - Enabling in-browser LLM inference
- GPUStack - An open-source GPU cluster manager for running LLMs
- SGLang - SGLang is a fast serving framework for large language models and vision language models.
- Sidekick - Data integration platform for LLMs.
- Opik - Confidently evaluate, test, and ship LLM applications with a suite of observability tools to calibrate language model outputs across your dev and production lifecycle.
- TGI - a toolkit for deploying and serving Large Language Models (LLMs).
Deploying Tools
  - Sidekick - Data integration platform for LLMs.
  - LiteChain - Lightweight alternative to LangChain for composing LLMs
  - promptfoo - Test your prompts. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality.
  - Tune Studio - Playground for devs to finetune & deploy LLMs
Prompting libraries & tools
  - LangChain
  - Guidance
  - Chainlit
  - Guardrails.ai
  - Outlines - A domain-specific language to simplify prompting and constrain generation.
  - Scale Spellbook
  - Weights & Biases
  - LlamaIndex
  - LMQL
LLM Applications
- FLAML (A Fast Library for Automated Machine Learning & Tuning)
- PromptPerfect
- Arthur Shield
- GPTRouter - GPTRouter is an open source LLM API Gateway that offers a universal API for 30+ LLMs, vision, and image models, with smart fallbacks based on uptime and latency, automatic retries, and streaming. Stay operational even when OpenAI is down
- AdalFlow - AdalFlow: The PyTorch library for LLM applications.
- dspy - DSPy: The framework for programming—not prompting—foundation models.
- YiVal - An open-source GenAI-Ops tool for tuning and evaluating prompts, configurations, and model parameters using customizable datasets, evaluation methods, and improvement strategies.
- Evidently - An open-source framework to evaluate, test and monitor ML and LLM-powered systems.
- Semantic Kernel
- Prompttools - Open-source Python tools for testing and evaluating models, vector DBs, and prompts.
- Promptify
- OpenAI Evals - An open-source library for evaluating task performance of language models and prompts.
- Flappy - Production-Ready LLM Agent SDK for Every Developer.
- QAnything - A local knowledge base question-answering system designed to support a wide range of file formats and databases.
- llm-ui - A React library for building LLM UIs.
- Dify - An open-source LLM app development platform with an intuitive interface that streamlines AI workflows, model management, and production deployment.
- LazyLLM - An open-source LLM app for building multi-agent LLM applications in an easy and lazy way, with support for model deployment and fine-tuning.
- MemFree - Open Source Hybrid AI Search Engine, Instantly Get Accurate Answers from the Internet, Bookmarks, Notes, and Docs. Support One-Click Deployment
- OneKE - A bilingual Chinese-English knowledge extraction model with knowledge graphs and natural language processing technologies.
- unslothai - A framework that specializes in efficient fine-tuning. On its GitHub page, you can find ready-to-use fine-tuning templates for various LLMs, allowing you to easily train your own data for free on the Google Colab cloud.
- Weights & Biases
- Wallaroo.AI - Deploy, manage, optimize any model at scale across any environment from cloud to edge. Lets you go from a Python notebook to inferencing in minutes.
Great thoughts about LLM
- Why did all of the public reproduction of GPT-3 fail?
- A Stage Review of Instruction Tuning
- LLM Powered Autonomous Agents
- Why you should work on AI AGENTS!
- Google "We Have No Moat, And Neither Does OpenAI"
- AI competition statement
- Prompt Engineering
- Large Language Model Training in 2023
- The Next Generation Of Large Language Models
- Noam Chomsky: The False Promise of ChatGPT
- Is ChatGPT 175 Billion Parameters? Technical Analysis
- How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources
- Scaling, emergence, and reasoning in large language models
LLM Tutorials and Courses
- ChatGPT Prompt Engineering
- Princeton: Understanding Large Language Models
- CS25-Transformers United
- llm-course - Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
- CS324 - Large Language Models
- femtoGPT - Pure Rust implementation of a minimal Generative Pretrained Transformer.
- UWaterloo CS 886 - Recent Advances on Foundation Models.
- State of GPT
- Let's build GPT: from scratch, in code, spelled out.
- Neurips2022-Foundational Robustness of Foundation Models
- ICML2022-Welcome to the "Big Model" Era: Techniques and Systems to Train and Serve Bigger Models
Courses
  - OpenBMB
  - Stanford
  - Stanford Webinar
  - Aston Zhang
  - MIT
  - 李沐
  - 陳縕儂
  - DeepLearning.AI
  - Arize
Books
  - Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs - it comes with a [GitHub repository](https://github.com/benman1/generative_ai_with_langchain) that showcases a lot of the functionality
Opinions
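  - My Worldview on Large Models (我的大模型世界观) - [04-23] [陆奇]
  - Towards ChatGPT and Beyond - [02-20] [Zhihu] [欧泽彬]
  - The Difficulties of Catching Up with ChatGPT, and Its Substitutes (追赶ChatGPT的难点与平替) - [02-19] [李rumor]
  - ChatGPT: Development History, Principles, Technical Architecture in Detail, and the Industry's Future (ChatGPT发展历程、原理、技术架构详解和产业未来) - [02-15] [Zhihu] [陈巍谈芯]
  - Twenty Views on ChatGPT (对ChatGPT的二十点看法) - [02-13] [Zhihu] [熊德意]
  - ChatGPT: What I Saw, Heard, and Felt (ChatGPT-所见、所闻、所感) - [02-11] [Zhihu] [刘聪NLP]
  - A Conversation with Zhang Xiangyu of Megvii Research: ChatGPT's Research Value May Be Greater (对话旷视研究院张祥雨｜ChatGPT的科研价值可能更大) - [02-16] [Zhihu] [旷视科技]
  - Conjectures on Eight Technical Questions about ChatGPT (关于ChatGPT八个技术问题的猜想) - [02-15] [Zhihu] [张家俊]
  - Large Language Models: A New Moore's Law - [10-26] [Huggingface]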
Miscellaneous
- Arize-Phoenix - Open-source tool for ML observability that runs in your notebook environment. Monitor and fine tune LLM, CV and Tabular Models.
- Major LLMs + Data Availability
- 500+ Best AI Tools
- Open-evals - A framework that extends OpenAI's [Evals](https://github.com/openai/evals) to different language models.
- OpenAGI - When LLM Meets Domain Experts.
- EasyEdit - An easy-to-use framework to edit large language models.
- chatgpt-shroud - A Chrome extension for OpenAI's ChatGPT, enhancing user privacy by enabling easy hiding and unhiding of chat history. Ideal for privacy during screen shares.
Other Useful Resources
  - Emergent Mind - The latest AI news, curated & explained by GPT-4.
  - ShareGPT - Share your wildest ChatGPT conversations with one click.
  - chatgpt-wrapper - ChatGPT Wrapper is an open-source unofficial Python API and CLI that lets you interact with ChatGPT.
  - AutoGPT - an experimental open-source application showcasing the capabilities of the GPT-4 language model.
  - MTEB - Massive Text Embedding Benchmark Leaderboard
  - Cohere Summarize Beta - Introducing Cohere Summarize Beta: A New Endpoint for Text Summarization
Other Papers
- LLM Reading List - A paper & resource list of large language models.
- Reasoning using Language Models - Collection of papers and resources on Reasoning using Language Models.
- Instruction-Tuning-Papers - A trend starting from `Natural-Instruction` (ACL 2022), `FLAN` (ICLR 2022) and `T0` (ICLR 2022).
- Chain-of-Thought Hub - Measuring LLMs' Reasoning Performance
- Awesome-LLM-Inference - A curated list of Awesome LLM Inference Paper with codes.
- awesome-chatgpt-prompts-zh - A Chinese collection of prompt examples to be used with the ChatGPT model.
- Awesome ChatGPT - Curated list of resources for ChatGPT and GPT-3 from OpenAI.
- Chain-of-Thoughts Papers - A trend starting from "Chain of Thought Prompting Elicits Reasoning in Large Language Models".
- Awesome Deliberative Prompting - How to ask LLMs to produce reliable reasoning and make reason-responsive decisions.
- Awesome-LLM-hallucination - LLM hallucination paper list.
- awesome-hallucination-detection - List of papers on hallucination detection in LLMs.
- LLMsPracticalGuide - A curated list of practical guide resources of LLMs
- Awesome ChatGPT Prompts - A collection of prompt examples to be used with the ChatGPT model.
- Awesome GPT - A curated list of awesome projects and resources related to GPT, ChatGPT, OpenAI, LLM, and more.
- Awesome GPT-3 - a collection of demos and articles about the [OpenAI GPT-3 API](https://openai.com/blog/openai-api/).
- RWKV-howto - possibly useful materials and tutorial for learning RWKV.
- Awesome LLM Security - A curation of awesome tools, documents and projects about LLM Security.
- Awesome-Align-LLM-Human - A collection of papers and resources about aligning large language models (LLMs) with human.
- Awesome-Code-LLM - An awesome and curated list of best code-LLM for research.
- Awesome-LLM-Compression - Awesome LLM compression research papers and tools.
- Awesome-LLM-Systems - Awesome LLM systems research papers.
- awesome-llm-webapps - A collection of open source, actively maintained web apps for LLM applications.
- awesome-japanese-llm - Overview of Japanese LLMs.
- Awesome-LLM-Healthcare - The paper list of the review on LLMs in medicine.
- Awesome-LLM-3D - A curated list of multi-modal large language models in the 3D world, including 3D understanding, reasoning, generation, and embodied agents.
- LLM4Opt - Applying Large language models (LLMs) for diverse optimization tasks (Opt) is an emerging research area. This is a collection of references and papers of LLM4Opt.
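- Awesome-Chinese-LLM - A collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and trained at low cost, including base models, domain-specific fine-tuned models and applications, datasets, and tutorials.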
- awesome-language-model-analysis - This paper list focuses on the theoretical or empirical analysis of language models, e.g., the learning dynamics, expressive capacity, interpretability, generalization, and other interesting topics.
- LLMDatahub - a curated collection of datasets specifically designed for chatbot training, including links, size, language, usage, and a brief description of each dataset
Trending LLM Projects
- LibreChat - All-In-One AI Conversations with LibreChat.
- Open-Sora - Democratizing Efficient Video Production for All.
- DeepSeek-V3 - First open-sourced GPT-4o level model.
- OpenAI o3 preview - AGI, maybe?
- Qwen2.5 Technical Report - This report introduces Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs.
- Genesis - A generative world for general-purpose robotics & embodied AI learning.
- ModernBERT - Bringing BERT into modernity via both architecture changes and scaling.
- LLM101n - Let's build a Storyteller.
- miqu-1-70b - A leaked 70B model from Mistral AI.
- Sora - Sora is an AI model that can create realistic and imaginative scenes from text instructions.
- Deep-Live-Cam - real time face swap and one-click video deepfake with only a single image (uncensored).
- GPT-SoVITS - 1 min voice data can also be used to train a good TTS model! (few shot voice cloning).
- GPT-4o - OpenAI's new flagship model that can reason across audio, vision, and text in real time.
- TinyZero - Clean, minimal, accessible reproduction of DeepSeek R1-Zero
- open-r1 - Fully open reproduction of DeepSeek-R1
- DeepSeek-R1 - First-generation reasoning models from DeepSeek.
- Qwen2.5-Max - Exploring the Intelligence of Large-scale MoE Model.
- OpenAI o3-mini - Pushing the frontier of cost-effective reasoning.
LLM Evaluation
- lm-evaluation-harness - A framework for few-shot evaluation of language models.
- MixEval - A reliable click-and-go evaluation suite compatible with both open-source and proprietary models, supporting MixEval and other benchmarks.
- lighteval - a lightweight LLM evaluation suite that Hugging Face has been using internally.
- OLMO-eval - a repository for evaluating open language models.
- instruct-eval - This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
- simple-evals - Eval tools by OpenAI.
- Giskard - Testing & evaluation library for LLM applications, in particular RAGs
- Ragas - a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines.
- LangSmith - a unified platform from the LangChain framework for evaluation, human-in-the-loop (HITL) collaboration, logging, and monitoring of LLM applications.
LLM Books
- Build a Large Language Model (From Scratch) - A guide to building your own working LLM.
- BUILD GPT: HOW AI WORKS - explains how to code a Generative Pre-trained Transformer, or GPT, from scratch.
- Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs - it comes with a [GitHub repository](https://github.com/benman1/generative_ai_with_langchain) that showcases a lot of the functionality
LLM Data
- IBM data-prep-kit - Open-Source Toolkit for Efficient Unstructured Data Processing with Pre-built Modules and Local to Cluster Scalability.

