Awesome-LLMOps
🎉 An awesome & curated list of the best LLMOps tools for developers.
https://github.com/InftyAI/Awesome-LLMOps
-
Inference
-
Inference Engine
- DeepSpeed-MII - MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
- ipex-llm - Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
- LMDeploy - A toolkit for compressing, deploying, and serving LLMs.
- llama.cpp - LLM inference in C/C++.
- MInference - To speed up long-context LLMs' inference, approximate and dynamic sparse attention calculation reduces inference latency by up to 10x for pre-filling on an A100 while maintaining accuracy.
- MLC LLM - Universal LLM deployment engine with ML compilation.
- Ollama - Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, and other large language models.
- Ratchet - A cross-platform browser ML framework.
- MLServer - An inference server for your machine learning models, including support for multi-model serving and more.
- SGLang - A fast serving framework for large language models and vision language models.
- Triton Inference Server - The Triton Inference Server provides an optimized cloud and edge inferencing solution.
- vLLM - A high-throughput and memory-efficient inference and serving engine for LLMs.
- zml - Any model. Any hardware. Zero compromise. Built with Zig, OpenXLA/MLIR, and Bazel.
- Text Generation Inference - Hugging Face's toolkit for deploying and serving LLMs for text generation.
- web-llm - High-performance in-browser LLM inference engine.
- Nvidia Dynamo - A datacenter-scale distributed inference serving framework.
- Llumnix - Efficient and easy multi-instance LLM serving.
- OpenVINO - An open-source toolkit for optimizing and deploying AI inference.
- transformers.js - State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
- LoRAX - Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs.
- Cortex.cpp - A local AI API platform.
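Most of the engines above (vLLM, SGLang, Text Generation Inference, Ollama) expose an OpenAI-compatible HTTP API, which is why clients can swap between them freely. A minimal sketch of building such a request with only the standard library; the endpoint path and model name are illustrative assumptions, not tied to any one engine:

```python
import json

def build_chat_request(model: str, prompt: str,
                       max_tokens: int = 128, temperature: float = 0.7) -> str:
    """Build the JSON body for a POST to an OpenAI-style
    /v1/chat/completions endpoint."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    })

# Example: a vLLM server started with `vllm serve <model>` would accept
# this body at http://localhost:8000/v1/chat/completions.
payload = build_chat_request("my-model", "Hello!")
```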
-
Inference Platform
- llmaz - ☸️ Easy, advanced inference platform for state-of-the-art LLMs on Kubernetes.
- OpenLLM - Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud.
- KServe - Standardized serverless ML inference platform on Kubernetes.
- Mooncake - Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
- LMCache - 10x faster long-context LLM serving via smart KV cache optimizations.
- AIBrix - Cost-efficient and pluggable infrastructure components for GenAI inference.
- KubeAI - AI Inference Operator for Kubernetes. Supports LLMs, embeddings, and speech-to-text.
- Kaito - Kubernetes AI toolchain operator that simplifies large AI model inference and fine-tuning, with GPU auto-provisioning, container-based hosting, and CRD-based orchestration.
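Several of these platforms (KServe, Kaito, llmaz) drive serving through Kubernetes custom resources. As a sketch, a minimal KServe-style `InferenceService` manifest expressed as a Python dict; the field names follow KServe's v1beta1 API, and the storage URI is a placeholder assumption:

```python
# Minimal KServe-style InferenceService spec; applying it to a cluster asks
# the platform to provision a model server and route traffic to it.
inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "sklearn-iris"},
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "sklearn"},
                "storageUri": "gs://example-bucket/models/iris",  # placeholder
            }
        }
    },
}
```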
-
- Inference - An easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
- Nanoflow - A throughput-oriented high-performance serving framework for LLMs.
- RayServe - Ray is a unified framework for scaling AI and Python applications, consisting of a core distributed runtime and a set of AI libraries for accelerating ML workloads.
- TensorRT-LLM - TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
-
-
MLOps
-
Tools
- BentoML - The easiest way to serve AI apps and models - build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!
- Flyte - Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
- Kubeflow - Machine Learning Toolkit for Kubernetes.
- MLflow - Open source platform for the machine learning lifecycle.
- Metaflow - Build and manage real-life data science projects with ease!
- ZenML - ZenML 🙏: Build portable, production-ready MLOps pipelines.
- Ray - Ray is a unified framework for scaling AI and Python applications, consisting of a core distributed runtime and a set of AI libraries for accelerating ML workloads.
- Seldon-Core - An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models.
- Polyaxon - MLOps tools for managing and orchestrating the machine learning lifecycle.
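The common thread in these platforms is tracked, reproducible runs. A toy in-memory illustration of the run/metric pattern that trackers like MLflow and Weights & Biases provide (their real APIs differ; this is not MLflow code):

```python
import time
from contextlib import contextmanager

class ToyTracker:
    """Records runs and metrics in memory, mimicking the shape of an
    experiment tracker's start_run / log_metric workflow."""
    def __init__(self):
        self.runs = []

    @contextmanager
    def start_run(self, name):
        run = {"name": name, "metrics": {}, "start": time.time()}
        self.runs.append(run)
        try:
            yield run
        finally:
            run["duration_s"] = time.time() - run["start"]

    @staticmethod
    def log_metric(run, key, value):
        run["metrics"].setdefault(key, []).append(value)

tracker = ToyTracker()
with tracker.start_run("train-demo") as run:
    for loss in (0.9, 0.5, 0.3):  # pretend training loop
        tracker.log_metric(run, "loss", loss)
```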
-
-
-
Application Orchestration Framework
-
Tools
- Dify - An open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
- Flowise - Drag & drop UI to build your customized LLM flow.
- Haystack - AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
- LangChain - 🦜🔗 Build context-aware reasoning applications.
- LlamaIndex - LlamaIndex is the leading framework for building LLM-powered agents over your data.
- LightRAG - "LightRAG: Simple and Fast Retrieval-Augmented Generation".
- Semantic Kernel - An open-source integration framework for integrating LLMs into your applications, featuring plugin integration, memory management, planners, and multi-modal capabilities.
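What these frameworks share is composing small components (retriever, prompt builder, model call) into one pipeline. A dependency-free sketch of that idea, with a stubbed function standing in for a real LLM call:

```python
def pipe(*steps):
    """Compose callables left to right into a single pipeline."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

# Toy RAG pipeline: retrieve context, build a prompt, "call" a model.
def retrieve(question):
    return {"question": question, "context": "Paris is the capital of France."}

def build_prompt(d):
    return f"Context: {d['context']}\nQ: {d['question']}\nA:"

def stub_llm(prompt):
    return "Paris"  # a real pipeline would call an LLM here

rag = pipe(retrieve, build_prompt, stub_llm)
answer = rag("What is the capital of France?")  # → "Paris"
```

Production frameworks add what this sketch omits: typed connections between components, retries, streaming, and tracing.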
-
-
Chat Framework
-
Tools
- FastChat - An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
- PrivateGPT - Interact with your documents using the power of GPT, 100% privately, no data leaks.
- Open WebUI - User-friendly AI interface (supports Ollama, OpenAI API, ...).
- NextChat - ✨ Light and fast AI assistant with cross-platform support (Web, iOS, MacOS, Android, Linux, Windows).
- 5ire - A cross-platform desktop AI assistant and MCP client. It is compatible with major service providers and supports local knowledge bases and tools via Model Context Protocol servers.
- Chatbot UI - AI chat for every model.
- Cherry Studio - 🍒 A desktop client that supports multiple LLM providers, including deepseek-r1.
- Gradio - Build and share delightful machine learning apps, all in Python.
- Jan - Jan is an open source alternative to ChatGPT that runs 100% offline on your computer.
- Lobe Chat - 🤯 An open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / DeepSeek / Qwen), knowledge base (file upload / knowledge management / RAG), multi-modals (plugins / artifacts) and thinking. One-click FREE deployment of your private ChatGPT / Claude / DeepSeek application.
-
-
Gateway
-
LLM Router
- LiteLLM - Python SDK and proxy server (LLM gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq].
- RouteLLM - A framework for serving and evaluating LLM routers - save LLM costs without compromising quality.
- AI Gateway - A blazing fast AI gateway with integrated guardrails. Route to 200+ LLMs with one fast & friendly API.
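The routing idea behind RouteLLM-style gateways, reduced to a toy heuristic: send easy prompts to a cheap model and hard ones to a strong model. Real routers learn this decision from preference data; the tier names, threshold, and keyword check here are illustrative assumptions:

```python
def route(prompt: str, threshold: int = 200) -> str:
    """Pick a model tier for a prompt with a deliberately naive heuristic."""
    looks_hard = len(prompt) > threshold or "step by step" in prompt.lower()
    return "strong-model" if looks_hard else "cheap-model"

cheap = route("What is 2 + 2?")                   # → "cheap-model"
strong = route("Prove this claim step by step.")  # → "strong-model"
```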
-
API Gateway
- APISIX - A cloud-native API gateway and AI gateway with an extensive plugin system and AI capabilities.
- Envoy AI Gateway - An open source project that uses Envoy Gateway to handle request traffic from application clients to generative AI services.
- Higress - 🤖 An AI-native API gateway.
- kgateway - A cloud-native API gateway and AI gateway.
- Kong - 🦍 The cloud-native API gateway and AI gateway.
- gateway-api-inference-extension - A Kubernetes Gateway API extension for routing to LLM inference workloads.
-
-
Agent
-
Tools
- Mem0 - The memory layer for your AI agents.
- Browser Use - 🌐 Make websites accessible for AI agents.
- OpenAI CUA - Sample app showing how to use OpenAI's Computer-Using Agent (CUA) via the API.
-
Framework
- AutoGPT - AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
- MetaGPT - 🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming.
- PydanticAI - Agent framework / shim to use Pydantic with LLMs.
- Swarm - Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by the OpenAI Solution team.
- Agno - A lightweight library for building multimodal agents. Model-agnostic.
- LangGraph - Build resilient language agents as graphs.
- OpenAI Agents SDK - A lightweight, powerful framework for multi-agent workflows.
- OpenManus - An open-source project for building general AI agents.
- CAMEL - 🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents.
- kagent - A Kubernetes-native framework for building AI agents.
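At their core, these frameworks manage a loop in which the model either calls a tool or returns a final answer. A minimal sketch of that loop with a stubbed model (the tool, stub, and message shapes are assumptions for illustration, not any framework's real API):

```python
def calculator(expression: str) -> str:
    # Toy tool; never eval untrusted input in real code.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def stub_model(messages):
    """Stand-in for an LLM: request one tool call, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "calculator", "args": "2 + 3"}
    return {"final": "The answer is " + messages[-1]["content"]}

def run_agent(question, model=stub_model, max_steps=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        action = model(messages)
        if "final" in action:
            return action["final"]
        result = TOOLS[action["tool"]](action["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish")

final_answer = run_agent("What is 2 + 3?")  # → "The answer is 5"
```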
-
-
-
FineTune
-
Tools
- LLaMa-Factory - Unified efficient fine-tuning of 100+ LLMs & VLMs (ACL 2024).
- Swift - Use PEFT or full-parameter methods to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
- torchtune - A PyTorch-native post-training library.
- unsloth - Finetune Llama and DeepSeek-R1 & reasoning LLMs 2x faster with 70% less memory! 🦥
- Axolotl - Go ahead and axolotl questions.
- EasyLM - Large language models made easy: a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax.
- LMFlow - An extensible toolkit for finetuning and inference of large foundation models. Large models for all.
- MLX-VLM - A package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
- Transformer Lab - An open-source application to interact with, train, fine-tune, and evaluate large language models on your own computer.
- maestro - Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.
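The memory savings these tools advertise mostly come from low-rank adaptation (LoRA): instead of updating a large weight matrix W, train two small matrices B (d×r) and A (r×k) and apply W' = W + (α/r)·BA. A pure-Python sketch of that update (toy matrices, no ML library; real implementations use fused GPU kernels):

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_update(W, B, A, alpha):
    """Return W' = W + (alpha / r) * B @ A, where r is the LoRA rank."""
    r = len(A)  # rank = number of rows of A
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(wr, dr)]
            for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 base weight
B = [[1.0], [0.0]]            # 2x1 trained matrix
A = [[0.0, 2.0]]              # 1x2 trained matrix, rank r = 1
W_new = lora_update(W, B, A, alpha=1.0)  # → [[1.0, 2.0], [0.0, 1.0]]
```

The point of the trick: only B and A (2·d·r values) are trained and stored, rather than the full d×k matrix.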
-
-
-
Training
-
Tools
- MaxText - A simple, performant and scalable JAX LLM.
- ColossalAI - Making large AI models cheaper, faster and more accessible.
- Ludwig - Low-code framework for building custom LLMs, neural networks, and other AI models.
- MLX - An array framework for Apple silicon.
- Candle - Minimalist ML framework for Rust.
-
-
Evaluation
-
Tools
- AgentBench - A comprehensive benchmark to evaluate LLMs as agents (ICLR'24).
- lm-evaluation-harness - A framework for few-shot evaluation of language models.
- LongBench - A bilingual, multitask benchmark for long-context understanding (ACL 2024).
- OpenCompass - An LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc) over 100+ datasets.
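The core loop inside harnesses like lm-evaluation-harness, reduced to its essence: run a model over (prompt, reference) pairs and score exact match. The dataset and lookup stub below are invented for illustration; real harnesses add batching, log-likelihood scoring, and many metrics beyond exact match:

```python
def evaluate(model, dataset):
    """Return exact-match accuracy of `model` over (prompt, reference) pairs."""
    correct = sum(1 for prompt, ref in dataset
                  if model(prompt).strip() == ref)
    return correct / len(dataset)

dataset = [("2+2=", "4"), ("Capital of France?", "Paris"), ("3*3=", "9")]

# Lookup table standing in for a real LLM; it gets the last item wrong.
stub_answers = {"2+2=": "4", "Capital of France?": "Paris", "3*3=": "8"}
score = evaluate(lambda p: stub_answers.get(p, ""), dataset)  # → 2/3
```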
-
-
Database
-
Tools
- Faiss - A library for efficient similarity search and clustering of dense vectors.
- weaviate - An open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
- chroma - The AI-native open-source embedding database.
- deeplake - Database for AI. Store vectors, images, texts, videos, etc. Use with LLMs/LangChain. Store, query, version, and visualize any AI data. Stream data in real time to PyTorch/TensorFlow.
- milvus - A high-performance, cloud-native vector database built for scalable vector ANN search.
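What these vector databases accelerate, shown as brute force: rank stored vectors by cosine similarity to a query. The tiny index below is invented for illustration; Faiss, Milvus, and friends replace the linear scan with approximate nearest-neighbor indexes (IVF, HNSW, ...) to stay fast at millions of vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(index, query, k=1):
    """Return the ids of the k vectors most similar to `query`."""
    scored = sorted(index.items(),
                    key=lambda item: cosine(item[1], query),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

index = {"doc1": [1.0, 0.0], "doc2": [0.0, 1.0], "doc3": [0.7, 0.7]}
top = search(index, [0.9, 0.1])  # → ["doc1"]
```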
-
-
Observation
-
Tools
- OpenLLMetry - Open-source observability for your LLM application, based on OpenTelemetry.
- phoenix - AI observability and evaluation.
- Helicone - Open source LangSmith alternative for logging, monitoring, and debugging AI applications.
- wandb - The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
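The pattern these tools instrument automatically, hand-rolled as a decorator: record a span with latency and metadata around every LLM call. This is an illustration of the idea only, not the OpenTelemetry or Helicone API:

```python
import functools
import time

SPANS = []  # in-memory stand-in for an observability backend

def traced(fn):
    """Record name and latency for every call to `fn`."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            SPANS.append({
                "name": fn.__name__,
                "latency_s": time.perf_counter() - start,
            })
    return wrapper

@traced
def call_llm(prompt):
    return "stub response to: " + prompt  # stand-in for a real API call

reply = call_llm("hello")
```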
-
-
-
Alignment
-
Tools
- OpenRLHF - An easy-to-use, scalable and high-performance RLHF framework (70B+ PPO full tuning & iterative DPO & LoRA & RingAttention & RFT).
- Safe-RLHF - Safe RLHF: Constrained value alignment via safe reinforcement learning from human feedback.
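One of the objectives these frameworks implement, Direct Preference Optimization (DPO), fits in a few lines for a single preference pair: loss = -log σ(β·[(logπ(y_w) − logπ_ref(y_w)) − (logπ(y_l) − logπ_ref(y_l))]), where y_w/y_l are the chosen and rejected responses. The log-prob values below are invented for illustration:

```python
import math

def dpo_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1):
    """DPO loss for one preference pair, given policy and reference
    log-probabilities of the chosen (w) and rejected (l) responses."""
    margin = (logp_w - ref_w) - (logp_l - ref_l)
    sigmoid = 1.0 / (1.0 + math.exp(-beta * margin))
    return -math.log(sigmoid)

# Policy prefers the chosen response more than the reference does,
# so the margin is positive and the loss is small.
loss = dpo_loss(logp_w=-12.0, logp_l=-20.0, ref_w=-14.0, ref_l=-15.0)
```

Training pushes the margin up, i.e. increases the policy's relative preference for chosen over rejected responses without an explicit reward model.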
-
-
Code Assistant
-
Tools
- Auto-dev - 🧙 The AI-powered coding wizard with multilingual support 🌐, auto code generation 🏗️, and a helpful bug-slaying assistant 🐞! Customizable prompts 🎨 and a magic Auto Dev/Testing/Document/Agent feature 🧪 included! 🚀
- Codefuse-chatbot - An intelligent assistant serving the entire software development lifecycle, powered by a multi-agent framework, working with DevOps toolkits, code & doc repo RAG, etc.
- Cody - An AI coding assistant that uses codebase context to help you understand, write, and fix code faster.
- Continue - ⏩ Create, share, and use custom AI code assistants with open-source IDE extensions and a hub of models, rules, prompts, docs, and other building blocks.
- Sweep - An AI coding assistant that turns bug reports and feature requests into code changes.
- Tabby - Self-hosted AI coding assistant.
-
-
Output
-
Tools
- Instructor - Structured outputs for LLMs.
- Outlines - Structured text generation.
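What Instructor and Outlines help guarantee: model output that parses into a known schema. A hand-rolled sketch of that validation step; the real libraries do this with Pydantic models or constrained decoding rather than the toy schema dict used here:

```python
import json

# Toy schema: field name -> required Python type.
SCHEMA = {"name": str, "age": int}

def parse_person(raw: str) -> dict:
    """Parse a raw model response and validate it against SCHEMA."""
    data = json.loads(raw)
    for field, typ in SCHEMA.items():
        if not isinstance(data.get(field), typ):
            raise ValueError(f"field {field!r} missing or not {typ.__name__}")
    return data

person = parse_person('{"name": "Ada", "age": 36}')  # → {'name': 'Ada', 'age': 36}
```

Constrained decoding (Outlines) goes further: it masks invalid tokens during generation so the model cannot emit malformed output in the first place.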
-
-
MCP
-
MCP Client
- awesome-mcp-clients - A collection of MCP clients.
-
MCP Server
- awesome-mcp-servers - A collection of MCP servers.
- mcp-directory - A directory of MCP servers.
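MCP clients and servers exchange JSON-RPC 2.0 messages. A sketch of building a `tools/call` request like those sent over an MCP transport; the tool name and arguments are hypothetical, and a real client would also perform the `initialize` handshake first:

```python
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Hypothetical tool exposed by some MCP server:
msg = mcp_tool_call(1, "get_weather", {"city": "Tokyo"})
```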
-