Awesome-LLMOps
🎉 An awesome & curated list of best LLMOps tools.
https://github.com/InftyAI/Awesome-LLMOps
-
Inference
-
Inference Engine
- DeepSpeed-MII - MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
- ipex-llm - Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
- LMDeploy - LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
- llama.cpp - LLM inference in C/C++.
- MInference - To speed up long-context LLMs' inference, MInference uses approximate and dynamic sparse attention calculation, which reduces inference latency by up to 10x for pre-filling on an A100 while maintaining accuracy.
- MLC LLM - Universal LLM deployment engine with ML compilation.
- Ollama - Get up and running with DeepSeek-R1, Phi-4, Gemma 3, and other large language models.
- OpenLLM - Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud.
- Ratchet - A cross-platform browser ML framework.
- MLServer - An inference server for your machine learning models, including support for multi-model serving and more.
- SGLang - SGLang is a fast serving framework for large language models and vision language models.
- Triton Inference Server - The Triton Inference Server provides an optimized cloud and edge inferencing solution.
- vLLM - A high-throughput and memory-efficient inference and serving engine for LLMs.
- zml - High-performance AI inference stack, written in Zig.
- Text Generation Inference - Large Language Model Text Generation Inference.
- web-llm - High-performance in-browser LLM inference engine.
- TinyGrad - You like pytorch? You like micrograd? You love tinygrad! ❤️
- Xinference - Run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
- Nvidia Dynamo - A datacenter-scale distributed inference serving framework.
- Llumnix - Efficient and easy multi-instance LLM serving.
- OpenVINO - An open-source toolkit for optimizing and deploying AI inference.
- transformers.js - State-of-the-art machine learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
- LoRAX - Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs.
- Cortex.cpp - A local AI API platform.
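Several of the local engines above are driven by small config files rather than code. As an illustration, a minimal Ollama Modelfile (the base model, parameter value, and system prompt here are hypothetical choices, not taken from this list) that packages a base model with a system prompt:

```
# Modelfile (illustrative): derive a custom model from a base model
FROM llama3.2
PARAMETER temperature 0.7
SYSTEM You are a concise assistant for Kubernetes questions.
```

Building and running it would then look like `ollama create my-assistant -f Modelfile` followed by `ollama run my-assistant`.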
-
Benchmark
- Inference Benchmark - A benchmarking tool for LLM inference performance.
- Inference Perf - GenAI inference performance benchmarking tool.
-
LLM Router
- LiteLLM - Python SDK and proxy server (LLM gateway) to call 100+ LLM APIs in OpenAI format [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq].
- RouteLLM - A framework for serving and evaluating LLM routers; save LLM costs without compromising quality.
- AI Gateway - A fast AI gateway that routes requests to 250+ LLMs through a unified API.
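As a sketch of what routing through a gateway like LiteLLM can look like (model names and environment-variable references are illustrative, not taken from this list): one public model name is mapped to multiple providers so the proxy can load-balance or fail over between them.

```yaml
# litellm proxy config (illustrative)
model_list:
  - model_name: gpt-4o               # the name clients request
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gpt-4o               # same public name, second backend
    litellm_params:
      model: azure/my-gpt4o-deployment   # hypothetical Azure deployment
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
```

Starting the proxy with `litellm --config config.yaml` then exposes an OpenAI-compatible endpoint in front of both backends.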
-
Inference Platform
- llmaz - Easy, advanced inference platform for large language models on Kubernetes.
- llm-d - llm-d is a Kubernetes-native high-performance distributed LLM inference framework.
- Kserve - Standardized Serverless ML Inference Platform on Kubernetes.
- Mooncake - Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
- LMCache - Fast long-context LLM serving via smart KV cache optimizations.
- AIBrix - Cost-efficient and pluggable infrastructure components for GenAI inference.
- KubeAI - AI inference operator for Kubernetes, serving LLMs, embeddings, and speech-to-text.
- Kaito - Kubernetes AI toolchain operator for large-model inference and fine-tuning, with GPU auto-provisioning, container-based hosting, and CRD-based orchestration.
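For Kubernetes-native platforms like those above, deployment is typically declarative. A rough sketch of a KServe InferenceService using its Hugging Face serving runtime (the resource name, model ID, and resource values are illustrative assumptions, not from this list):

```yaml
# Illustrative KServe InferenceService for an LLM
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-demo                  # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      args:
        - --model_name=llama3
        - --model_id=meta-llama/meta-llama-3-8b-instruct
      resources:
        limits:
          nvidia.com/gpu: "1"
```

Applying this with `kubectl apply -f` would have the platform pull the model and expose it behind a generated endpoint.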
-
- Inference - An easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
- Nanoflow - A throughput-oriented high-performance serving framework for LLMs.
- RayServe - Ray is a unified framework for scaling AI and Python applications, consisting of a core distributed runtime and a set of AI libraries for accelerating ML workloads.
- TensorRT-LLM - TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
-
AI Gateway
- Kong - The Cloud-Native API Gateway and AI Gateway.
- gateway-api-inference-extension - A Kubernetes SIGs project extending the Gateway API for inference workloads.
- APISIX - The Cloud-Native API Gateway and AI Gateway with extensive plugin system and AI capabilities.
- Envoy AI Gateway - An open-source project for using Envoy Gateway to handle request traffic from application clients to GenAI services.
- Higress - AI-native API gateway.
- kgateway - The Cloud-Native API Gateway and AI Gateway.
-
Output
- Instructor - Structured outputs for LLMs.
- Outlines - Structured text generation.
-
-
Orchestration
-
Agent
- Qwen-Agent - Agent framework and applications built upon Qwen, featuring function calling, MCP, code interpreter, RAG, browser extension, etc.
- LangChain - 🦜🔗 Build context-aware reasoning applications.
- LlamaIndex - LlamaIndex is the leading framework for building LLM-powered agents over your data.
- AutoGPT - AutoGPT is the vision of accessible AI for everyone, to use and to build on.
- autogen - A programming framework for agentic AI. PyPi: autogen-agentchat; Discord: https://aka.ms/autogen-discord; Office Hour: https://aka.ms/autogen-officehour
- SWE-agent - SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
- fast-agent - Define, prompt, and test MCP-enabled agents and workflows.
- Magentic-UI - A research prototype of a human-centered web agent.
- crewAI - Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
- MetaGPT - The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming.
- PydanticAI - Agent framework / shim to use Pydantic with LLMs.
- Agno - Build multimodal AI agents with memory, knowledge, and tools; simple, fast, and model-agnostic.
- LangGraph - Build resilient language agents as graphs.
- OpenManus - An open-source framework for building general AI agents.
- Swarm - Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
- Semantic Kernel - Integrate cutting-edge LLM technology quickly and easily into your apps.
- CAMEL - The first and the best multi-agent framework. Finding the Scaling Law of Agents.
- kagent - Cloud-native agentic AI.
- Agent Development Kit (ADK) - An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
- Codex - Lightweight coding agent that runs in your terminal.
- OpenAI Agents SDK - A lightweight, powerful framework for multi-agent workflows.
- Suna - Open-source generalist AI agent.
-
Workflow
- FastGPT - FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without the need for extensive setup or configuration.
- Dify - Dify is an open-source LLM app development platform. Its intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
- Flowise - Drag & drop UI to build your customized LLM flow.
- Haystack - AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
- Inference - An easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
-
Tools
- Mem0 - The memory layer for AI agents.
- Browser Use - Make websites accessible for AI agents.
- Graphiti - Build real-time knowledge graphs for AI agents.
- OpenAI CUA - Sample app demonstrating OpenAI's Computer-Using Agent.
-
RAG
- GraphRAG - A modular graph-based Retrieval-Augmented Generation (RAG) system.
- RAGFlow - RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
- LightRAG - "LightRAG: Simple and Fast Retrieval-Augmented Generation".
-
-
Runtime
-
Chatbot
- FastChat - An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
- AnythingLLM - The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, no-code agent builder, MCP compatibility, and more.
- kubectl-ai - AI-powered Kubernetes assistant.
- LLM - Access large language models from the command-line.
- PrivateGPT - Interact with your documents using the power of GPT, 100% privately, with no data leaks.
- NextChat - Light and fast AI assistant with cross-platform support (Web, iOS, macOS, Android, Linux, Windows).
- 5ire - 5ire is a cross-platform desktop AI assistant and MCP client. It is compatible with major service providers and supports local knowledge bases and tools via Model Context Protocol servers.
- Chatbot UI - An open-source AI chat UI.
- Cherry Studio - A desktop client that supports multiple LLM providers, including models such as DeepSeek-R1.
- Jan - Jan is an open-source alternative to ChatGPT that runs 100% offline on your computer.
- Lobe Chat - An open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / DeepSeek / Qwen), knowledge base (file upload / knowledge management / RAG), multi-modals (plugins / artifacts) and thinking. One-click FREE deployment of your private ChatGPT / Claude / DeepSeek application.
- Gradio - Build and share delightful machine learning apps, all in Python.
- Open WebUI - User-friendly AI interface (supports Ollama, OpenAI API, ...).
- Chat SDK - A full-featured, hackable Next.js AI chatbot built by Vercel.
-
Database
- Faiss - A library for efficient similarity search and clustering of dense vectors.
- weaviate - Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
- milvus - Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search.
- deeplake - Database for AI: store vectors, images, texts, videos, and more, and stream data in real time to PyTorch/TensorFlow.
- chroma - The AI-native open-source embedding database.
-
Observation
- OpenLLMetry - Open-source observability for your LLM application, based on OpenTelemetry.
- OpenLIT - OpenTelemetry-native LLM observability and GPU monitoring.
- phoenix - AI observability and evaluation.
- Helicone - Open-source LangSmith alternative for logging, monitoring, and debugging AI applications.
- wandb - The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
- Langfuse - Open-source LLM engineering platform: observability, metrics, evals, and prompt management.
-
Code Assistant
- Auto-dev - AutoDev: The AI-powered coding wizard with multilingual support 🌐, auto code generation 🏗️, and a helpful bug-slaying assistant 🐞! Customizable prompts 🎨 and a magic Auto Dev/Testing/Document/Agent feature 🧪 included! 🚀
- Codefuse-chatbot - An intelligent assistant serving the entire software development lifecycle, powered by a multi-agent framework, working with DevOps toolkits, code & doc repo RAG, etc.
- Cody - AI coding assistant from Sourcegraph.
- Continue - Create, share, and use custom AI code assistants with open-source IDE extensions and a hub of models, rules, prompts, docs, and other building blocks.
- Sweep - Open-source AI-powered software developer for small features and bug fixes.
- Tabby - Self-hosted AI coding assistant.
-
Development Environment
- E2B - Secure open-source cloud runtime for AI apps and AI agents.
- Daytona - Secure and elastic infrastructure for running AI-generated code.
-
-
Training
-
Workflow
- Flyte - Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML, and analytics stacks.
- BentoML - The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!
- Kubeflow - Machine Learning Toolkit for Kubernetes.
- MLflow - Open-source platform for the machine learning lifecycle.
- ZenML - Build portable, production-ready MLOps pipelines.
- Ray - Ray is a unified framework for scaling AI and Python applications, consisting of a core distributed runtime and a set of AI libraries for accelerating ML workloads.
- Metaflow - Build and manage real-life ML, AI, and data science projects with ease.
- Polyaxon - MLOps tools for managing and orchestrating the machine learning lifecycle.
- Seldon-Core - An MLOps framework to package, deploy, monitor, and manage thousands of production machine learning models.
-
Framework
- MaxText - A simple, performant, and scalable JAX LLM.
- ColossalAI - Making large AI models cheaper, faster, and more accessible.
- Ludwig - Low-code framework for building custom LLMs, neural networks, and other AI models.
- MLX - MLX: An array framework for Apple silicon.
- AXLearn - An extensible deep learning library.
- Candle - Minimalist ML framework for Rust.
- DLRover - DLRover: An automatic distributed deep learning system.
-
FineTune
- torchtune - PyTorch-native post-training library.
- unsloth - Fine-tune DeepSeek-R1 and other reasoning LLMs 2x faster with 70% less memory! 🦥
- Axolotl - Go ahead and axolotl questions.
- LLaMa-Factory - Unified efficient fine-tuning of 100+ LLMs & VLMs (ACL 2024).
- Swift - Use PEFT or full-parameter training to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
- maestro - A streamlined tool to accelerate the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.
- EasyLM - A one-stop solution for pre-training, finetuning, evaluating, and serving LLMs in JAX/Flax.
- LMFlow - An extensible toolkit for finetuning and inference of large foundation models.
- MLX-VLM - MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
- Transformer Lab - Open-source application to interact with, train, fine-tune, and evaluate large language models on your own computer.
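Most of the fine-tuning tools above are configuration-driven. As a sketch of what this looks like with a tool such as Axolotl (the base model, dataset path, and hyperparameters below are illustrative assumptions, not recommendations):

```yaml
# Axolotl config sketch: QLoRA fine-tune of a small base model
base_model: meta-llama/Llama-3.2-1B
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules: [q_proj, k_proj, v_proj, o_proj]
datasets:
  - path: ./data/train.jsonl   # hypothetical dataset
    type: alpaca
sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 0.0002
output_dir: ./outputs/llama-qlora
```

A run would then be launched with something like `accelerate launch -m axolotl.cli.train config.yaml`, with the adapter weights written to `output_dir`.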
-
Evaluation
- AgentBench - A comprehensive benchmark to evaluate LLMs as agents (ICLR'24).
- LongBench - A bilingual, multitask benchmark for long-context understanding.
- lm-evaluation-harness - A framework for few-shot evaluation of language models.
- LiveBench - A challenging, contamination-free LLM benchmark.
- OpenCompass - OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc) over 100+ datasets.
- opik - Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
-
-
Alignment
- OpenRLHF - An easy-to-use, scalable, and high-performance RLHF framework (70B+ PPO full tuning & iterative DPO & LoRA & RingAttention & RFT).
- Self-RLHF - Safe RLHF: constrained value alignment via safe reinforcement learning from human feedback.
-
-
GPU
-
Scheduling
- Project-HAMi - Heterogeneous AI Computing Virtualization Middleware (CNCF Sandbox Project).
- KAI Scheduler - KAI Scheduler is an open-source Kubernetes-native scheduler for AI workloads at large scale.
-
Management
- NVIDIA GPU Operator - NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes.
-
-
-
MCP
-
MCP Client
- awesome-mcp-clients - A collection of MCP clients.
-
MCP Server
- awesome-mcp-servers - A collection of MCP servers.
- mcp-directory - A directory of MCP servers.
- Cline MCP Marketplace
- BaiLian MCP
- Docker MCP Catalog - A curated catalog of high-quality MCP servers as Docker images, spanning database solutions, developer tools, productivity platforms, and API integrations.
- Higress MCP Marketplace
- MCPMarket
- ModelScope MCP
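MCP servers from catalogs like these are typically wired into a client via a small JSON config. A representative sketch in the widely used `mcpServers` format (the server name, package, and allowed path are illustrative):

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"]
    }
  }
}
```

The client launches each configured server as a subprocess and exposes its tools to the model over the Model Context Protocol.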
-
7