awesome-llmops
An awesome & curated list of best LLMOps tools for developers
https://github.com/tensorchord/awesome-llmops
-
Serving
-
Large Model Serving
- DeepSpeed-MII - Makes low-latency and high-throughput inference possible, powered by DeepSpeed.
- CTranslate2
- Clip-as-a-service
- Flowise
- llama.cpp
- Infinity - Serving for text embeddings.
- Modelz-LLM
- TensorRT-LLM
- text-generation-inference
- vllm - A high-throughput and memory-efficient inference and serving engine for LLMs.
- whisper.cpp
- x-stable-diffusion - Real-time inference for Stable Diffusion - 0.88s latency. Covers AITemplate, nvFuser, TensorRT, FlashAttention.
- Alpaca-LoRA-Serve - Alpaca-LoRA as Chatbot service
- tokenizers - State-of-the-Art Tokenizers optimized for Research and Production
- FlexGen - Running large language models on a single GPU for throughput-oriented scenarios.
- text-embeddings-inference - Inference for text-embedding models
- Faster Whisper
- Ollama
-
Frameworks/Servers for Serving
- BentoML
- TFServing - A flexible, high-performance serving system for machine learning models.
- Torchserve
- Triton Server (TRTIS)
- langchain-serve
- Xinference - Run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
- Mosec - Provides an easy-to-use Python interface.
- lanarky - Build production-grade LLM applications
- ray-llm - RayLLM
- Jina
- Open Responses - Open-source platform for building long-running LLM agents with tool use.
- KubeAI - Deploy and scale machine learning models on Kubernetes. Built for LLMs, embeddings, and speech-to-text.
- Kaito - Uses container images and GPU auto-provisioning. Includes an OpenAI-compatible server for inference and preset configurations for popular runtimes such as vLLM and transformers.
-
-
LLMOps
-
Observability
- langchain
- TrueFoundry - On-prem infra, including deploying, fine-tuning, tracking prompts, and serving open-source LLM models with full data security and optimal GPU management. Train and launch your LLM application at production scale with best software engineering practices.
- agenta
- AI studio
- Arize-Phoenix
- BudgetML
- deeplake
- Dstack - Cost-effective LLM development in any cloud (AWS, GCP, Azure, Lambda, etc.).
- GPTCache
- Haystack - Question answering and more.
- Langfuse
- LLMApp - Real-time LLM-enabled data pipelines with a few lines of code.
- LLMFlows - Question-answering systems and agents.
- OpenLIT - An OpenTelemetry-native GenAI and LLM application observability tool that provides OpenTelemetry auto-instrumentation for monitoring LLMs, vector DBs, and frameworks. It provides valuable insights into token and cost usage, user interaction, and performance-related metrics.
- Pezzo 🕹️ - Open-source LLMOps platform built for developers and teams. In just two lines of code, you can seamlessly troubleshoot your AI operations, collaborate and manage your prompts in one place, and instantly deploy changes to any environment.
- xTuring - Fine-tuning.
- LiteLLM 🚅
- Parea AI - Version-controlled enhanced prompt playground.
- Opik
- PromptFoundry
- Izlo
- Fiddler AI - Pre-production to production.
- Vellum
- PromptLayer 🍰
- MLflow - Open-source framework for the end-to-end machine learning lifecycle, helping developers track experiments, evaluate models/prompts, deploy models, and add observability with tracing.
- Manag.ai - All-in-one prompt management and observability platform. Craft, track, and perfect your LLM prompts with ease.
- gotoHuman - Human in the loop for LLM-based and agentic workflows. Prompt users to approve actions, select next steps, or review and validate generated results.
- GPUStack - Open-source GPU cluster manager for running and managing LLMs
- Helicone - Open-source LLM observability platform for logging, monitoring, and debugging AI applications. Simple 1-line integration to get started.
- PromptSite - Works directly with your local filesystem; ideal for data scientists and engineers who want to integrate it easily into existing LLM workflows.
- Keywords AI
- Literal AI - Multi-modal LLM observability and evaluation platform. Create prompt templates, deploy prompt versions, debug LLM runs, create datasets, run evaluations, monitor LLM metrics, and collect human feedback.
- Dify - Open-source framework that aims to enable developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
- Dataoorts
- Glide - Cloud-Native LLM Routing Engine. Improve LLM app resilience and speed.
- Laminar - Open-source all-in-one platform for engineering AI products. Traces, Evals, Datasets, Labels.
- LangKit - Out-of-the-box LLM telemetry collection library that extracts features and profiles prompts, responses, and metadata about how your LLM is performing over time to find problems at scale.
- magentic - LLM-powered functionality.
- Mirascope - Lightning-fast, efficient development and ensuring quality in LLM-based applications
- PromptDX
- prompttools - Open-source tools for testing and experimenting with prompts. The core idea is to enable developers to evaluate prompts using familiar interfaces like code and notebooks. In just a few lines of code, you can test your prompts and parameters across different models (whether you are using OpenAI, Anthropic, or LLaMA models). You can even evaluate the retrieval accuracy of vector databases.
- systemprompt.io
- Portkey - Cost-efficient apps.
- Weights & Biases (Prompts) - Part of the developer-first W&B MLOps platform. Use W&B Prompts for visualizing and inspecting LLM execution flow, tracking inputs and outputs, viewing intermediate results, and securely managing prompts and LLM chain configurations.
- PromptMage - Open-source tool to simplify the process of creating and managing LLM workflows and prompts as a self-hosted solution.
- Evidently - Open-source framework to evaluate, test and monitor ML and LLM-powered systems.
- Epsilla - All-in-one platform to create vertical AI agents powered by your private data and knowledge.
- promptfoo - Open-source tool for testing & evaluating prompt quality. Create test cases, automatically check output quality and catch regressions, and reduce evaluation cost.
- ReliableGPT 💪
- AgentMark - Type-Safe Markdown-based Agents
- Cheshire Cat AI
- Lunary - Plug-and-play integration into LangChain.
- LlamaIndex
-
-
AutoML
-
Profiling
- AutoGluon
- Archai
- autokeras
- Auto-PyTorch
- auto-sklearn - A drop-in replacement for a scikit-learn estimator.
- Dragonfly
- Determined
- DEvol (DeepEvolution)
- EvalML
- FEDOT
- FLAML - A fast and lightweight AutoML library.
- Goptuna
- HpBandSter
- Hyperband
- Hypernets
- Hyperopt
- hyperunity - Black-box hyperparameter optimisation.
- Intelli
- Keras Tuner
- learn2learn - Meta-learning Framework for Researchers.
- MOE
- Model Search
- NNI - Hyper-parameter tuning.
- Optuna
- Pycaret - Open-source, low-code machine learning library in Python that automates machine learning workflows.
- REMBO - Bayesian optimization in high dimensions via random embedding.
- RoBO
- Spearmint
- Torchmeta - Meta-Learning library for PyTorch.
- Vegas
- TPOT - One of the very first AutoML methods and open-source software packages.
- autoai
- AutoGL
- automl-gs
- Katib - Kubernetes-native project for automated machine learning (AutoML).
- NASGym - Proof-of-concept OpenAI Gym environment for Neural Architecture Search (NAS).
- scikit-optimize (skopt) - Sequential model-based optimization with a `scipy.optimize` interface.
- AutoRAG - Boost your LLM app performance with your own data
- HPOlib2
-
-
Observability
- PromptHub - Full stack prompt management tool designed to be usable by technical and non-technical team members. Test, version, collaborate, deploy, and monitor, all from one place.
- Prompteams - Prompt management system. Version, test, collaborate, and retrieve prompts through real-time APIs. Have GitHub style with repos, branches, and commits (and commit history).
- Doku - An open-source LLM Observability platform streamlining the monitoring of LLM applications with just two lines of code. It provides valuable insights into token usage and user engagement, tracks API usage for providers like OpenAI, and facilitates easy data export to observability platforms like Grafana and DataDog.
-
ML Platforms
- TrueFoundry - A PaaS to deploy, Fine-tune and serve LLM Models on a company’s own Infrastructure with Data Security and Optimal GPU and Cost Management. Launch your LLM Application at Production scale with best DevSecOps practices.
-
Large Scale Deployment
-
Workflow
- Airflow
- aqueduct - Open-Source Platform for Production Data Science
- Argo Workflows
- Flyte - Kubernetes-native workflow automation platform for complex, mission-critical data and ML processes at scale.
- Hamilton
- Kubeflow Pipelines
- Metaflow - Build and manage real-life data science projects with ease!
- Ploomber
- Prefect
- ZenML
- VDP - Open-source unstructured data ETL tool to streamline the end-to-end unstructured data processing pipeline.
- LangFlow - Drag-and-drop components and a chat interface.
-
ML Platforms
- OpenLLM - Fine-tune, serve, deploy, and monitor any LLMs with ease.
- MLflow
- ModelFox
- Kserve
- Kubeflow
- Polyaxon
- Primehub
- Seldon-core
- Starwhale - Fine-tuning.
- Weights & Biases - Tracks LLM-powered applications, featuring W&B Prompts for LLM execution flow visualization, input and output monitoring, and secure management of prompts and LLM chain configurations.
- ClearML - Auto-Magical CI/CD to streamline your ML workflow. Experiment Manager, MLOps and Data-Management.
- MLRun
- Hopsworks - Fine-tuning and serving LLMs. Hopsworks includes both a feature store and vector database for RAG.
- OpenModelZ - One-click machine learning deployment (LLM, text-to-image and so on) at scale on any cluster (GCP, AWS, Lambda labs, your home lab, or even a single machine).
-
Scheduling
- Kueue - Kubernetes-native Job Queueing.
- Slurm
- Volcano
- PAI - Resource scheduling and cluster management for AI (open-sourced by Microsoft).
- Yunikorn - Light-weight, universal resource scheduler for container orchestrator systems.
-
Model Management
-
-
Awesome Lists
-
Profiling
- Awesome Federated Learning Systems
- Awesome AutoDL - Includes an in-depth analysis.
- Awesome AutoML Papers
- Awesome Production Machine Learning
- Awesome AutoML - AutoML-related research, tools, projects and other resources
- Awesome-Code-LLM - Code-LLM for research.
- Awesome Federated Learning - Re-organized from arXiv (mostly)
- awesome-federated-learning
- Awesome Open MLOps
- Awesome Tensor Compilers
- kelvins/awesome-mlops
- visenger/awesome-mlops - An awesome list of references for MLOps
- currentslab/awesome-vector-search
- pleisto/flappy - Production-Ready LLM Agent SDK for Every Developer
- Awesome Argo
-
-
Model
-
Large Language Model
- Alpaca
- BELLE - Fine-tuned on a 34B Chinese character corpus, based on LLaMA and Alpaca.
- dolly
- FastChat (Vicuna) - FastChat-T5.
- GLM-6B (ChatGLM) - Pre-Trained Model, quantization of ChatGLM-130B; can run on consumer-level GPUs.
- ChatGLM2-6B - ChatGLM2-6B is the second-generation version of the open-source bilingual (Chinese-English) chat model [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B).
- GPT-NeoX
- Luotuo - Alpaca-LoRA.
- StableLM
- Falcon 40B - Falcon-40B-Instruct is a 40B-parameter causal decoder-only model built by TII, based on Falcon-40B and finetuned on a mixture of Baize. It is made available under the Apache 2.0 license.
- Gemma
- Bloom - BigScience Large Open-science Open-access Multilingual Language Model
- GLM-130B (ChatGLM) - An Open Bilingual Pre-Trained Model (ICLR 2023)
- Mixtral-8x7B-v0.1 - The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.
-
CV Foundation Model
- disco-diffusion
- segment-anything (SAM)
- stable-diffusion - A latent text-to-image diffusion model
- stable-diffusion v2 - High-Resolution Image Synthesis with Latent Diffusion Models
- midjourney
-
Audio Foundation Model
- bark - Transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise and simple sound effects.
- whisper - Robust Speech Recognition via Large-Scale Weak Supervision
-
-
Security
-
Frameworks for LLM security
- Plexiglass
-
Observability
- Azure OpenAI Logger
- Deepchecks
- Fiddler AI - Pre-production to production. Ship more ML and LLMs into production, and monitor ML and LLM metrics like hallucination, PII, and toxicity.
- Giskard
- Great Expectations
- whylogs
- Traceloop OpenLLMetry - OpenTelemetry-based observability and monitoring for LLM and agent workflows.
-
-
Search
-
Vector search
- Awadb
- Chroma
- Marqo
- Milvus
- pgvecto.rs
- Qdrant
- txtai - AI-powered semantic search applications
- Vald
- Vearch - Embedding-based vector retrieval
- VectorDB - No more, no less.
- Infinity - The AI-native database built for LLM applications, providing incredibly fast vector and full-text search
- Lancedb - Developer-friendly, serverless vector database for AI applications. Easily add long-term memory to your LLM apps!
- Pinecone - High-performance vector search applications. Developer-friendly, fully managed, and easily scalable without infrastructure hassles.
- pgvector - Open-source vector similarity search for Postgres.
- Vellum - Out-of-box support for OCR, text chunking, embedding model experimentation, metadata filtering, and production-grade APIs.
- Weaviate - Fault tolerance and scalability of a cloud-native database, all accessible through GraphQL, REST, and various language clients.
- AquilaDB - k-NN search.
- Epsilla
-
-
Code AI
-
Vector search
- CodeGeeX
- CodeGen - Open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.
- CodeT5
- Continue - Open-source autopilot for software development: bring the power of ChatGPT to VS Code
- fauxpilot - Open-source alternative to GitHub Copilot server
- tabby - Self-hosted AI coding assistant. An open-source / on-prem alternative to GitHub Copilot.
-
-
Training
-
IDEs and Workspaces
- code server
- conda - OS-agnostic, system-level binary package manager and ecosystem.
- Docker - Open-source project created by Docker to enable and accelerate software containerization.
- envd
- Jupyter Notebooks - Web-based notebook environment for interactive computing.
- Kurtosis - Multi-container environments.
-
Foundation Model Fine Tuning
- alpaca-lora - Instruct-tune LLaMA on consumer hardware
- finetuning-scheduler - Fine-tuning schedules.
- LMFlow
- TRL
- Flyflow
- Lora - Low-rank adaptation to quickly fine-tune diffusion models.
- peft - State-of-the-art Parameter-Efficient Fine-Tuning.
- p-tuning-v2 - Fine-tuning on small/medium-sized models and sequence tagging challenges. [(ACL 2022)](https://arxiv.org/abs/2110.07602)
- QLoRA - Preserves full 16-bit finetuning task performance.
-
Frameworks for Training
- metric-learn
- Oneflow - Performance-centered and open-source deep learning framework.
- PaddlePaddle
- PyTorch
- XGBoost
- scikit-learn
- TensorFlow
- VectorFlow
- axolotl - Fine-tuning of various AI models, offering support for multiple configurations and architectures.
- Candle
- Accelerate - Multi-GPU, TPU, mixed-precision.
- Apache MXNet - Mutation-aware Dataflow Dep Scheduler.
- Caffe
- ColossalAI - Large-scale model training system with efficient parallelization techniques.
- Horovod
- Kedro - Open-source Python framework for creating reproducible, maintainable and modular data science code.
- Keras
- LightGBM
- MegEngine - Easy-to-use deep learning framework, with auto-differentiation.
- MindSpore
- Jax - High-performance machine learning research.
- DeepSpeed
-
Visualization
- OpenOps
- TensorSpace - Pre-trained deep learning models from TensorFlow, Keras, TensorFlow.js.
- Fiddler AI
- Manifold - Model-agnostic visual debugging tool for machine learning.
- netron
- TensorBoard
- dtreeviz
- Zetane Viewer
- Zeno
-
Model Editing
- FastEdit
-
Experiment Tracking
- Aim - Easy-to-use and performant open-source experiment tracker.
- Guild AI
- Kedro-Viz - Kedro-Viz is an interactive development tool for building data science pipelines with Kedro. Kedro-Viz also allows users to view and compare different runs in the Kedro project.
- LabNotebook
- Sacred
-
-
Data
-
Data Management
- Pachyderm
- ArtiVC
- Dolt
- Delta-Lake
- Quilt - Self-organizing data hub for S3.
-
Data Storage
-
Data Tracking
-
Feature Engineering
- Featureform
- FeatureTools
-
Data/Feature enrichment
- Upgini - Ready-to-use features from public and community shared data sources; enriches your training dataset with only the accuracy-improving features.
- Feast
- distilabel - High-quality outputs, full data ownership, and overall efficiency.
-
-
Performance
-
ML Compiler
- ONNX-MLIR
- bitsandbytes - k-bit quantization for PyTorch.
- TVM
-
Profiling
- octoml-profile - octoml-profile is a Python library and cloud service designed to provide the simplest experience for assessing and optimizing the performance of PyTorch models on cloud hardware with state-of-the-art ML acceleration technology.
- scalene - High-performance, high-precision CPU, GPU, and memory profiler for Python
-
-
Optimizations
-
Profiling
- FeatherCNN
- Forward
- NCNN - High-performance neural network inference framework optimized for the mobile platform.
- PocketFlow
- TensorFlow Model Optimization
- TNN
- optimum-tpu
- LangWatch
-
-
Federated ML
-
Profiling
- FATE
- FedML - Supports large-scale cross-silo federated learning, cross-device federated learning on smartphones/IoTs, and research simulation.
- Flower
- EasyFL - Easy-to-use Federated Learning Platform
- Harmonia - Open-source project aiming at developing systems/infrastructures and libraries to ease the adoption of federated learning (abbreviated to FL) for research and production usage.
- TensorFlow Federated
-