awesome-mlops
:sunglasses: A curated list of awesome MLOps tools
https://github.com/kelvins/awesome-mlops
-
Cron Job Monitoring
- Heartbeat.pm - Monitoring aliveness of any sensor/cron job.
- Cronitor - Monitor any cron job or scheduled task.
- HealthchecksIO - Simple and effective cron job monitoring.
-
Workflow Tools
- Velda - Run jobs and workflows as if on your local machine.
- VDP - An open-source tool to seamlessly integrate AI for unstructured data into the modern data stack.
- Metaflow - Human-friendly lib that helps scientists and engineers build and manage data science projects.
- Flyte - Easy to create concurrent, scalable, and maintainable workflows for machine learning.
- Argo - Open source container-native workflow engine for orchestrating parallel jobs on Kubernetes.
- Automate Studio - Rapidly build & deploy AI-powered workflows.
- Ploomber - Write maintainable, production-ready pipelines. Develop locally, deploy to the cloud.
- Kale - Aims at simplifying the Data Science experience of deploying Kubeflow Pipelines workflows.
- Kedro - Library that implements software engineering best-practice for data and ML pipelines.
- Luigi - Python module that helps you build complex pipelines of batch jobs.
- Couler - Unified interface for constructing and managing workflows on different workflow engines.
- dstack - An open-core tool to automate data and training workflows.
- Hamilton - A scalable general purpose micro-framework for defining dataflows.
- ZenML - An extensible open-source MLOps framework to create reproducible pipelines.
- MLRun - Generic mechanism for data scientists to build, run, and monitor ML tasks and pipelines.
-
Articles
- Practitioners guide to MLOps: A framework for continuous delivery and automation of machine learning
- What Is MLOps?
- A Tour of End-to-End Machine Learning Platforms
- MLOps: Continuous delivery and automation pipelines in machine learning
- Continuous Delivery for Machine Learning
- MLOps: Machine Learning as an Engineering Discipline
- Rules of Machine Learning: Best Practices for ML Engineering
- Machine Learning Operations (MLOps): Overview, Definition, and Architecture
- MLOps Roadmap: A Complete MLOps Career Guide
- The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction
-
Books
- AI Governance
- Building Machine Learning Pipelines
- Building Machine Learning Powered Applications
- Designing Machine Learning Systems
- Engineering MLOps
- Implementing MLOps in the Enterprise
- Introducing MLOps
- Kubeflow for Machine Learning
- Kubeflow Operations Guide
- Machine Learning Design Patterns
- ML Ops: Operationalizing Data Science
- Practical Deep Learning at Scale with MLflow
- Practical MLOps
- Production-Ready Applied Deep Learning
- Reliable Machine Learning
- The Machine Learning Solutions Architect Handbook
- MLOps Lifecycle Toolkit
- MLOps Engineering at Scale
- AI Model Evaluation
- Beginning MLOps with MLFlow
- Deep Learning in Production
- Machine Learning Engineering in Action
-
Data Exploration
- Jupyter Notebook - Web-based notebook environment for interactive computing.
- Jupytext - Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts.
- Deepnote - Drop-in replacement for Jupyter and an AI-native workspace for modern data teams.
- BambooLib - An intuitive GUI for Pandas DataFrames.
- DataPrep - Collect, clean and visualize your data in Python.
- Pandas Profiling - Create HTML profiling reports from pandas DataFrame objects.
- Polynote - The polyglot notebook with first-class Scala support.
- Apache Zeppelin - Enables data-driven, interactive data analytics and collaborative documents.
-
Data Management
- Hub - A dataset format for creating, storing, and collaborating on AI datasets of any size.
- Intake - A lightweight set of tools for loading and sharing data in data science projects.
- Milvus - An open source embedding vector similarity search engine powered by Faiss, NMSLIB and Annoy.
- lakeFS - Repeatable, atomic and versioned data lake on top of object storage.
- Marquez - Collect, aggregate, and visualize a data ecosystem's metadata.
- Delta Lake - Storage layer that brings scalable, ACID transactions to Apache Spark and other engines.
- Dolt - SQL database that you can fork, clone, branch, merge, push and pull just like a git repository.
- BlazingSQL - A lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
- Arrikto - Dead simple, ultra fast storage for the hybrid Kubernetes world.
- Dud - A lightweight CLI tool for versioning data alongside source code and building data pipelines.
- DVC - Management and versioning of datasets and machine learning models.
- Qdrant - An open source vector similarity search engine with extended filtering support.
- Quilt - A self-organizing data hub with S3 support.
-
Data Processing
- Azkaban - Batch workflow job scheduler created at LinkedIn to run Hadoop jobs.
- Hadoop - Framework that allows for the distributed processing of large data sets across clusters.
- Spark - Unified analytics engine for large-scale data processing.
- Dagster - A data orchestrator for machine learning, analytics, and ETL.
- Airflow - Platform to programmatically author, schedule, and monitor workflows.
- OpenRefine - Power tool for working with messy data and improving it.
-
Data Validation
- Cerberus - Lightweight, extensible data validation library for Python.
- TFDV - A library for exploring and validating machine learning data.
- Cleanlab - Python library for data-centric AI and machine learning with messy, real-world data and labels.
- JSON Schema - A vocabulary that allows you to annotate and validate JSON documents.
-
Model Serving
- PredictionIO - Event collection, deployment of algorithms, evaluation, querying predictive results via APIs.
- Banana - Host your ML inference code on serverless GPUs and integrate it into your app with one line of code.
- Beam - Develop on serverless GPUs, deploy highly performant APIs, and rapidly prototype ML models.
- Quix - Serverless platform for processing data streams in real-time with machine learning models.
- TensorFlow Serving - Flexible, high-performance serving system for ML models, designed for production.
- TorchServe - A flexible and easy to use tool for serving PyTorch models.
- GraphPipe - Machine learning model deployment made simple.
- Merlin - A platform for deploying and serving machine learning models.
- Geniusrise - Host inference APIs, bulk inference and fine tune text, vision, audio and multi-modal models.
- Wallaroo.AI - A platform for deploying, serving, and optimizing ML models in both cloud and edge environments.
- Cog - Open-source tool that lets you package ML models in a standard, production-ready container.
- Cortex - Machine learning model serving infrastructure.
- BentoML - Open-source platform for high-performance ML model serving.
- BudgetML - Deploy an ML inference service on a budget in less than 10 lines of code.
- Hydrosphere - Platform for deploying your Machine Learning to production.
- LocalAI - Drop-in replacement REST API that’s compatible with OpenAI API specifications for inferencing.
- MLEM - Version and deploy your ML models following GitOps principles.
- Opyrator - Turns your ML code into microservices with web API, interactive GUI, and more.
- Rune - Provides containers to encapsulate and deploy EdgeML pipelines and applications.
- Seldon - Take your ML projects from POC to production with maximum efficiency and minimal risk.
- Streamlit - Lets you create apps for your ML projects with deceptively simple Python scripts.
- Vespa - Store, search, organize and make machine-learned inferences over big data at serving time.
- KFServing - Kubernetes custom resource definition for serving ML models on arbitrary frameworks.
- Gradio - Create customizable UI components around your models.
- Triton Inference Server - Provides an optimized cloud and edge inferencing solution.
-
Other Lists
-
AutoML
- H2O AutoML - Automates ML workflow, which includes automatic training and tuning of models.
- MLBox - A powerful automated machine learning Python library.
- Model Search - Framework that implements AutoML algorithms for model architecture search at scale.
- AutoGluon - Automated machine learning for image, text, tabular, time-series, and multi-modal data.
- AutoPyTorch - Automatic architecture search and hyperparameter optimization for PyTorch.
- EvalML - A library that builds, optimizes, and evaluates ML pipelines using domain-specific functions.
- FLAML - Finds accurate ML models automatically, efficiently and economically.
- AutoKeras - An AutoML library whose goal is to make machine learning accessible to everyone.
- AutoSKLearn - Automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
- MindsDB - AI layer for databases that allows you to effortlessly develop, train and deploy ML models.
- NNI - An open source AutoML toolkit for automating the machine learning lifecycle.
-
Data Catalog
- Apache Atlas - Provides open metadata management and governance capabilities to build a data catalog.
- OpenMetadata - A single place to discover, collaborate, and get your data right.
- DataHub - LinkedIn's generalized metadata search & discovery tool.
- Amundsen - Data discovery and metadata engine for improving productivity when interacting with data.
- CKAN - Open-source DMS (data management system) for powering data hubs and data portals.
- Magda - A federated, open-source data catalog for all your big data and small data.
- Metacat - Unified metadata exploration API service for Hive, RDS, Teradata, Redshift, S3 and Cassandra.
-
Events
-
Podcasts
-
Slack
-
Websites
-
Data Visualization
- Data Studio - Reporting solution for power users who want to go beyond the data and dashboards of GA.
- Metabase - The simplest, fastest way to get business intelligence and analytics to everyone.
- Redash - Connect to any data source, easily visualize, dashboard and share your data.
- SolidUI - AI-generated visualization prototyping and editing platform that supports 2D and 3D models.
- Dash - Analytical Web Apps for Python, R, Julia, and Jupyter.
- Facets - Visualizations for understanding and analyzing machine learning datasets.
- Grafana - Multi-platform open source analytics and interactive visualization web application.
- Lux - Fast and easy data exploration by automating the visualization and data analysis process.
-
Feature Store
- Feast - End-to-end open source feature store for machine learning.
- Feathr - An enterprise-grade, high performance feature store.
- Tecton - A fully-managed feature platform built to orchestrate the complete lifecycle of features.
- Butterfree - A tool for building feature stores. Transform your raw data into beautiful features.
- Featureform - A Virtual Feature Store. Turn your existing data infrastructure into a feature store.
- ByteHub - An easy-to-use feature store. Optimized for time-series data.
-
Hyperparameter Tuning
- Tune - Python library for experiment execution and hyperparameter tuning at any scale.
- Scikit Optimize - Simple and efficient library to minimize expensive and noisy black-box functions.
- Advisor - Open-source implementation of Google Vizier for hyperparameter tuning.
- Hyperas - A very simple wrapper for convenient hyperparameter optimization.
- Hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
- Katib - Kubernetes-based system for hyperparameter tuning and neural architecture search.
- KerasTuner - Easy-to-use, scalable hyperparameter optimization framework.
- Optuna - Open source hyperparameter optimization framework to automate hyperparameter search.
- Talos - Hyperparameter Optimization for TensorFlow, Keras and PyTorch.
-
Machine Learning Platform
- aiWARE - aiWARE helps MLOps teams evaluate, deploy, integrate, scale & monitor ML models.
- Algorithmia - Securely govern your machine learning operations with a healthy ML lifecycle.
- Allegro AI - Transform ML/DL research into products. Faster.
- Iguazio - Data science platform that automates MLOps with end-to-end machine learning pipelines.
- Katonic - Automate your cycle of intelligence with Katonic MLOps Platform.
- Kubeflow - Making deployments of ML workflows on Kubernetes simple, portable and scalable.
- Omnimizer - Simplifies and accelerates MLOps by bridging the gap between ML models and edge hardware.
- Sematic - An open-source end-to-end pipelining tool to go from laptop prototype to cloud in no time.
- SigOpt - A platform that makes it easy to track runs, visualize training, and scale hyperparameter tuning.
- TrueFoundry - A Cloud-native MLOps Platform over Kubernetes to simplify training and serving of ML Models.
- Sagemaker - Fully managed service that provides the ability to build, train, and deploy ML models quickly.
- Bodywork - Deploys machine learning projects developed in Python, to Kubernetes.
- Neu.ro - MLOps platform that integrates open-source and proprietary tools into client-oriented systems.
- CNVRG - An end-to-end machine learning platform to build and deploy AI models at scale.
- DataRobot - AI platform that democratizes data science and automates the end-to-end ML at scale.
- Edge Impulse - Platform for creating, optimizing, and deploying AI/ML algorithms for edge devices.
- envd - Machine learning development environment for data science and AI/ML engineering teams.
- FedML - Simplifies the workflow of federated learning anywhere at any scale.
- Gradient - Multicloud CI/CD and MLOps platform for machine learning teams.
- Hopsworks - Open-source platform for developing and operating machine learning models at scale.
- Knime - Create and productionize data science using one easy and intuitive environment.
- LynxKite - A complete graph data science platform for very large graphs and other datasets.
- ML Workspace - All-in-one web-based IDE specialized for machine learning and data science.
- MLReef - Open source MLOps platform that helps you collaborate, reproduce and share your ML work.
- SAS Viya - Cloud native AI, analytic and data management platform that supports the analytics life cycle.
- Valohai - Takes you from POC to production while managing the whole model lifecycle.
- DAGsHub - A platform built on open source tools for data, model and pipeline management.
- Dataiku - Platform democratizing access to data and enabling enterprises to build their own path to AI.
- Modzy - Deploy, connect, run, and monitor machine learning (ML) models in the enterprise and at the edge.
- Pachyderm - Combines data lineage with end-to-end pipelines on Kubernetes, engineered for the enterprise.
-
Model Lifecycle
- Comet - Track your datasets, code changes, experimentation history, and models.
- Aeromancy - A framework for performing reproducible AI and ML for Weights and Biases.
- Keepsake - Version control for machine learning with support for Amazon S3 and Google Cloud Storage.
- MLflow - Open source platform for the machine learning lifecycle.
- Aim - A super-easy way to record, search and compare 1000s of ML training runs.
- Cascade - Library of ML-Engineering tools for rapid prototyping and experiment management.
- Guild AI - Open source experiment tracking, pipeline automation, and hyperparameter tuning.
- Sacred - A tool to help you configure, organize, log and reproduce experiments.
- Weights and Biases - A tool for visualizing and tracking your machine learning experiments.
- ModelDB - Open source ML model versioning, metadata, and experiment management.
- Neptune AI - The most lightweight experiment management tool that fits any workflow.
-
Optimization Tools
- Dask - Provides advanced parallelism for analytics, enabling performance at scale for the tools you love.
- Singa - Apache top level project, focusing on distributed training of DL and ML models.
- Mahout - Distributed linear algebra framework and mathematically expressive Scala DSL.
- Modin - Speed up your Pandas workflows by changing a single line of code.
- Nos - Open-source module for running AI workloads on Kubernetes in an optimized way.
- Petastorm - Enables single machine or distributed training and evaluation of deep learning models.
- Rapids - Gives the ability to execute end-to-end data science and analytics pipelines entirely on GPUs.
- Tpot - Automated ML tool that optimizes machine learning pipelines using genetic programming.
- Nebullvm - Easy-to-use library to boost AI inference.
- Ray - Fast and simple framework for building and running distributed applications.
- Fiber - Python distributed computing library for modern computer clusters.
- Accelerate - A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision.
- MLlib - Apache Spark's scalable machine learning library.
- DeepSpeed - Deep learning optimization library that makes distributed training easy, efficient, and effective.
- Horovod - Distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
-
Simplification Tools
- Chassis - Turns models into ML-friendly containers that run just about anywhere.
- Ludwig - Allows users to train and test deep learning models without the need to write code.
- MLNotify - No need to keep checking your training, just one import line and you'll know the second it's done.
- Sagify - A CLI utility to train and deploy ML/DL models on AWS SageMaker.
- Soorgeon - Convert monolithic Jupyter notebooks into maintainable pipelines.
- Soopervisor - Export ML projects to Kubernetes (Argo workflows), Airflow, AWS Batch, and SLURM.
- Hermione - Helps data scientists set up more organized code, more quickly and simply.
- Hydra - A framework for elegantly configuring complex applications.
- Koalas - Pandas API on Apache Spark. Makes data scientists more productive when interacting with big data.
- PyCaret - Open source, low-code machine learning library in Python.
- TrainGenerator - A web app to generate template code for machine learning.
- Turi Create - Simplifies the development of custom machine learning models.
-
Visual Analysis and Debugging
- Superwise - Fully automated, enterprise-grade model observability in a self-service SaaS platform.
- Manifold - A model-agnostic visual debugging tool for machine learning.
- Netron - Visualizer for neural network, deep learning, and machine learning models.
- Aporia - Observability with customized monitoring and explainability for ML models.
- Radicalbit - The open source solution for monitoring your AI models in production.
- Opik - Evaluate, test, and ship LLM applications with a suite of observability tools.
- Yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection.
- NannyML - Algorithm capable of fully capturing the impact of data drift on performance.
- Whylogs - The open source standard for data logging. Enables ML monitoring and observability.
- Evidently - Interactive reports to analyze ML models during validation or production monitoring.
- Fiddler - Monitor, explain, and analyze your AI in production.
-
Drift Detection
- ml3-drift - Drift detection algorithms seamlessly integrated with ML and AI frameworks.
- Frouros - An open source Python library for drift detection in machine learning systems.
- Alibi Detect - An open source Python library focused on outlier, adversarial and drift detection.
- TorchDrift - A data and concept drift library for PyTorch.
-
CI/CD for Machine Learning
-
Model Fairness and Privacy
- Fairlearn - A Python package to assess and improve fairness of machine learning models.
- AIF360 - A comprehensive set of fairness metrics for datasets and machine learning models.
- Opacus - A library that enables training PyTorch models with differential privacy.
- TensorFlow Privacy - Library for training machine learning models with privacy for training data.
-
Data Enrichment
-
Feature Engineering
- Feature Engine - Feature engineering package with SKlearn like functionality.
- Featuretools - Python library for automated feature engineering.
- TSFresh - Python library for automatic extraction of relevant features from time series.
-
Knowledge Sharing
- Knowledge Repo - Knowledge sharing platform for data scientists and other technical professions.
-
Model Interpretability
- InterpretML - A toolkit to help understand models and enable responsible machine learning.
- LIME - Explaining the predictions of any machine learning classifier.
- Lucid - Collection of infrastructure and tools for research in neural network interpretability.
- SAGE - For calculating global feature importance using Shapley values.
- SHAP - A game theoretic approach to explain the output of any machine learning model.
- Alibi - Open-source Python library enabling ML model inspection and interpretation.
- Captum - Model interpretability and understanding library for PyTorch.
- ELI5 - Python package which helps to debug machine learning classifiers and explain their predictions.
-
Model Testing & Validation
- Deepchecks - Open-source package for validating ML models & data, with various checks and suites.
- Starwhale - An MLOps/LLMOps platform for model building, evaluation, and fine-tuning.