Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

awesome-mlops

A curated list of awesome MLOps tools
https://github.com/kelvins/awesome-mlops

  • AutoGluon - Automated machine learning for image, text, tabular, time-series, and multi-modal data.
  • AutoKeras - Aims to make machine learning accessible to everyone.
  • AutoPyTorch - Automatic architecture search and hyperparameter optimization for PyTorch.
  • AutoSKLearn - Automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
  • EvalML - A library that builds, optimizes, and evaluates ML pipelines using domain-specific functions.
  • FLAML - Finds accurate ML models automatically, efficiently and economically.
  • H2O AutoML - Automates the ML workflow, including automatic training and tuning of models.
  • MindsDB - AI layer for databases that allows you to effortlessly develop, train and deploy ML models.
  • MLBox - A powerful automated machine learning library for Python.
  • Model Search - Framework that implements AutoML algorithms for model architecture search at scale.
  • NNI - An open source AutoML toolkit for automating the machine learning lifecycle.
  • ClearML - Auto-Magical CI/CD to streamline your ML workflow.
  • CML - Open-source library for implementing CI/CD in machine learning projects.
  • Cronitor - Monitor any cron job or scheduled task.
  • HealthchecksIO - Simple and effective cron job monitoring.
  • Amundsen - Data discovery and metadata engine for improving productivity when interacting with data.
  • Apache Atlas - Provides open metadata management and governance capabilities to build a data catalog.
  • CKAN - Open-source DMS (data management system) for powering data hubs and data portals.
  • DataHub - LinkedIn's generalized metadata search & discovery tool.
  • Magda - A federated, open-source data catalog for all your big data and small data.
  • Metacat - Unified metadata exploration API service for Hive, RDS, Teradata, Redshift, S3 and Cassandra.
  • OpenMetadata - A single place to discover, collaborate and get your data right.
  • Snorkel - A system for quickly generating training data with weak supervision.
  • Upgini - Enriches training datasets with features from public and community shared data sources.
  • Apache Zeppelin - Enables data-driven, interactive data analytics and collaborative documents.
  • BambooLib - An intuitive GUI for Pandas DataFrames.
  • DataPrep - Collect, clean and visualize your data in Python.
  • Google Colab - Hosted Jupyter notebook service that requires no setup to use.
  • Jupyter Notebook - Web-based notebook environment for interactive computing.
  • JupyterLab - The next-generation user interface for Project Jupyter.
  • Jupytext - Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts.
  • Pandas Profiling - Create HTML profiling reports from pandas DataFrame objects.
  • Polynote - The polyglot notebook with first-class Scala support.
  • Arrikto - Dead simple, ultra fast storage for the hybrid Kubernetes world.
  • BlazingSQL - A lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
  • Delta Lake - Storage layer that brings scalable, ACID transactions to Apache Spark and other engines.
  • Dolt - SQL database that you can fork, clone, branch, merge, push and pull just like a git repository.
  • Dud - A lightweight CLI tool for versioning data alongside source code and building data pipelines.
  • DVC - Management and versioning of datasets and machine learning models.
  • Git LFS - An open source Git extension for versioning large files.
  • Hub - A dataset format for creating, storing, and collaborating on AI datasets of any size.
  • Intake - A lightweight set of tools for loading and sharing data in data science projects.
  • lakeFS - Repeatable, atomic and versioned data lake on top of object storage.
  • Marquez - Collect, aggregate, and visualize a data ecosystem's metadata.
  • Milvus - An open source embedding vector similarity search engine powered by Faiss, NMSLIB and Annoy.
  • Pinecone - Managed and distributed vector similarity search used with a lightweight SDK.
  • Qdrant - An open source vector similarity search engine with extended filtering support.
  • Quilt - A self-organizing data hub with S3 support.
  • Airflow - Platform to programmatically author, schedule, and monitor workflows.
  • Azkaban - Batch workflow job scheduler created at LinkedIn to run Hadoop jobs.
  • Dagster - A data orchestrator for machine learning, analytics, and ETL.
  • Hadoop - Framework that allows for the distributed processing of large data sets across clusters.
  • OpenRefine - Power tool for working with messy data and improving it.
  • Spark - Unified analytics engine for large-scale data processing.
  • Cerberus - Lightweight, extensible data validation library for Python.
  • Cleanlab - Python library for data-centric AI and machine learning with messy, real-world data and labels.
  • Great Expectations - A Python data validation framework that allows you to test your data against datasets.
  • JSON Schema - A vocabulary that allows you to annotate and validate JSON documents.
  • TFDV - A library for exploring and validating machine learning data.
  • Count - SQL/drag-and-drop querying and visualisation tool based on notebooks.
  • Dash - Analytical Web Apps for Python, R, Julia, and Jupyter.
  • Data Studio - Reporting solution for power users who want to go beyond the data and dashboards of GA.
  • Facets - Visualizations for understanding and analyzing machine learning datasets.
  • Grafana - Multi-platform open source analytics and interactive visualization web application.
  • Lux - Fast and easy data exploration by automating the visualization and data analysis process.
  • Metabase - The simplest, fastest way to get business intelligence and analytics to everyone.
  • Redash - Connect to any data source, easily visualize, dashboard and share your data.
  • SolidUI - AI-generated visualization prototyping and editing platform, supporting 2D and 3D models.
  • Superset - Modern, enterprise-ready business intelligence web application.
  • Tableau - Powerful and fastest growing data visualization tool used in the business intelligence industry.
  • Alibi Detect - An open source Python library focused on outlier, adversarial and drift detection.
  • Frouros - An open source Python library for drift detection in machine learning systems.
  • TorchDrift - A data and concept drift library for PyTorch.
  • Feature Engine - Feature engineering package with scikit-learn-like functionality.
  • Featuretools - Python library for automated feature engineering.
  • TSFresh - Python library for automatic extraction of relevant features from time series.
  • Butterfree - A tool for building feature stores. Transform your raw data into beautiful features.
  • ByteHub - An easy-to-use feature store. Optimized for time-series data.
  • Feast - End-to-end open source feature store for machine learning.
  • Feathr - An enterprise-grade, high performance feature store.
  • Featureform - A Virtual Feature Store. Turn your existing data infrastructure into a feature store.
  • Tecton - A fully-managed feature platform built to orchestrate the complete lifecycle of features.
  • Advisor - Open-source implementation of Google Vizier for hyperparameter tuning.
  • Hyperas - A very simple wrapper for convenient hyperparameter optimization.
  • Hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
  • Katib - Kubernetes-based system for hyperparameter tuning and neural architecture search.
  • KerasTuner - Easy-to-use, scalable hyperparameter optimization framework.
  • Optuna - Open source hyperparameter optimization framework to automate hyperparameter search.
  • Scikit Optimize - Simple and efficient library to minimize expensive and noisy black-box functions.
  • Talos - Hyperparameter Optimization for TensorFlow, Keras and PyTorch.
  • Tune - Python library for experiment execution and hyperparameter tuning at any scale.
  • Knowledge Repo - Knowledge sharing platform for data scientists and other technical professions.
  • Kyso - One place for data insights so your entire team can learn from your data.
  • aiWARE - aiWARE helps MLOps teams evaluate, deploy, integrate, scale & monitor ML models.
  • Algorithmia - Securely govern your machine learning operations with a healthy ML lifecycle.
  • Allegro AI - Transform ML/DL research into products. Faster.
  • Bodywork - Deploys machine learning projects developed in Python, to Kubernetes.
  • CNVRG - An end-to-end machine learning platform to build and deploy AI models at scale.
  • DAGsHub - A platform built on open source tools for data, model and pipeline management.
  • Dataiku - Platform democratizing access to data and enabling enterprises to build their own path to AI.
  • DataRobot - AI platform that democratizes data science and automates the end-to-end ML at scale.
  • Domino - One place for your data science tools, apps, results, models, and knowledge.
  • Edge Impulse - Platform for creating, optimizing, and deploying AI/ML algorithms for edge devices.
  • envd - Machine learning development environment for data science and AI/ML engineering teams.
  • FedML - Simplifies the workflow of federated learning anywhere at any scale.
  • Gradient - Multicloud CI/CD and MLOps platform for machine learning teams.
  • H2O - Open source leader in AI with a mission to democratize AI for everyone.
  • Hopsworks - Open-source platform for developing and operating machine learning models at scale.
  • Iguazio - Data science platform that automates MLOps with end-to-end machine learning pipelines.
  • Katonic - Automate your cycle of intelligence with Katonic MLOps Platform.
  • Knime - Create and productionize data science using one easy and intuitive environment.
  • Kubeflow - Making deployments of ML workflows on Kubernetes simple, portable and scalable.
  • LynxKite - A complete graph data science platform for very large graphs and other datasets.
  • ML Workspace - All-in-one web-based IDE specialized for machine learning and data science.
  • MLReef - Open source MLOps platform that helps you collaborate, reproduce and share your ML work.
  • Modzy - Deploy, connect, run, and monitor machine learning (ML) models in the enterprise and at the edge.
  • Neu.ro - MLOps platform that integrates open-source and proprietary tools into client-oriented systems.
  • Omnimizer - Simplifies and accelerates MLOps by bridging the gap between ML models and edge hardware.
  • Pachyderm - Combines data lineage with end-to-end pipelines on Kubernetes, engineered for the enterprise.
  • Polyaxon - A platform for reproducible and scalable machine learning and deep learning on Kubernetes.
  • Sagemaker - Fully managed service that provides the ability to build, train, and deploy ML models quickly.
  • SAS Viya - Cloud native AI, analytic and data management platform that supports the analytics life cycle.
  • Sematic - An open-source end-to-end pipelining tool to go from laptop prototype to cloud in no time.
  • SigOpt - A platform that makes it easy to track runs, visualize training, and scale hyperparameter tuning.
  • TrueFoundry - A Cloud-native MLOps Platform over Kubernetes to simplify training and serving of ML Models.
  • Valohai - Takes you from POC to production while managing the whole model lifecycle.
  • AIF360 - A comprehensive set of fairness metrics for datasets and machine learning models.
  • Fairlearn - A Python package to assess and improve fairness of machine learning models.
  • Opacus - A library that enables training PyTorch models with differential privacy.
  • TensorFlow Privacy - Library for training machine learning models with privacy for training data.
  • Alibi - Open-source Python library enabling ML model inspection and interpretation.
  • Captum - Model interpretability and understanding library for PyTorch.
  • ELI5 - Python package which helps to debug machine learning classifiers and explain their predictions.
  • InterpretML - A toolkit to help understand models and enable responsible machine learning.
  • LIME - Explaining the predictions of any machine learning classifier.
  • Lucid - Collection of infrastructure and tools for research in neural network interpretability.
  • SAGE - For calculating global feature importance using Shapley values.
  • SHAP - A game theoretic approach to explain the output of any machine learning model.
  • Aim - A super-easy way to record, search and compare 1000s of ML training runs.
  • Cascade - Library of ML-Engineering tools for rapid prototyping and experiment management.
  • Comet - Track your datasets, code changes, experimentation history, and models.
  • Guild AI - Open source experiment tracking, pipeline automation, and hyperparameter tuning.
  • Keepsake - Version control for machine learning with support for Amazon S3 and Google Cloud Storage.
  • Losswise - Makes it easy to track the progress of a machine learning project.
  • MLflow - Open source platform for the machine learning lifecycle.
  • ModelDB - Open source ML model versioning, metadata, and experiment management.
  • Neptune AI - The most lightweight experiment management tool that fits any workflow.
  • Sacred - A tool to help you configure, organize, log and reproduce experiments.
  • Weights and Biases - A tool for visualizing and tracking your machine learning experiments.
  • Banana - Host your ML inference code on serverless GPUs and integrate it into your app with one line of code.
  • Beam - Develop on serverless GPUs, deploy highly performant APIs, and rapidly prototype ML models.
  • BentoML - Open-source platform for high-performance ML model serving.
  • BudgetML - Deploy an ML inference service on a budget in less than 10 lines of code.
  • Cog - Open-source tool that lets you package ML models in a standard, production-ready container.
  • Cortex - Machine learning model serving infrastructure.
  • Geniusrise - Host inference APIs, run bulk inference, and fine-tune text, vision, audio and multi-modal models.
  • Gradio - Create customizable UI components around your models.
  • GraphPipe - Machine learning model deployment made simple.
  • Hydrosphere - Platform for deploying your Machine Learning to production.
  • KFServing - Kubernetes custom resource definition for serving ML models on arbitrary frameworks.
  • LocalAI - Drop-in replacement REST API that’s compatible with OpenAI API specifications for inferencing.
  • Merlin - A platform for deploying and serving machine learning models.
  • MLEM - Version and deploy your ML models following GitOps principles.
  • Opyrator - Turns your ML code into microservices with web API, interactive GUI, and more.
  • PredictionIO - Event collection, deployment of algorithms, evaluation, querying predictive results via APIs.
  • Quix - Serverless platform for processing data streams in real-time with machine learning models.
  • Rune - Provides containers to encapsulate and deploy EdgeML pipelines and applications.
  • Seldon - Take your ML projects from POC to production with maximum efficiency and minimal risk.
  • Streamlit - Lets you create apps for your ML projects with deceptively simple Python scripts.
  • TensorFlow Serving - Flexible, high-performance serving system for ML models, designed for production.
  • TorchServe - A flexible and easy to use tool for serving PyTorch models.
  • Triton Inference Server - Provides an optimized cloud and edge inferencing solution.
  • Vespa - Store, search, organize and make machine-learned inferences over big data at serving time.
  • Deepchecks - Open-source package for validating ML models & data, with various checks and suites.
  • Starwhale - An MLOps/LLMOps platform for model building, evaluation, and fine-tuning.
  • Trubrics - Validate machine learning with data science and domain expert feedback.
  • Accelerate - A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision.
  • Dask - Provides advanced parallelism for analytics, enabling performance at scale for the tools you love.
  • DeepSpeed - Deep learning optimization library that makes distributed training easy, efficient, and effective.
  • Fiber - Python distributed computing library for modern computer clusters.
  • Horovod - Distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
  • Mahout - Distributed linear algebra framework and mathematically expressive Scala DSL.
  • MLlib - Apache Spark's scalable machine learning library.
  • Modin - Speed up your Pandas workflows by changing a single line of code.
  • Nebullvm - Easy-to-use library to boost AI inference.
  • Nos - Open-source module for running AI workloads on Kubernetes in an optimized way.
  • Petastorm - Enables single machine or distributed training and evaluation of deep learning models.
  • Rapids - Gives the ability to execute end-to-end data science and analytics pipelines entirely on GPUs.
  • Ray - Fast and simple framework for building and running distributed applications.
  • Singa - Apache top level project, focusing on distributed training of DL and ML models.
  • TPOT - Automated ML tool that optimizes machine learning pipelines using genetic programming.
  • Chassis - Turns models into ML-friendly containers that run just about anywhere.
  • Hermione - Helps data scientists set up more organized code in a quicker and simpler way.
  • Hydra - A framework for elegantly configuring complex applications.
  • Koalas - Pandas API on Apache Spark. Makes data scientists more productive when interacting with big data.
  • Ludwig - Allows users to train and test deep learning models without the need to write code.
  • MLNotify - No need to keep checking your training, just one import line and you'll know the second it's done.
  • PyCaret - Open source, low-code machine learning library in Python.
  • Sagify - A CLI utility to train and deploy ML/DL models on AWS SageMaker.
  • Soopervisor - Export ML projects to Kubernetes (Argo workflows), Airflow, AWS Batch, and SLURM.
  • Soorgeon - Convert monolithic Jupyter notebooks into maintainable pipelines.
  • TrainGenerator - A web app to generate template code for machine learning.
  • Turi Create - Simplifies the development of custom machine learning models.
  • Aporia - Observability with customized monitoring and explainability for ML models.
  • Arize - A free end-to-end ML observability and model monitoring platform.
  • CometLLM - Track, visualize, and evaluate your LLM prompts and chains in one easy-to-use UI.
  • Evidently - Interactive reports to analyze ML models during validation or production monitoring.
  • Fiddler - Monitor, explain, and analyze your AI in production.
  • Manifold - A model-agnostic visual debugging tool for machine learning.
  • NannyML - Algorithm capable of fully capturing the impact of data drift on performance.
  • Netron - Visualizer for neural network, deep learning, and machine learning models.
  • Phoenix - MLOps in a Notebook for troubleshooting and fine-tuning generative LLM, CV, and tabular models.
  • Superwise - Fully automated, enterprise-grade model observability in a self-service SaaS platform.
  • Whylogs - The open source standard for data logging. Enables ML monitoring and observability.
  • Yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection.
  • Argo - Open source container-native workflow engine for orchestrating parallel jobs on Kubernetes.
  • Automate Studio - Rapidly build & deploy AI-powered workflows.
  • Couler - Unified interface for constructing and managing workflows on different workflow engines.
  • dstack - An open-core tool to automate data and training workflows.
  • Flyte - Makes it easy to create concurrent, scalable, and maintainable workflows for machine learning.
  • Hamilton - A scalable general purpose micro-framework for defining dataflows.
  • Kale - Aims at simplifying the Data Science experience of deploying Kubeflow Pipelines workflows.
  • Kedro - Library that implements software engineering best-practice for data and ML pipelines.
  • Luigi - Python module that helps you build complex pipelines of batch jobs.
  • Metaflow - Human-friendly library that helps scientists and engineers build and manage data science projects.
  • MLRun - Generic mechanism for data scientists to build, run, and monitor ML tasks and pipelines.
  • Orchest - Visual pipeline editor and workflow orchestrator with an easy-to-use UI, based on Kubernetes.
  • Ploomber - Write maintainable, production-ready pipelines. Develop locally, deploy to the cloud.
  • Prefect - A workflow management system, designed for modern infrastructure.
  • VDP - An open-source tool to seamlessly integrate AI for unstructured data into the modern data stack.
  • ZenML - An extensible open-source MLOps framework to create reproducible pipelines.
  • A Tour of End-to-End Machine Learning Platforms
  • Continuous Delivery for Machine Learning
  • Delivering on the Vision of MLOps: A maturity-based approach
  • Machine Learning Operations (MLOps): Overview, Definition, and Architecture
  • MLOps: Continuous delivery and automation pipelines in machine learning
  • MLOps: Machine Learning as an Engineering Discipline
  • Rules of Machine Learning: Best Practices for ML Engineering
  • The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction
  • What Is MLOps?
  • Beginning MLOps with MLFlow
  • Building Machine Learning Pipelines
  • Building Machine Learning Powered Applications
  • Deep Learning in Production
  • Designing Machine Learning Systems
  • Engineering MLOps
  • Implementing MLOps in the Enterprise
  • Introducing MLOps
  • Kubeflow for Machine Learning
  • Kubeflow Operations Guide
  • Machine Learning Design Patterns
  • Machine Learning Engineering in Action
  • ML Ops: Operationalizing Data Science
  • MLOps Engineering at Scale
  • MLOps Lifecycle Toolkit
  • Practical Deep Learning at Scale with MLflow
  • Practical MLOps
  • Production-Ready Applied Deep Learning
  • Reliable Machine Learning
  • The Machine Learning Solutions Architect Handbook
  • apply() - The ML data engineering conference
  • MLOps Conference - Keynotes and Panels
  • MLOps World: Machine Learning in Production Conference
  • NormConf - The Normcore Tech Conference
  • Stanford MLSys Seminar Series
  • Applied ML
  • Awesome AutoML Papers
  • Awesome AutoML
  • Awesome Data Science
  • Awesome DataOps
  • Awesome Deep Learning
  • Awesome Game Datasets
  • Awesome Machine Learning
  • Awesome MLOps
  • Awesome Production Machine Learning
  • Awesome Python
  • Deep Learning in Production
  • How AI Built This
  • Kubernetes Podcast from Google
  • Machine Learning – Software Engineering Daily
  • MLOps.community
  • Pipeline Conversation
  • Practical AI: Machine Learning, Data Science
  • This Week in Machine Learning & AI
  • True ML Talks
  • Kubeflow Workspace
  • MLOps Community Workspace
  • Feature Stores for ML
  • Made with ML
  • ML-Ops
  • MLOps Community
  • MLOps Guide
  • MLOps Now