Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

awesome-production-machine-learning

https://github.com/GrigoryevSA/awesome-production-machine-learning


Explaining Black Box Models and Datasets
- InterpretML - InterpretML is an open-source package for training interpretable models and explaining blackbox systems.
- rationale - Code to implement learning rationales behind predictions with code for paper ["Rationalizing Neural Predictions"](https://github.com/taolei87/rcnn/tree/master/code/rationale)
- Skater - Skater is a unified framework to enable Model Interpretation for all forms of model to help one build an Interpretable machine learning system often needed for real world use-cases
- Aequitas - An open-source bias audit toolkit for data scientists, machine learning researchers, and policymakers to audit machine learning models for discrimination and bias, and to make informed and equitable decisions around developing and deploying predictive risk-assessment tools.
- Alibi - Alibi is an open source Python library aimed at machine learning model inspection and interpretation. The library's initial focus is on black-box, instance-based model explanations.
- anchor - Code for the paper ["High precision model agnostic explanations"](https://homes.cs.washington.edu/~marcotcr/aaai18.pdf), a model-agnostic system that explains the behaviour of complex models with high-precision rules called anchors.
- captum - model interpretability and understanding library for PyTorch developed by Facebook. It contains general purpose implementations of integrated gradients, saliency maps, smoothgrad, vargrad and others for PyTorch models.
- casme - Example of using classifier-agnostic saliency map extraction on ImageNet, as presented in the paper ["Classifier-agnostic saliency map extraction"](https://arxiv.org/abs/1805.08249).
- ContrastiveExplanation (Foil Trees) - Python script for model agnostic contrastive/counterfactual explanations for machine learning. Accompanying code for the paper ["Contrastive Explanations with Local Foil Trees"](https://arxiv.org/abs/1806.07470).
- DeepLIFT - Codebase that contains the methods in the paper ["Learning important features through propagating activation differences"](https://arxiv.org/abs/1704.02685). Here are the [slides](https://docs.google.com/file/d/0B15F_QN41VQXSXRFMzgtS01UOU0/edit?filetype=mspresentation) and the [video](https://vimeo.com/238275076) of the 15 minute talk given at ICML.
- DeepVis Toolbox - This is the code required to run the Deep Visualization Toolbox, as well as to generate the neuron-by-neuron visualizations using regularized optimization. The toolbox and methods are described casually [here](http://yosinski.com/deepvis) and more formally in this [paper](https://arxiv.org/abs/1506.06579).
- ELI5 - "Explain Like I'm 5" is a Python package which helps to debug machine learning classifiers and explain their predictions.
- FACETS - Facets contains two robust visualizations to aid in understanding and analyzing machine learning datasets. Get a sense of the shape of each feature of your dataset using Facets Overview, or explore individual observations using Facets Dive.
- FairML - FairML is a Python toolbox for auditing machine learning models for bias.
- fairness - This repository is meant to facilitate the benchmarking of fairness-aware machine learning algorithms based on [this paper](https://arxiv.org/abs/1802.04422).
- GEBI - Global Explanations for Bias Identification - Attention-based, summarized post-hoc explanations for detecting and identifying bias in data. We propose a global explanation and introduce a step-by-step framework on how to detect and test bias. Python package for image data.
- iNNvestigate - An open-source library for analyzing Keras models visually by methods such as [DeepTaylor-Decomposition](https://www.sciencedirect.com/science/article/pii/S0031320316303582), [PatternNet](https://openreview.net/forum?id=Hkn7CBaTW), [Saliency Maps](https://arxiv.org/abs/1312.6034), and [Integrated Gradients](https://arxiv.org/abs/1703.01365).
- Integrated-Gradients - This repository provides code for implementing integrated gradients for networks with image inputs.
- keras-vis - keras-vis is a high-level toolkit for visualizing and debugging your trained keras neural net models. Currently supported visualizations include: Activation maximization, Saliency maps, Class activation maps.
- L2X - Code for replicating the experiments in the paper ["Learning to Explain: An Information-Theoretic Perspective on Model Interpretation"](https://arxiv.org/pdf/1802.07814.pdf) at ICML 2018
- Lightly - A python framework for self-supervised learning on images. The learned representations can be used to analyze the distribution in unlabeled data and rebalance datasets.
- Lightwood - A Pytorch based framework that breaks down machine learning problems into smaller blocks that can be glued together seamlessly with an objective to build predictive models with one line of code.
- LIME - Local Interpretable Model-agnostic Explanations for machine learning models.
- LOFO Importance - LOFO (Leave One Feature Out) Importance calculates the importances of a set of features based on a metric of choice, for a model of choice, by iteratively removing each feature from the set, and evaluating the performance of the model, with a validation scheme of choice, based on the chosen metric.
- MindsDB - MindsDB is an Explainable AutoML framework for developers. With MindsDB you can build, train and use state of the art ML models with as little as one line of code.
- mljar-supervised - An Automated Machine Learning (AutoML) python package for tabular data. It can handle: Binary Classification, MultiClass Classification and Regression. It provides feature engineering, explanations and markdown reports.
- NETRON - Viewer for neural network, deep learning and machine learning models.
- pyBreakDown - A model agnostic tool for decomposition of predictions from black boxes. Break Down Table shows contributions of every variable to a final prediction.
- responsibly - Toolkit for auditing and mitigating bias and fairness of machine learning systems
- SHAPash - Shapash is a Python library that provides several types of visualization that display explicit labels that everyone can understand.
- Tensorboard's Tensorboard WhatIf - Tensorboard screen to analyse the interactions between inference results and data inputs.
- tensorflow's lucid - Lucid is a collection of infrastructure and tools for research in neural network interpretability.
- tensorflow's Model Analysis - TensorFlow Model Analysis (TFMA) is a library for evaluating TensorFlow models. It allows users to evaluate their models on large amounts of data in a distributed manner, using the same metrics defined in their trainer.
- themis-ml - themis-ml is a Python library built on top of pandas and sklearn that implements fairness-aware machine learning algorithms.
- Themis - Themis is a testing-based approach for measuring discrimination in a software system.
- TreeInterpreter - Package for interpreting scikit-learn's decision tree and random forest predictions. Allows decomposing each prediction into bias and feature contribution components as described in http://blog.datadive.net/interpreting-random-forests/.
- woe - Tools for WoE Transformation mostly used in ScoreCard Model for credit rating
- XAI - eXplainableAI - An eXplainability toolbox for machine learning.
- SHAP - SHapley Additive exPlanations is a unified approach to explain the output of any machine learning model.
- IBM AI Explainability 360 - Interpretability and explainability of data and machine learning models including a comprehensive set of algorithms that cover different dimensions of explanations along with proxy explainability metrics.
- IBM AI Fairness 360 - A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
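The leave-one-feature-out idea described in the LOFO Importance entry above can be sketched in a few lines of plain Python. This is a minimal illustration, not the LOFO library's API: the function names `loo_1nn_accuracy` and `lofo_importance`, the 1-nearest-neighbour model, the leave-one-out validation scheme, and the toy data are all assumptions chosen to keep the example dependency-free.

```python
from typing import Callable, Dict, List, Sequence

Row = Sequence[float]


def loo_1nn_accuracy(X: List[Row], y: List[int], cols: List[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the feature columns listed in `cols`."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = -1, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue  # never compare a row with itself
            d = sum((xi[c] - xj[c]) ** 2 for c in cols)
            if d < best_d:
                best_j, best_d = j, d
        correct += int(y[best_j] == y[i])
    return correct / len(X)


def lofo_importance(
    X: List[Row],
    y: List[int],
    names: List[str],
    score: Callable[[List[Row], List[int], List[int]], float] = loo_1nn_accuracy,
) -> Dict[str, float]:
    """Importance of a feature = score with all features
    minus score with that one feature left out."""
    all_cols = list(range(len(names)))
    baseline = score(X, y, all_cols)
    importances = {}
    for k, name in enumerate(names):
        kept = [c for c in all_cols if c != k]
        importances[name] = baseline - score(X, y, kept)
    return importances


# Toy data (hypothetical): "signal" separates the classes, "noise" does not.
X = [[0.0, 0.3], [0.1, 0.0], [0.2, 0.2], [1.0, 0.0], [1.1, 0.3], [1.2, 0.2]]
y = [0, 0, 0, 1, 1, 1]
importances = lofo_importance(X, y, ["signal", "noise"])
print(importances)
```

A feature whose removal lowers the score gets a positive importance; a feature the model does not need scores at or near zero. Any model, metric, or validation scheme can be swapped in through the `score` argument.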
Model and Data Versioning
- Flor - Easy to use logger and automatic version controller made for data scientists who write ML code
- TerminusDB - A graph database management system that stores data like git.
- Aim - A super-easy way to record, search and compare AI experiments.
- Apache Marvin - A platform for model deployment and versioning that hides all complexity under the hood: data scientists just need to set up the server and write their code in an extended jupyter notebook.
- Catalyst - High-level utils for PyTorch DL & RL research. It was developed with a focus on reproducibility, fast experimentation and code/ideas reusing.
- D6tflow - A python library that allows for building complex data science workflows on Python.
- Data Version Control (DVC) - A version control tool that works alongside Git to manage versions of datasets and models.
- FGLab - Machine learning dashboard, designed to make prototyping experiments easier.
- Hangar - Version control for tensor data, git-like semantics on numerical data with high speed and efficiency.
- lakeFS - Repeatable, atomic and versioned data lake on top of object storage.
- MLflow - Open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment.
- MLWatcher - MLWatcher is a python agent that records a large variety of time-series metrics of your running ML classification algorithm. It enables you to monitor in real time.
- ModelStore - An open-source Python library that allows you to version, export, and save a machine learning model to your cloud storage provider.
- Pachyderm - Open source distributed processing framework built on Kubernetes, focused mainly on dynamic building of production machine learning pipelines - [(Video)](https://www.youtube.com/watch?v=LamKVhe2RSM)
- Polyaxon - A platform for reproducible and scalable machine learning and deep learning on Kubernetes. - [(Video)](https://www.youtube.com/watch?v=Iexwrka_hys)
- PredictionIO - An open source Machine Learning Server built on top of a state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task
- Quilt Data - Versioning, reproducibility and deployment of data and models.
- Sacred - Tool to help you configure, organize, log and reproduce machine learning experiments.
- Studio.ML - Model management framework which minimizes the overhead involved with scheduling, running, monitoring and managing artifacts of your machine learning experiments.
- ModelChimp - Framework to track and compare all the results and parameters from machine learning models [(Video)](https://vimeo.com/271246650)
- steppy - Lightweight, Python3 library for fast and reproducible machine learning experimentation. Introduces simple interface that enables clean machine learning pipeline design.
Model Training Orchestration
- CML - Continuous Machine Learning (CML) is an open-source library for implementing continuous integration & delivery (CI/CD) in machine learning projects.
- PyCaret - low-code library for training and deploying models (scikit-learn, XGBoost, LightGBM, spaCy)
- Determined - Deep learning training platform with integrated support for distributed training, hyperparameter tuning, and model management (supports Tensorflow and Pytorch).
- Hopsworks - Hopsworks is a data-intensive platform for the design and operation of machine learning pipelines that includes a Feature Store. [(Video)](https://www.youtube.com/watch?v=v1DrnY8caVU).
- Kubeflow - A cloud native platform for machine learning based on Google’s internal machine learning pipelines.
- MLeap - Standardisation of pipeline and model serialization for Spark, Tensorflow and sklearn
- NVIDIA TensorRT - TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
- Open Platform for AI - Platform that provides complete AI model training and resource management capabilities.
- Redis-ML - Module available from unstable branch that supports a subset of ML models as Redis data types. (Replaced by Redis AI)
- Skaffold - Skaffold is a command line tool that facilitates continuous development for Kubernetes applications. You can iterate on your application source code locally then deploy to local or remote Kubernetes clusters.
- Tensorflow Extended (TFX) - Production oriented configuration framework for ML based on TensorFlow, incl. monitoring and model version management.
- TonY - TonY is a framework to natively run deep learning jobs on Apache Hadoop. It currently supports TensorFlow, PyTorch, MXNet and Horovod.
Model Serving and Monitoring
- TorchServe - TorchServe is a flexible and easy to use tool for serving PyTorch models.
- BentoML - BentoML is an open source framework for high performance ML model serving
- Cortex - Cortex is an open source platform for deploying machine learning models—trained with any framework—as production web services. No DevOps required.
- DeepDetect - Machine Learning production server for TensorFlow, XGBoost and Caffe models written in C++ and maintained by Jolibrain
- Evidently - Evidently helps analyze machine learning models during development, validation, or production monitoring. The tool generates interactive reports from pandas DataFrame.
- ForestFlow - Cloud-native machine learning model server.
- Triton Inference Server - Triton is a high performance open source serving software to deploy AI models from any framework on GPU & CPU while maximizing utilization.
- OpenScoring - REST web service for scoring PMML models built and maintained by OpenScoring.io
- Redis-AI - A Redis module for serving tensors and executing deep learning models. Expect changes in the API and internals.
- Seldon Core - Open source platform for deploying and monitoring machine learning models in kubernetes - [(Video)](https://www.youtube.com/watch?v=pDlapGtecbY)
- Model Server for Apache MXNet (MMS) - A model server for Apache MXNet from Amazon Web Services that is able to run MXNet models as well as Gluon models (Amazon's SageMaker runs a custom version of MMS under the hood)
- Jina - Cloud native search framework that supports using deep learning and state-of-the-art AI models for search.
- KFServing - Serverless framework to deploy and monitor machine learning models in Kubernetes - [(Video)](https://www.youtube.com/watch?v=hGIvlFADMhU)
Adversarial Robustness Libraries
- Robust ML - another robustness resource maintained by some of the leading names in adversarial ML. They specifically focus on defenses, and ones that have published code available next to papers. Practical and useful.
- AdvBox - generate adversarial examples from the command line with 0 coding using PaddlePaddle, PyTorch, Caffe2, MxNet, Keras, and TensorFlow. Includes 10 attacks and also 6 defenses. Used to implement [StealthTshirt](https://github.com/advboxes/AdvBox/blob/master/applications/StealthTshirt/README.md) at DEFCON!
- Adversarial DNN Playground - think [TensorFlow Playground](https://playground.tensorflow.org/), but for Adversarial Examples! A visualization tool designed for learning and teaching - the attack library is limited in size, but it has a nice front-end to it with buttons you can press!
- AdverTorch - library for adversarial attacks / defenses specifically for PyTorch.
- Alibi Detect - alibi-detect is a Python package focused on outlier, adversarial and concept drift detection. The package aims to cover both online and offline detectors for tabular data, text, images and time series. The outlier detection methods should allow the user to identify global, contextual and collective outliers.
- Artificial Adversary - Airbnb's library to generate text that reads the same to a human but passes adversarial classifiers.
- DEEPSEC - another systematic tool for attacking and defending deep learning models.
- EvadeML - benchmarking and visualization tool for adversarial ML maintained by Weilin Xu, a PhD at University of Virginia, working with David Evans. Has a tutorial on re-implementation of one of the most important adversarial defense papers - [feature squeezing](https://arxiv.org/abs/1704.01155) (same team).
- Foolbox - second biggest adversarial library. Has an even longer list of attacks - but no defenses or evaluation metrics. Geared more towards computer vision. Code easier to understand / modify than ART - also better for exploring blackbox attacks on surrogate models.
- MIA - A library for running membership inference attacks (MIA) against machine learning models.
- TextFool - plausible looking adversarial examples for text generation.
- Trickster - Library and experiments for attacking machine learning in discrete domains using graph search.
- CleverHans - library for testing adversarial attacks / defenses maintained by some of the most important names in adversarial ML, namely Ian Goodfellow (ex-Google Brain, now Apple) and Nicolas Papernot (Google Brain). Comes with some nice tutorials!
- Nicholas Carlini’s Adversarial ML reading list - not a library, but a curated list of the most important adversarial papers by one of the leading minds in Adversarial ML, Nicholas Carlini. If you want to discover the 10 papers that matter the most - I would start here.
Data Science Notebook Frameworks
- Apache Zeppelin - Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
- Binder - Binder hosts notebooks in an executable environment (for free).
- H2O Flow - Jupyter notebook-like interface for H2O to create, save and re-use "flows"
- Jupyter Notebooks - Web interface python sandbox environments for reproducible development
- ML Workspace - All-in-one web IDE for machine learning and data science. Combines Jupyter, VS Code, Tensorflow, and many other tools/libraries into one Docker image.
- Papermill - Papermill is a library for parameterizing notebooks and executing them like Python scripts.
- Polynote - Polynote is an experimental polyglot notebook environment. Currently, it supports Scala and Python (with or without Spark), SQL, and Vega.
- RMarkdown - The rmarkdown package is a next generation implementation of R Markdown based on Pandoc.
- Stencila - Stencila is a platform for creating, collaborating on, and sharing data driven content. Content that is transparent and reproducible.
- Voilà - Voilà turns Jupyter notebooks into standalone web applications that can e.g. be used as dashboards.
- Hydrogen - A plugin for ATOM that enables it to become a jupyter-notebook-like interface that prints the outputs directly in the editor.
Industrial Strength Visualisation libraries
- XKCD-style plots - An XKCD theme for matplotlib visualisations
- Bokeh - Bokeh is an interactive visualization library for Python that enables beautiful and meaningful visual presentation of data in modern web browsers.
- Geoplotlib - geoplotlib is a python toolbox for visualizing geographical data and making maps
- ggplot2 - An implementation of the grammar of graphics for R.
- gradio - Quickly create and share demos of models - by only writing Python. Debug models interactively in your browser, get feedback from collaborators, and generate public links without deploying anything.
- matplotlib - A Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
- Missingno - missingno provides a small toolset of flexible and easy-to-use missing data visualizations and utilities that allows you to get a quick visual summary of the completeness (or lack thereof) of your dataset.
- PDPBox - This repository is inspired by ICEbox. The goal is to visualize the impact of certain features on model predictions for any supervised learning algorithm. (now supports all scikit-learn algorithms)
- Perspective
- Pixiedust - PixieDust is a productivity tool for Python or Scala notebooks, which lets a developer encapsulate business logic into something easy for your customers to consume.
- Plotly Dash - Dash is a Python framework for building analytical web applications without the need to write javascript.
- Plotly.py - An interactive, open source, and browser-based graphing library for Python.
- PyCEbox - Python Individual Conditional Expectation Plot Toolbox
- pygal - pygal is a dynamic SVG charting library written in python
- Redash - Redash is an open source visualisation framework that is built to allow easy access to big datasets leveraging multiple backends.
- seaborn - Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
- Streamlit - Streamlit lets you create apps for your machine learning projects with deceptively simple Python scripts. It supports hot-reloading, so your app updates live as you edit and save your file
- yellowbrick - yellowbrick provides matplotlib-based model evaluation plots for scikit-learn and other machine learning libraries.
Industrial Strength NLP
- Wav2Letter++ - A speech to text system developed by Facebook's FAIR teams.
- Blackstone - Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Blackstone is an experimental research project from the Incorporated Council of Law Reporting for England and Wales' research lab, ICLR&D.
- CTRL - A Conditional Transformer Language Model for Controllable Generation released by SalesForce
- Facebook's XLM - PyTorch original implementation of Cross-lingual Language Model Pretraining which includes BERT, XLM, NMT, XNLI, PKM, etc.
- Github's Semantic - Github's text library for parsing, analyzing, and comparing source code across many languages.
- GluonNLP - GluonNLP is a toolkit that enables easy text preprocessing, datasets loading and neural models building to help you speed up your Natural Language Processing (NLP) research.
- GNES - Generic Neural Elastic Search is a cloud-native semantic search system based on deep neural networks.
- Grover - Grover is a model for Neural Fake News -- both generation and detection. However, it probably can also be used for other generation tasks.
- Kashgari - Kashgari is a simple and powerful NLP transfer learning framework for building a state-of-the-art model in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS), and text classification tasks.
- OpenAI GPT-2 - OpenAI's code from their paper ["Language Models are Unsupervised Multitask Learners"](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf).
- sense2vec - A Pytorch library that allows for training and using sense2vec models, which are models that leverage the same approach as word2vec, but also leverage part-of-speech attributes for each token, which allows it to be "meaning-aware"
- Snorkel - Snorkel is a system for quickly generating training data with weak supervision https://snorkel.org.
- SpaCy - Industrial-strength natural language processing library built with python and cython by the explosion.ai team.
- Stable Baselines - A fork of OpenAI Baselines, implementations of reinforcement learning algorithms http://stable-baselines.readthedocs.io/.
- Tensorflow Lingvo - A framework for building neural networks in Tensorflow, particularly sequence models. [Lingvo: A TensorFlow Framework for Sequence Modeling](https://blog.tensorflow.org/2019/02/lingvo-tensorflow-framework-for-sequence-modeling.html).
- Tensorflow Text - TensorFlow Text provides a collection of text related classes and ops ready to use with TensorFlow 2.0.
- YouTokenToMe - YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE) [Sennrich et al.].
- 🤗 Transformers - Huggingface's library of state-of-the-art pretrained models for Natural Language Processing (NLP).
Data Pipeline ETL Frameworks
- Apache Airflow - Data Pipeline framework built in Python, including scheduler, DAG definition and a UI for visualisation
- Azkaban - Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop jobs. Azkaban resolves the ordering through job dependencies and provides an easy to use web user interface to maintain and track your workflows.
- Metaflow - A framework for data scientists to easily build and manage real-life data science projects.
- Apache Nifi - Apache NiFi was made for dataflow. It supports highly configurable directed graphs of data routing, transformation, and system mediation logic.
- Basin - Visual programming editor for building Spark and PySpark pipelines
- Bonobo - ETL framework for Python 3.5+ with focus on simple atomic operations working concurrently on rows of data
- Chronos - More of a job scheduler for Mesos than ETL pipeline. [OUTDATED]
- Couler - Unified interface for constructing and managing machine learning workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.
- Dagster - A data orchestrator for machine learning, analytics, and ETL.
- Genie - Job orchestration engine to interface and trigger the execution of jobs from Hadoop-based systems
- Gokart - Wrapper of the data pipeline Luigi
- Luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs, handling dependency resolution, workflow management, visualisation, etc
- Neuraxle - A framework for building neat pipelines, providing the right abstractions to chain your data transformation and prediction steps with data streaming, as well as doing hyperparameter searches (AutoML).
Data Storage Optimisation
- Alluxio - A virtual distributed storage system that bridges the gap between computation frameworks and storage systems.
- Apache Arrow - In-memory columnar representation of data compatible with Pandas, Hadoop-based systems, etc
- Apache Kafka - Distributed streaming platform framework
- Apache Parquet - On-disk columnar representation of data compatible with Pandas, Hadoop-based systems, etc
- BayesDB - Database that allows for built-in non-parametric Bayesian model discovery and querying for data on a database-like interface - [(Video)](https://www.youtube.com/watch?v=2ws84s6iD1o)
Computation load distribution frameworks
- Apache Spark MLlib - Apache Spark's scalable machine learning library in Java, Scala, Python and R
- BigDL - Deep learning framework on top of Spark/Hadoop to distribute data and computations across an HDFS system
- DeepSpeed - A deep learning optimization library (lightweight PyTorch wrapper) that makes distributed training easy, efficient, and effective.
- Hadoop Open Platform-as-a-service (HOPS) - A multi-tenancy open source framework with a RESTful API for data science on Hadoop. It supports Spark and Tensorflow/Keras, is Python-first, and provides a lot of features
- PyWren - Answer the question of the "cloud button" for python function execution. It's a framework that abstracts AWS Lambda to enable data scientists to execute any Python function - [(Video)](https://www.youtube.com/watch?v=OskQytBBdJU)
Model serialisation formats
- Java PMML API - Java libraries for consuming and producing PMML files containing models from different frameworks, including:
- Neural Network Exchange Format (NNEF) - A standard format to store models across Torch, Caffe, TensorFlow, Theano, Chainer, Caffe2, PyTorch, and MXNet
- PFA - Created by the same organisation as PMML, the Portable Format for Analytics is an emerging standard for statistical models and data transformation engines.
Data Stream Processing
- Kafka Streams - Kafka client library for building applications and microservices where the input and output are stored in Kafka clusters
- Spark Streaming - Micro-batch processing for streams using the Apache Spark framework as a backend supporting stateful exactly-once semantics
Feature Engineering Automation
- Colombus - A scalable framework to perform exploratory feature selection implemented in R
- Featuretools - An open source framework for automated feature engineering
Commercial Platforms
- allegro ai Enterprise - Automagical open-source ML & DL experiment manager and ML-Ops solution.
- Amazon SageMaker - End-to-end machine learning development and deployment interface where you are able to build notebooks that use EC2 instances as backend, and then can host models exposed on an API
- bigml - E2E machine learning platform.
- Cubonacci - The Cubonacci platform manages deployment, versioning, infrastructure, monitoring and lineage for you, eliminating risk and minimizing time-to-market.
- D2iQ KUDO for Kubeflow - [Enterprise machine learning platform](https://d2iq.com/blog/kudo-for-kubeflow-the-enterprise-machine-learning-platform) that runs in the cloud, on premises (incl. air-gapped), in hybrid environments, or on the edge; based on Kubeflow and open-source Kubernetes Universal Declarative Operators ([KUDO](https://kudo.dev/)).
- Dataiku - Collaborative data science platform powering both self-service analytics and the operationalization of machine learning models in production.
- DataRobot - Automated machine learning platform which enables users to build and deploy machine learning models.
- Datatron - Machine Learning Model Governance Platform for all your AI models in production for large Enterprises.
- deepsense AIOps - Enhances multi-cloud & data center IT Operations via traffic analysis, risk analysis, anomaly detection, predictive maintenance, root cause analysis, service ticket analysis and event consolidation.
- Deep Cognition Deep Learning Studio - E2E platform for deep learning.
- deepsense Safety - AI-driven solution to increase worksite safety via safety procedure checks, threat detection and hazardous zone monitoring.
- deepsense Quality - Automating laborious quality control tasks.
- H2O Driverless AI - Automates key machine learning tasks, delivering automatic feature engineering, model validation, model tuning, model selection and deployment, machine learning interpretability, bring your own recipe, time-series and automatic pipeline generation for model scoring. [(Video)](https://www.youtube.com/watch?v=ZqCoFp3-rGc)
- Iguazio Data Science Platform - Bring your Data Science to life by automating MLOps with end-to-end machine learning pipelines, transforming AI projects into real-world business outcomes, and supporting real-time performance at enterprise scale.
- Labelbox - Image labelling service with support for semantic segmentation (brush & superpixels), bounding boxes and nested classifications.
- Logical Clocks Hopsworks - Enterprise version of Hopsworks with a Feature Store and scale-out ML pipeline design and operation.
- MCenter - MLOps platform automates the deployment, ongoing optimization, and governance of machine learning applications in production.
- MLJAR - Platform for rapid prototyping, developing and deploying machine learning models.
- Prodigy - Active learning-based data annotation. Lets you train a model and pick the most 'uncertain' samples for labeling from an unlabeled pool.
- Spell - Flexible end-to-end MLOps / Machine Learning Platform. [(Video)](https://www.youtube.com/watch?v=J7xo-STHx1k)
- SuperAnnotate - A complete set of solutions for image and video annotation and an annotation service with integrated tooling, on-demand narrow expertise in various fields, and a custom neural network, automation, and training models powered by AI.
- Talend Studio
- Valohai - Machine orchestration, version control and pipeline management for deep learning.
- Weights & Biases - Machine learning experiment tracking, dataset versioning, hyperparameter search, visualization, and collaboration
- Superb AI - ML DataOps platform providing various tools to build, label, manage and iterate on training data.
- DAGsHub - Community platform for Open Source ML – Manage experiments, data & models and create collaborative ML projects easily.
Privacy Preserving Machine Learning
- Intel Homomorphic Encryption Backend - The Intel HE transformer for nGraph is a Homomorphic Encryption (HE) backend to the Intel nGraph Compiler, Intel's graph compiler for Artificial Neural Networks.
- Microsoft SEAL - Microsoft SEAL is an easy-to-use open-source (MIT licensed) homomorphic encryption library developed by the Cryptography Research group at Microsoft.
- PySyft - A Python library for secure, private Deep Learning. PySyft decouples private data from model training, using Multi-Party Computation (MPC) within PyTorch.
- Substra - Substra is an open-source framework for privacy-preserving, traceable and collaborative Machine Learning.
- TensorFlow Privacy - A Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy.
- TF Encrypted - A Framework for Confidential Machine Learning on Encrypted Data in TensorFlow.
- Google's Differential Privacy - This is a C++ library of ε-differentially private algorithms, which can be used to produce aggregate statistics over numeric data sets containing private or sensitive information.
- Uber SQL Differential Privacy - Uber's open source framework that enforces differential privacy for general-purpose SQL queries.
Neural Architecture Search
- ENAS via Parameter Sharing - Efficient Neural Architecture Search via Parameter Sharing by [authors of paper](https://arxiv.org/abs/1802.03268).
- ENAS-PyTorch - Efficient Neural Architecture Search (ENAS) in PyTorch based [on this paper](https://arxiv.org/abs/1802.03268).
- ENAS-Tensorflow - Efficient Neural Architecture Search via Parameter Sharing (ENAS) micro search TensorFlow code for Windows users.
- Katib - A Kubernetes-based system for Hyperparameter Tuning and Neural Architecture Search.
- Maggy - Asynchronous, directed Hyperparameter search and parallel ablation studies on Apache Spark [(Video)](https://www.youtube.com/watch?v=0Hd1iYEL03w).
- Neural Architecture Search with Controller RNN - Basic implementation of Controller RNN from [Neural Architecture Search with Reinforcement Learning](https://arxiv.org/abs/1611.01578) and [Learning Transferable Architectures for Scalable Image Recognition](https://arxiv.org/abs/1707.07012).
- Neural Network Intelligence - NNI (Neural Network Intelligence) is a toolkit to help users run automated machine learning (AutoML) experiments.
- Autokeras - AutoML library for Keras based on ["Auto-Keras: Efficient Neural Architecture Search with Network Morphism"](https://arxiv.org/abs/1806.10282).

Programming Languages

Python 87 Jupyter Notebook 21 Go 7 C++ 6 Java 6 TypeScript 4 Scala 4 HTML 4 JavaScript 3 R 2