awesome-production-machine-learning
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
https://github.com/eric-erki/awesome-production-machine-learning
-
Data Storage Optimisation
- Apache Arrow - In-memory columnar representation of data compatible with Pandas, Hadoop-based systems, etc
- Apache Parquet - On-disk columnar representation of data compatible with Pandas, Hadoop-based systems, etc
- Apache Kafka - Distributed streaming platform framework
- HopsFS - HDFS-compatible file system with scale-out strongly consistent metadata.
- BayesDB - Database that allows for built-in non-parametric Bayesian model discovery and querying for data on a database-like interface - [(Video)](https://www.youtube.com/watch?v=2ws84s6iD1o)
- ClickHouse - ClickHouse is an open source column oriented database management system supported by Yandex - [(Video)](https://www.youtube.com/watch?v=zbjub8BQPyE)
- Alluxio - A virtual distributed storage system that bridges the gap between computation frameworks and storage systems.
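Arrow and Parquet are both described above as columnar representations. As a rough illustration of what "columnar" means, the sketch below pivots row-oriented records into one list per column, so an aggregate over a single field reads only that field's values. It is plain Python and deliberately does not use the Arrow or Parquet APIs; `rows_to_columns` and the sample data are hypothetical.

```python
# Illustrative only: a plain-Python stand-in for a columnar layout.
# This is NOT the Apache Arrow or Apache Parquet API.

def rows_to_columns(rows):
    """Pivot a list of same-keyed dicts (row layout) into a dict of lists (column layout)."""
    if not rows:
        return {}
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name in columns:
            columns[name].append(row[name])
    return columns


# Hypothetical sample records in row layout.
rows = [
    {"user": "a", "clicks": 3},
    {"user": "b", "clicks": 5},
    {"user": "c", "clicks": 2},
]
columns = rows_to_columns(rows)

# An aggregate over one field reads a single list
# instead of visiting every field of every row.
total_clicks = sum(columns["clicks"])
```

Real columnar formats add typed buffers, compression and metadata on top of this idea; the sketch only shows the change in layout.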
-
Data Pipeline ETL Frameworks
- Apache Airflow - Data Pipeline framework built in Python, including scheduler, DAG definition and a UI for visualisation
- Azkaban - Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop jobs. Azkaban resolves the ordering through job dependencies and provides an easy to use web user interface to maintain and track your workflows.
- Luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs, handling dependency resolution, workflow management, visualisation, etc
- Apache Nifi - Apache NiFi was made for dataflow. It supports highly configurable directed graphs of data routing, transformation, and system mediation logic.
- Genie - Job orchestration engine to interface and trigger the execution of jobs from Hadoop-based systems
- Neuraxle - A framework for building neat pipelines, providing the right abstractions to chain your data transformation and prediction steps with data streaming, as well as doing hyperparameter searches (AutoML).
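Several of the frameworks above (Airflow, Azkaban, Luigi) are described as resolving job ordering from declared dependencies. A minimal sketch of that idea, assuming Python 3.9+ and using only the standard library's `graphlib`, is shown below. The task names and the `run_order` helper are hypothetical and do not reflect any of these frameworks' APIs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "train": {"clean"},
    "report": {"clean", "train"},
}


def run_order(deps):
    """Return one valid execution order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
for task in order:
    # A real scheduler would launch the job here; this sketch only prints the name.
    print("running", task)
```

If the dependency graph contains a cycle, `static_order()` raises `graphlib.CycleError`, which mirrors why such tools require the workflow to be a directed acyclic graph.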
-
Computation load distribution frameworks
- Apache Spark MLlib - Apache Spark's scalable machine learning library in Java, Scala, Python and R
- BigDL - Deep learning framework on top of Spark/Hadoop to distribute data and computations across an HDFS system
- Hadoop Open Platform-as-a-service (HOPS) - A multi-tenant open source framework with a RESTful API for data science on Hadoop. It supports Spark and Tensorflow/Keras, is Python-first, and provides a lot of features
- Horovod - Uber's distributed training framework for TensorFlow, Keras, and PyTorch
- PyWren - Answer the question of the "cloud button" for python function execution. It's a framework that abstracts AWS Lambda to enable data scientists to execute any Python function - [(Video)](https://www.youtube.com/watch?v=OskQytBBdJU)
- NumPyWren - Scientific computing framework built on top of PyWren to enable numpy-like distributed computations
- Dask - Distributed parallel processing framework for Pandas and NumPy computations - [(Video)](https://www.youtube.com/watch?v=RA_2qdipVng)
-
Commercial Platforms
- Amazon SageMaker - End-to-end machine learning development and deployment interface where you are able to build notebooks that use EC2 instances as backend, and then can host models exposed on an API
- DataRobot - Automated machine learning platform which enables users to build and deploy machine learning models.
- Dataiku - Collaborative data science platform powering both self-service analytics and the operationalization of machine learning models in production.
- Datatron - Machine Learning Model Governance Platform for all your AI models in production for large Enterprises.
- MLJAR - Platform for rapid prototyping, developing and deploying machine learning models.
- Talend Studio
- Valohai - Machine orchestration, version control and pipeline management for deep learning.
- Logical Clocks Hopsworks - Enterprise version of Hopsworks with a Feature Store and scale-out ML pipeline design and operation.
- MCenter - MLOps platform automates the deployment, ongoing optimization, and governance of machine learning applications in production.
- Spell - Flexible end-to-end MLOps / Machine Learning Platform. [(Video)](https://www.youtube.com/watch?v=J7xo-STHx1k)
- y-hat - Deployment, updating and monitoring of predictive models in multiple languages [(Video)](https://www.youtube.com/watch?v=YiEjaWwzS_w)
- Datmo - Workflow tools for monitoring your deployed models to experiment and optimize models in production.
- Skafos - Skafos platform bridges the gap between data science, devops and engineering; continuous deployment, automation and monitoring.
- MissingLink - MissingLink helps data engineers streamline and automate the entire deep learning lifecycle.
- RiseML - Machine Learning Platform for Kubernetes: RiseML simplifies running machine learning experiments on bare metal and cloud GPU clusters of any size.
- cnvrg.io - An end-to-end platform to manage, build and automate machine learning
- Comet.ml - Machine learning experiment management. Free for open source and students [(Video)](https://www.youtube.com/watch?v=xaybRkapeNE)
- Skytree 16.0 - End to end machine learning platform [(Video)](https://www.youtube.com/watch?v=XuCwpnU-F1k)
- Microsoft Azure Machine Learning service - Build, train, and deploy models from the cloud to the edge.
- IBM Watson Machine Learning - Create, train, and deploy self-learning models using an automated, collaborative workflow.
- neptune.ml - community-friendly platform supporting data scientists in creating and sharing machine learning models. Neptune facilitates teamwork, infrastructure management, models comparison and reproducibility.
- SKIL - Software distribution designed to help enterprise IT teams manage, deploy, and retrain machine learning models at scale.
-
Data Stream Processing
- Spark Streaming - Micro-batch processing for streams using the Apache Spark framework as a backend, supporting stateful exactly-once semantics
- Kafka Streams - Kafka client library for building applications and microservices where the input and output are stored in Kafka clusters
- Apache Flink - Open source stream processing framework with powerful stream and batch processing capabilities.
- Faust - Streaming library built on top of Python's Asyncio library using the async kafka client inspired by the kafka streaming library.
- Brooklin - Distributed stream processing framework. It uses Apache Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management.
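The Spark Streaming entry above mentions stateful micro-batch processing. The sketch below illustrates the general pattern in plain Python: a stream is cut into small batches and a running state is carried from one batch to the next. It is a simplification under stated assumptions: batches here are cut by item count, whereas Spark Streaming cuts them by time interval, and `micro_batches` is a hypothetical helper, not part of any listed library.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield consecutive lists of up to `batch_size` items from any iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stateful processing: a running total carried across batches.
running_total = 0
totals = []
for batch in micro_batches(range(1, 8), batch_size=3):
    running_total += sum(batch)
    totals.append(running_total)
```

Each batch is processed as a unit, which is what lets a micro-batch engine reuse batch-style operators on streaming input.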
-
Adversarial Robustness Libraries
- Nicholas Carlini’s Adversarial ML reading list - not a library, but a curated list of the most important adversarial papers by one of the leading minds in Adversarial ML, Nicholas Carlini. If you want to discover the 10 papers that matter the most - I would start here.
- Robust ML - another robustness resource maintained by some of the leading names in adversarial ML. They specifically focus on defenses, and ones that have published code available next to papers. Practical and useful.
- AdvBox - generate adversarial examples from the command line with 0 coding using PaddlePaddle, PyTorch, Caffe2, MxNet, Keras, and TensorFlow. Includes 10 attacks and also 6 defenses. Used to implement [StealthTshirt](https://github.com/advboxes/AdvBox/blob/master/applications/StealthTshirt/README.md) at DEFCON!
- Alibi Detect - alibi-detect is a Python package focused on outlier, adversarial and concept drift detection. The package aims to cover both online and offline detectors for tabular data, text, images and time series. The outlier detection methods should allow the user to identify global, contextual and collective outliers.
- AdverTorch - library for adversarial attacks / defenses specifically for PyTorch.
- Foolbox - second biggest adversarial library. Has an even longer list of attacks - but no defenses or evaluation metrics. Geared more towards computer vision. Code easier to understand / modify than ART - also better for exploring blackbox attacks on surrogate models.
- Adversarial DNN Playground - think [TensorFlow Playground](https://playground.tensorflow.org/), but for Adversarial Examples! A visualization tool designed for learning and teaching - the attack library is limited in size, but it has a nice front-end to it with buttons you can press!
- Artificial Adversary - AirBnB's library to generate text that reads the same to a human but passes adversarial classifiers.
- EvadeML - benchmarking and visualization tool for adversarial ML maintained by Weilin Xu, a PhD at University of Virginia, working with David Evans. Has a tutorial on re-implementation of one of the most important adversarial defense papers - [feature squeezing](https://arxiv.org/abs/1704.01155) (same team).
- MIA - A library for running membership inference attacks (MIA) against machine learning models.
- TextFool - plausible looking adversarial examples for text generation.
- Trickster - Library and experiments for attacking machine learning in discrete domains using graph search.
- IBM Adversarial Robustness 360 Toolbox (ART) - at the time of writing this is the most complete off-the-shelf resource for testing adversarial attacks and defenses. It includes a library of 15 attacks, 10 empirical defenses, and some nice evaluation metrics. Neural networks only.
- DEEPSEC - another systematic tool for attacking and defending deep learning models.
-
Feature Engineering Automation
- Colombus - A scalable framework to perform exploratory feature selection implemented in R
- Featuretools - An open source framework for automated feature engineering
- tsfresh - Automatic extraction of relevant features from time series
- AutoML-GS - Automatic feature and model search with code generation in Python, on top of common data science libraries (tensorflow, sklearn, etc)
- automl - Automated feature engineering, feature/model selection, hyperparam. optimisation
- TPOT - Automation of sklearn pipeline creation (including feature selection, pre-processor, etc)
- Feature Engine - Feature-engine is a Python library that contains several transformers to engineer features for use in machine learning models.
- auto-sklearn - Framework to automate algorithm and hyperparameter tuning for sklearn
-
Model serialisation formats
- Java PMML API - Java libraries for consuming and producing PMML files containing models from different frameworks, including sklearn2pmml, pyspark2pmml, r2pmml and sparklyr2pmml.
- Neural Network Exchange Format (NNEF) - A standard format to store models across Torch, Caffe, TensorFlow, Theano, Chainer, Caffe2, PyTorch, and MXNet
- ONNX - Open Neural Network Exchange Format
- sklearn2pmml
- MMdnn - Cross-framework solution to convert, visualize and diagnose deep neural network models.
- pyspark2pmml
- r2pmml
- sparklyr2pmml
- PFA - Created by the same organisation as PMML, the Portable Format for Analytics is an emerging standard for statistical models and data transformation engines.
-
Explaining Black Box Models and Datasets
- rationale - Code to implement learning rationales behind predictions with code for paper ["Rationalizing Neural Predictions"](https://github.com/taolei87/rcnn/tree/master/code/rationale)
- fairness - This repository is meant to facilitate the benchmarking of fairness aware machine learning algorithms based on [this paper](https://arxiv.org/abs/1802.04422).
- themis-ml - themis-ml is a Python library built on top of pandas and sklearn that implements fairness-aware machine learning algorithms.
- FairML - FairML is a python toolbox auditing the machine learning models for bias.
- Themis - Themis is a testing-based approach for measuring discrimination in a software system.
- tensorflow's lucid - Lucid is a collection of infrastructure and tools for research in neural network interpretability.
- tensorflow's Model Analysis - TensorFlow Model Analysis (TFMA) is a library for evaluating TensorFlow models. It allows users to evaluate their models on large amounts of data in a distributed manner, using the same metrics defined in their trainer.
- XAI - eXplainableAI - An eXplainability toolbox for machine learning.
- LIME - Local Interpretable Model-agnostic Explanations for machine learning models.
- captum - model interpretability and understanding library for PyTorch developed by Facebook. It contains general purpose implementations of integrated gradients, saliency maps, smoothgrad, vargrad and others for PyTorch models.
- iNNvestigate - An open-source library for analyzing Keras models visually by methods such as [DeepTaylor-Decomposition](https://www.sciencedirect.com/science/article/pii/S0031320316303582), [PatternNet](https://openreview.net/forum?id=Hkn7CBaTW), [Saliency Maps](https://arxiv.org/abs/1312.6034), and [Integrated Gradients](https://arxiv.org/abs/1703.01365).
- Aequitas - An open-source bias audit toolkit for data scientists, machine learning researchers, and policymakers to audit machine learning models for discrimination and bias, and to make informed and equitable decisions around developing and deploying predictive risk-assessment tools.
- DeepVis Toolbox - This is the code required to run the Deep Visualization Toolbox, as well as to generate the neuron-by-neuron visualizations using regularized optimization. The toolbox and methods are described casually [here](http://yosinski.com/deepvis) and more formally in this [paper](https://arxiv.org/abs/1506.06579).
- Alibi - Alibi is an open source Python library aimed at machine learning model inspection and interpretation. The initial focus on the library is on black-box, instance based model explanations.
- anchor - Code for the paper ["High precision model agnostic explanations"](https://homes.cs.washington.edu/~marcotcr/aaai18.pdf), a model-agnostic system that explains the behaviour of complex models with high-precision rules called anchors.
- casme - Example of using classifier-agnostic saliency map extraction on ImageNet presented on the paper ["Classifier-agnostic saliency map extraction"](https://arxiv.org/abs/1805.08249).
- ContrastiveExplanation (Foil Trees) - Python script for model agnostic contrastive/counterfactual explanations for machine learning. Accompanying code for the paper ["Contrastive Explanations with Local Foil Trees"](https://arxiv.org/abs/1806.07470).
- DeepLIFT - Codebase that contains the methods in the paper ["Learning important features through propagating activation differences"](https://arxiv.org/abs/1704.02685). Here is the [slides](https://docs.google.com/file/d/0B15F_QN41VQXSXRFMzgtS01UOU0/edit?filetype=mspresentation) and the [video](https://vimeo.com/238275076) of the 15 minute talk given at ICML.
- ELI5 - "Explain Like I'm 5" is a Python package which helps to debug machine learning classifiers and explain their predictions.
- Integrated-Gradients - This repository provides code for implementing integrated gradients for networks with image inputs.
- L2X - Code for replicating the experiments in the paper ["Learning to Explain: An Information-Theoretic Perspective on Model Interpretation"](https://arxiv.org/pdf/1802.07814.pdf) at ICML 2018
- LOFO Importance - LOFO (Leave One Feature Out) Importance calculates the importances of a set of features based on a metric of choice, for a model of choice, by iteratively removing each feature from the set, and evaluating the performance of the model, with a validation scheme of choice, based on the chosen metric.
- pyBreakDown - A model agnostic tool for decomposition of predictions from black boxes. Break Down Table shows contributions of every variable to a final prediction.
- responsibly - Toolkit for auditing and mitigating bias and fairness of machine learning systems
- Tensorboard's WhatIf - Tensorboard screen to analyse the interactions between inference results and data inputs.
- TreeInterpreter - Package for interpreting scikit-learn's decision tree and random forest predictions. Allows decomposing each prediction into bias and feature contribution components as described in http://blog.datadive.net/interpreting-random-forests/.
- woe - Tools for WoE Transformation mostly used in ScoreCard Model for credit rating
- IBM AI Fairness 360 - A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
- IBM AI Explainability 360 - Interpretability and explainability of data and machine learning models including a comprehensive set of algorithms that cover different dimensions of explanations along with proxy explainability metrics.
- Microsoft InterpretML - InterpretML is an open-source package for training interpretable models and explaining blackbox systems.
- Skater - Skater is a unified framework to enable Model Interpretation for all forms of model to help one build an Interpretable machine learning system often needed for real world use-cases
- Tensorflow's cleverhans - An adversarial example library for constructing attacks, building defenses, and benchmarking both. A python library to benchmark system's vulnerability to [adversarial examples](http://karpathy.github.io/2015/03/30/breaking-convnets/)
- SHAP - SHapley Additive exPlanations is a unified approach to explain the output of any machine learning model.
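The LOFO Importance entry above describes its procedure in words: remove each feature in turn, re-evaluate the model, and compare against the score with all features. A minimal sketch of that loop follows. It is not the LOFO library's API; `lofo_importance`, `toy_evaluate` and the per-feature contributions are hypothetical, and the "model" is a stand-in so the arithmetic is easy to follow.

```python
def lofo_importance(features, evaluate):
    """Leave-one-feature-out importance.

    `evaluate` is a caller-supplied function (hypothetical here) that trains and
    validates a model on the given feature subset and returns a score where
    higher is better. Importance is the score lost when a feature is removed.
    """
    baseline = evaluate(list(features))
    importances = {}
    for left_out in features:
        remaining = [f for f in features if f != left_out]
        importances[left_out] = baseline - evaluate(remaining)
    return importances


# Toy stand-in for "train + validate": the score is a sum of made-up
# per-feature contributions, so the expected importances are easy to check.
CONTRIBUTION = {"age": 0.25, "income": 0.5, "noise": 0.0}


def toy_evaluate(selected):
    return sum(CONTRIBUTION[f] for f in selected)


scores = lofo_importance(["age", "income", "noise"], toy_evaluate)
```

In practice `evaluate` would wrap a cross-validated fit of a real model, so each feature costs one extra round of training; that cost is the main trade-off of this approach.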
-
Industrial Strength Visualisation libraries
- XKCD-style plots - An XKCD theme for matplotlib visualisations
- matplotlib - A Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
- seaborn - Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
- Bokeh - Bokeh is an interactive visualization library for Python that enables beautiful and meaningful visual presentation of data in modern web browsers.
- Streamlit - Streamlit lets you create apps for your machine learning projects with deceptively simple Python scripts. It supports hot-reloading, so your app updates live as you edit and save your file
- Plotly Dash - Dash is a Python framework for building analytical web applications without the need to write javascript.
- yellowbrick - yellowbrick is a suite of matplotlib-based model evaluation plots for scikit-learn and other machine learning libraries.
- Plotly.py - An interactive, open source, and browser-based graphing library for Python.
- Missingno - missingno provides a small toolset of flexible and easy-to-use missing data visualizations and utilities that allows you to get a quick visual summary of the completeness (or lack thereof) of your dataset.
- pygal - pygal is a dynamic SVG charting library written in python
- Pixiedust - PixieDust is a productivity tool for Python or Scala notebooks, which lets a developer encapsulate business logic into something easy for your customers to consume.
- ggplot2 - An implementation of the grammar of graphics for R.
- PDPBox - This repository is inspired by ICEbox. The goal is to visualize the impact of certain features towards model prediction for any supervised learning algorithm. (now supports all scikit-learn algorithms)
- Geoplotlib - geoplotlib is a python toolbox for visualizing geographical data and making maps
- PyCEbox - Python Individual Conditional Expectation Plot Toolbox
-
Data Science Notebook Frameworks
- H2O Flow - Jupyter notebook-like interface for H2O to create, save and re-use "flows"
- Polynote - Polynote is an experimental polyglot notebook environment. Currently, it supports Scala and Python (with or without Spark), SQL, and Vega.
- Jupyter Notebooks - Web interface python sandbox environments for reproducible development
- Papermill - Papermill is a library for parameterizing notebooks and executing them like Python scripts.
- Voilà - Voilà turns Jupyter notebooks into standalone web applications that can e.g. be used as dashboards.
- Stencila - Stencila is a platform for creating, collaborating on, and sharing data driven content. Content that is transparent and reproducible.
- RMarkdown - The rmarkdown package is a next generation implementation of R Markdown based on Pandoc.
-
Data Labelling Tools and Frameworks
- Visual Object Tagging Tool (VOTT) - Microsoft's Open Source electron app for labelling videos and images for object detection models (with active learning functionality)
- Semantic Segmentation Editor - Hitachi's Open source tool for labelling camera and LIDAR data.
- PixelAnnotationTool - Image annotation tool with ability to "colour" on the images to select labels for segmentation. Process is semi-automated with the [watershed marked algorithm of OpenCV](docs.opencv.org/3.1.0/d7/d1b/group__imgproc__misc.html#ga3267243e4d3f95165d55a618c65ac6e1)
- ImageTagger - Image labelling tool with support for collaboration, supporting bounding box, polygon, line, point labelling, label export, etc.
- OpenLabeling - Open source tool for labelling images with support for labels, edges, as well as image resizing and zooming in.
- Superintendent - superintendent provides an ipywidget-based interactive labelling tool for your data.
- ImgLab - Image annotation tool for bounding boxes with auto-suggestion and extensibility for plugins.
- Label Studio - Multi-domain data labeling and annotation tool with standardized output format
- Labelimg - Open source graphical image annotation tool written in Python using QT for graphical interface focusing primarily on bounding boxes.
- Computer Vision Annotation Tool (CVAT) - OpenCV's web-based annotation tool for both videos and images for computer vision algorithms.
- Doccano - Open source text annotation tools for humans, providing functionality for sentiment analysis, named entity recognition, and machine translation.
- Labelbox - Open source image labelling tool with support for semantic segmentation (brush & superpixels), bounding boxes and nested classifications.
-
Model and Data Versioning
- Quilt Data - Versioning, reproducibility and deployment of data and models.
- Pachyderm - Open source distributed processing framework built on Kubernetes focused mainly on dynamic building of production machine learning pipelines - [(Video)](https://www.youtube.com/watch?v=LamKVhe2RSM)
- MLflow - Open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment.
- Polyaxon - A platform for reproducible and scalable machine learning and deep learning on kubernetes. - [(Video)](https://www.youtube.com/watch?v=Iexwrka_hys)
- PredictionIO - An open source Machine Learning Server built on top of a state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task
- Catalyst - High-level utils for PyTorch DL & RL research. It was developed with a focus on reproducibility, fast experimentation and code/ideas reusing.
- D6tflow - A python library that allows for building complex data science workflows on Python.
- Sacred - Tool to help you configure, organize, log and reproduce machine learning experiments.
- Apache Marvin - Apache Marvin is a platform for model deployment and versioning that hides all complexity under the hood: data scientists just need to set up the server and write their code in an extended jupyter notebook.
- FGLab - Machine learning dashboard, designed to make prototyping experiments easier.
- MLWatcher - MLWatcher is a python agent that records a large variety of time-series metrics of your running ML classification algorithm. It enables you to monitor in real time.
- Studio.ML - Model management framework which minimizes the overhead involved with scheduling, running, monitoring and managing artifacts of your machine learning experiments.
- DAGsHub - The home for data science collaboration. A platform, based on DVC, for data science project management and collaboration.
- steppy - Lightweight, Python3 library for fast and reproducible machine learning experimentation. Introduces simple interface that enables clean machine learning pipeline design.
- ModelChimp - Framework to track and compare all the results and parameters from machine learning models [(Video)](https://vimeo.com/271246650)
- Flor - Easy to use logger and automatic version controller made for data scientists who write ML code
- Kedro - Kedro is a workflow development tool that helps you build data pipelines that are robust, scalable, deployable, reproducible and versioned.
- Data Version Control (DVC) - A git fork that allows for version management of models
- ModelDB - Framework to track all the steps in your ML code to keep track of what version of your model obtained which accuracy, and then visualise it and query it via the UI
- TRAINS - Auto-Magical Experiment Manager & Version Control for AI.
-
Privacy Preserving Machine Learning
- Google's Differential Privacy - This is a C++ library of ε-differentially private algorithms, which can be used to produce aggregate statistics over numeric data sets containing private or sensitive information.
- Microsoft SEAL - Microsoft SEAL is an easy-to-use open-source (MIT licensed) homomorphic encryption library developed by the Cryptography Research group at Microsoft.
- Intel Homomorphic Encryption Backend - The Intel HE transformer for nGraph is a Homomorphic Encryption (HE) backend to the Intel nGraph Compiler, Intel's graph compiler for Artificial Neural Networks.
- PySyft - A Python library for secure, private Deep Learning. PySyft decouples private data from model training, using Multi-Party Computation (MPC) within PyTorch.
- TF-Encrypted - A Python library built on top of TensorFlow for researchers and practitioners to experiment with privacy-preserving machine learning.
- Tensorflow Privacy - A Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy.
- Uber SQL Differential Privacy - Uber's open source framework that enforces differential privacy for general-purpose SQL queries.
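Several entries above refer to ε-differentially private aggregate statistics. One textbook way to release such a statistic is the Laplace mechanism, sketched below for a counting query. This is an illustrative sketch only and does not reproduce the API of any library listed here; `laplace_noise`, `private_count` and the sample data are hypothetical, and the floating-point sampling is not hardened for production use.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the items matching `predicate`.

    A counting query has sensitivity 1 (adding or removing one record changes
    the count by at most 1), so the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: how many people are 30 or older?
ages = [23, 35, 41, 29, 52, 67, 18]
noisy = private_count(ages, lambda a: a >= 30, epsilon=0.5, rng=random.Random(0))
```

A smaller `epsilon` means more noise and stronger privacy; a larger one means the released value sits closer to the true count.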
-
Industrial Strength NLP
- 🤗 Transformers - Huggingface's library of state-of-the-art pretrained models for Natural Language Processing (NLP).
- CTRL - A Conditional Transformer Language Model for Controllable Generation released by SalesForce
- OpenAI GPT-2 - OpenAI's code from their paper ["Language Models are Unsupervised Multitask Learners"](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf).
- Snorkel - Snorkel is a system for quickly generating training data with weak supervision https://snorkel.org.
- SpaCy - Industrial-strength natural language processing library built with python and cython by the explosion.ai team.
- Github's Semantic - Github's text library for parsing, analyzing, and comparing source code across many languages.
- Stable Baselines - A fork of OpenAI Baselines, implementations of reinforcement learning algorithms http://stable-baselines.readthedocs.io/.
- Kashgari - Kashgari is a simple and powerful NLP transfer learning framework. Build a state-of-the-art model in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS), and text classification tasks.
- GluonNLP - GluonNLP is a toolkit that enables easy text preprocessing, datasets loading and neural models building to help you speed up your Natural Language Processing (NLP) research.
- sense2vec - A Pytorch library that allows for training and using sense2vec models, which are models that leverage the same approach as word2vec, but also leverage part-of-speech attributes for each token, which allows it to be "meaning-aware"
- Facebook's XLM - PyTorch original implementation of Cross-lingual Language Model Pretraining which includes BERT, XLM, NMT, XNLI, PKM, etc.
- GNES - Generic Neural Elastic Search is a cloud-native semantic search system based on deep neural networks.
- Tensorflow Text - TensorFlow Text provides a collection of text related classes and ops ready to use with TensorFlow 2.0.
- Blackstone - Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Blackstone is an experimental research project from the Incorporated Council of Law Reporting for England and Wales' research lab, ICLR&D.
- Grover - Grover is a model for Neural Fake News -- both generation and detection. However, it probably can also be used for other generation tasks.
- YouTokenToMe - YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE) [Sennrich et al.].
- Flair - Simple framework for state-of-the-art NLP developed by Zalando which builds directly on PyTorch.
- Wav2Letter++ - A speech to text system developed by Facebook's FAIR teams.
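The YouTokenToMe entry above mentions Byte Pair Encoding (BPE). As a rough guide to what BPE training does, the sketch below repeatedly finds the most frequent adjacent symbol pair in a word-frequency table and merges it into a new symbol. It is the plain textbook loop, not YouTokenToMe's optimised implementation or API; the function names and the toy corpus are hypothetical.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge rules from a {word: count} table."""
    words = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


# Hypothetical toy corpus of word frequencies.
merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3)
```

On this toy corpus the first two merges join `e` with `s` and then `es` with `t`, so the frequent suffix `est` becomes a single token. Ties between equally frequent pairs are broken here by first occurrence, which is an arbitrary choice of this sketch.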
-
Model Deployment and Orchestration Frameworks
- Ray - Ray is a flexible, high-performance distributed execution framework for machine learning ([VIDEO](https://www.youtube.com/watch?v=D_oz7E4v-U0))
- Cortex - Cortex is an open source platform for deploying machine learning models—trained with nearly any framework—as production web services.
- DeepDetect - Machine Learning production server for TensorFlow, XGBoost and Caffe models written in C++ and maintained by Jolibrain
- Clipper - Model server project from Berkeley's RISE Lab which includes a standard RESTful API and supports TensorFlow, Scikit-learn and Caffe models
- Skaffold - Skaffold is a command line tool that facilitates continuous development for Kubernetes applications. You can iterate on your application source code locally then deploy to local or remote Kubernetes clusters.
- NVIDIA TensorRT - TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
- Kubeflow - A cloud native platform for machine learning based on Google’s internal machine learning pipelines.
- Redis-AI - A Redis module for serving tensors and executing deep learning models. Expect changes in the API and internals.
- Seldon - Open source platform for deploying and monitoring machine learning models in Kubernetes - [(Video)](https://www.youtube.com/watch?v=pDlapGtecbY)
- Hopsworks - Hopsworks is a data-intensive platform for the design and operation of machine learning pipelines that includes a Feature Store. [(Video)](https://www.youtube.com/watch?v=v1DrnY8caVU).
- MLeap - Standardisation of pipeline and model serialization for Spark, Tensorflow and sklearn
- Open Platform for AI - Platform that provides complete AI model training and resource management capabilities.
- OpenScoring - REST web service for scoring PMML models built and maintained by OpenScoring.io
- Model Server for Apache MXNet (MMS) - A model server for Apache MXNet from Amazon Web Services that is able to run MXNet models as well as Gluon models (Amazon's SageMaker runs a custom version of MMS under the hood)
- KFServing - Serverless framework to deploy and monitor machine learning models in Kubernetes - [(Video)](https://www.youtube.com/watch?v=hGIvlFADMhU)
- Tensorflow Serving - High-performance framework to serve TensorFlow models via the gRPC protocol, able to handle 100k requests per second per core
- NVIDIA TensorRT Inference Server - TensorRT Inference Server is an inference microservice that lets you serve deep learning models in production while maximizing GPU utilization.
- Redis-ML - Module available from unstable branch that supports a subset of ML models as Redis data types. (Replaced by Redis AI)
-
Function as a Service Frameworks
- OpenFaaS - Serverless functions framework with RESTful API on Kubernetes
- Hydrosphere ML Lambda - Open source model management cluster for deploying, serving and monitoring machine learning models and ad-hoc algorithms with a FaaS architecture
- Fission - (Early Alpha) Serverless functions as a service framework on Kubernetes
- Knative Serving - Kubernetes-based serverless microservices with "scale-to-zero" functionality.
- Hydrosphere Mist - Serverless proxy for Apache Spark clusters
- Apache OpenWhisk - Open source, distributed serverless platform that executes functions in response to events at any scale.
-
Compiler optimisation frameworks
- Numba - A compiler for Python array and numerical functions
-
Neural Architecture Search
- Neural Architecture Search with Controller RNN - Basic implementation of Controller RNN from [Neural Architecture Search with Reinforcement Learning](https://arxiv.org/abs/1611.01578) and [Learning Transferable Architectures for Scalable Image Recognition](https://arxiv.org/abs/1707.07012).
- ENAS via Parameter Sharing - Efficient Neural Architecture Search via Parameter Sharing by [authors of paper](https://arxiv.org/abs/1802.03268).
- ENAS-PyTorch - Efficient Neural Architecture Search (ENAS) in PyTorch based [on this paper](https://arxiv.org/abs/1802.03268).
- Neural Network Intelligence - NNI (Neural Network Intelligence) is a toolkit to help users run automated machine learning (AutoML) experiments.
- ENAS-Tensorflow - Efficient Neural Architecture Search via Parameter Sharing (ENAS) micro search TensorFlow code for Windows users.
- Maggy - Asynchronous, directed Hyperparameter search and parallel ablation studies on Apache Spark [(Video)](https://www.youtube.com/watch?v=0Hd1iYEL03w).
- Autokeras - AutoML library for Keras based on ["Auto-Keras: Efficient Neural Architecture Search with Network Morphism"](https://arxiv.org/abs/1806.10282).
-
Feature Stores
- Veri - Veri is a Feature Label Store. Feature Label store allows storing features as keys and labels as values. Querying values is only possible with knn using features. Veri also supports creating sub sample spaces of data by default.
- Ivory - Ivory defines a specification for how to store feature data and provides a set of tools for querying it. It does not provide any tooling for producing feature data in the first place. All Ivory commands run as MapReduce jobs, so it is assumed that feature data is maintained on HDFS.