awesome-python-data-science

From GitHub
https://github.com/jacob98415/awesome-python-data-science

  • Web Scraping

  • Natural Language Processing

    • NLP

      • spaCy - Industrial-Strength Natural Language Processing.
      • gensim - Topic Modelling for Humans.
      • NLTK - Modules, data sets, and tutorials supporting research and development in Natural Language Processing.
      • CLTK - The Classical Language Toolkit.
      • skift - Scikit-learn wrappers for Python fastText. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Phonemizer - Simple text-to-phonemes converter for multiple languages.
      • flair - Very simple framework for state-of-the-art NLP.
      • pyMorfologik - Python binding for <a href="https://github.com/morfologik/morfologik-stemming">Morfologik</a>.
  • Machine Learning

    • Ensemble Methods

      • ML-Ensemble - High performance ensemble learning. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • stacked_generalization - Library for machine learning stacking generalization. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • vecstack - Python package for stacking (machine learning technique). <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Stacking - Simple and useful stacking library written in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
    • General Purpose Machine Learning

      • Shogun - Machine learning toolbox.
      • dlib - Toolkit for making real-world machine learning and data analysis applications in C++ (Python bindings).
      • mlpack - A scalable C++ machine learning library (Python bindings).
      • xLearn - High Performance, Easy-to-use, and Scalable Machine Learning Package.
      • MLxtend - Extension and helper modules for Python's data analysis and machine learning libraries. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • cuML - RAPIDS Machine Learning Library. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
      • causalml - Uplift modeling and causal inference with machine learning algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Karate Club - An unsupervised machine learning library for graph-structured data.
      • scikit-multilearn - Multi-label classification for Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • sklearn-expertsys - Highly interpretable classifiers for scikit-learn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • seqlearn - Sequence classification toolkit for Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • pystruct - Simple structured learning framework for Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Sparkit-learn - PySpark + scikit-learn = Sparkit-learn. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/spark_big.png" alt="Apache Spark based">
      • RuleFit - Implementation of the RuleFit algorithm. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • pyGAM - Generalized Additive Models in Python.
      • Little Ball of Fur - A library for sampling graph structured data.
      • Reproducible Experiment Platform (REP) - Machine Learning toolbox for Humans. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • hyperlearn - 50%+ Faster, 50%+ less RAM usage, GPU support re-written Sklearn, Statsmodels. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • scikit-learn - Machine learning in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • metric-learn - Metric learning algorithms in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
    • Gradient Boosting

      • CatBoost - An open-source gradient boosting on decision trees library. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
      • XGBoost - Scalable, Portable, and Distributed Gradient Boosting. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
      • ThunderGBM - Fast GBDTs and Random Forests on GPUs. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
      • LightGBM - A fast, distributed, high-performance gradient boosting. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
    • Automated Machine Learning

      • auto-sklearn - An automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • MLBox - A powerful Automated Machine Learning python library.
      • AutoGluon - AutoML for Image, Text, Tabular, Time-Series, and MultiModal Data.
    • Imbalanced Datasets

      • imbalanced-learn - Module to perform under-sampling and over-sampling with various techniques. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • imbalanced-algorithms - Python-based implementations of algorithms for learning on imbalanced data. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
    • Kernel Methods

      • liquidSVM - An implementation of SVMs.
      • ThunderSVM - A fast SVM Library on GPUs and CPUs. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
      • pyFM - Factorization machines in python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • fastFM - A library for Factorization Machines. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • tffm - TensorFlow implementation of an arbitrary order Factorization Machine. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • scikit-rvm - Relevance Vector Machine implementation using the scikit-learn API. <img height="20" src="img/sklearn_big.png" alt="sklearn">
    • Random Forests

      • rpforest - A forest of random projection trees. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • sklearn-random-bits-forest - Wrapper of the Random Bits Forest program written by Wang et al. (2016). <img height="20" src="img/sklearn_big.png" alt="sklearn">
    • Extreme Learning Machine

      • Python-ELM - Extreme Learning Machine implementation in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Python Extreme Learning Machine (ELM) - A machine learning technique used for classification/regression tasks.
      • hpelm - High-performance implementation of Extreme Learning Machines (fast randomized neural networks). <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
  • Time Series

      • dateutil - Powerful extensions to the standard datetime module
      • luminol - Anomaly Detection and Correlation library.
      • Prophet - Automatic Forecasting Procedure.
      • PyFlux - Open source time series library for Python.
      • Chaos Genius - ML powered analytics engine for outlier/anomaly detection and root cause analysis
      • darts - A python library for easy manipulation and forecasting of time series.
      • greykite - A flexible, intuitive, and fast forecasting library.
      • statsforecast - Lightning fast forecasting with statistical and econometric models.
      • mlforecast - Scalable machine learning-based time series forecasting.
      • neuralforecast - Scalable neural network-based time series forecasting.
      • bayesloop - Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.
      • tick - Module for statistical learning, with a particular emphasis on time-dependent modeling. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • tslearn - Machine learning toolkit dedicated to time-series data. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • maya - Makes it easy to parse date strings and change time zones.
      • sktime - A unified framework for machine learning with time series. <img height="20" src="img/sklearn_big.png" alt="sklearn">
  • Optimization

      • OR-Tools - An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi.
      • hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
      • nlopt - Library for nonlinear optimization (global and local, constrained or unconstrained).
      • Optuna - A hyperparameter optimization framework.
      • sklearn-deap - Use evolutionary algorithms instead of gridsearch in scikit-learn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • scikit-opt - Heuristic Algorithms for optimization.
      • Talos - Hyperparameter Optimization for Keras Models.
      • BoTorch - Bayesian optimization in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • SMAC3 - Sequential Model-based Algorithm Configuration.
      • hyperopt-sklearn - Hyper-parameter optimization for sklearn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • scikit-optimize - Sequential model-based optimization with a `scipy.optimize` interface.
      • Spearmint - Bayesian optimization.
      • Optunity - A library containing various optimizers for hyperparameter tuning.
      • PySwarms - A research toolkit for particle swarm optimization in Python.
      • Solid - A comprehensive gradient-free optimization framework written in Python.
      • Bayesian Optimization - A Python implementation of global optimization with gaussian processes.
      • sklearn-genetic-opt - Hyperparameters tuning and feature selection using evolutionary algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • sigopt_sklearn - SigOpt wrappers for scikit-learn methods. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • SafeOpt - Safe Bayesian Optimization.
      • Platypus - A Free and Open Source Python Library for Multiobjective Optimization.
      • GPflowOpt - Bayesian Optimization using GPflow. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
  • Visualization

    • Interactive plots

      • Altair - Declarative statistical visualization library for Python. Supports many data transformations directly in the chart specification.
      • Bokeh - Interactive Web Plotting for Python.
      • animatplot - A python package for animating plots built on matplotlib.
      • bqplot - Plotting library for IPython/Jupyter notebooks
      • pyecharts - A Python interactive charting library built on [Echarts](https://github.com/apache/echarts), a charting and visualization library. <img height="20" src="img/pyecharts.png" alt="pyecharts"> <img height="20" src="img/echarts.png" alt="echarts">
    • Map

      • folium - Makes it easy to visualize data on an interactive open street map
    • General Purposes

      • Matplotlib - Plotting with Python.
      • seaborn - Statistical data visualization using matplotlib.
      • prettyplotlib - Painlessly create beautiful matplotlib plots.
      • python-ternary - Ternary plotting library for Python with matplotlib.
      • missingno - Missing data visualization module for Python.
      • physt - Improved histograms.
    • Automatic Plotting

  • Deployment

      • fastapi - A modern, fast (high-performance) web framework for building APIs with Python.
      • binder - Enables sharing and executing Jupyter Notebooks.
      • gradio - Create UIs for your machine learning model in Python in 3 minutes.
      • streamlit - Makes it easy to deploy machine learning models.
  • Data Manipulation

    • Data Frames

      • pandas - Powerful Python data analysis toolkit.
      • blaze - NumPy and pandas interface to Big Data. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
      • vaex - Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second.
      • polars - A fast multi-threaded, hybrid-out-of-core DataFrame library.
      • datatable - Data.table for Python. <img height="20" src="img/R_big.png" alt="R inspired/ported lib">
      • modin - Speed up your pandas workflows by changing a single line of code. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
      • swifter - A package that efficiently applies any function to a pandas dataframe or series in the fastest available manner.
      • koalas - pandas API on Apache Spark. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
      • cuDF - GPU DataFrame Library. <img height="20" src="img/pandas_big.png" alt="pandas compatible"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
      • xarray - Xarray combines the best features of NumPy and pandas for multidimensional data selection by supplementing numerical axis labels with named dimensions for more intuitive, concise, and less error-prone indexing routines.
      • xpandas - Universal 1d/2d data containers with Transformers functionality for data analysis by [The Alan Turing Institute](https://www.turing.ac.uk/).
      • pandas-log - A package that allows providing feedback about basic pandas operations and finds both business logic and performance issues.
      • pandasql - Allows you to query pandas DataFrames using SQL syntax. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
      • pysparkling - A pure Python implementation of Apache Spark's RDD and DStream interfaces. <img height="20" src="img/spark_big.png" alt="Apache Spark based">
      • sk-transformer - A collection of various pandas & scikit-learn compatible transformers for all kinds of preprocessing and feature engineering steps <img height="20" src="img/pandas_big.png" alt="pandas compatible">
      • pandas_profiling - Create HTML profiling reports from pandas DataFrame objects
      • Arctic - High-performance datastore for time series and tick data.
      • pandas_flavor - A package that allows writing your own flavor of Pandas easily.
      • pandas-gbq - pandas Google Big Query. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
    • Pipelines

      • SSPipe - Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch.
      • sklearn-pandas - pandas integration with sklearn. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/pandas_big.png" alt="pandas compatible">
      • dopanda - Hints and tips for using pandas in an analysis environment. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
      • meza - A Python toolkit for processing tabular data.
      • Hamilton - A microframework for dataframe generation that applies Directed Acyclic Graphs specified by a flow of lazily evaluated Python functions.
      • pandas-ply - Functional data manipulation for pandas. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
      • Dplython - Dplyr for Python. <img height="20" src="img/R_big.png" alt="R inspired/ported lib">
      • Prodmodel - Build system for data science pipelines.
      • pdpipe - Easy pipelines for pandas DataFrames.
      • Dataset - Helps you conveniently work with random or sequential batches of your data and define data processing.
    • Data-centric AI

      • snorkel - A system for quickly generating training data with weak supervision.
      • cleanlab - The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
      • dataprep - Collect, clean, and visualize your data in Python with a few lines of code.
    • Synthetic Data

      • ydata-synthetic - A package to generate synthetic tabular and time-series data leveraging the state-of-the-art generative models. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
  • Deep Learning

    • Others

      • autograd - Efficiently computes derivatives of numpy code.
      • Caffe - A fast open framework for deep learning.
      • nnabla - Neural Network Libraries by Sony.
      • Tangent - Source-to-Source Debuggable Derivatives in Pure Python.
      • jax - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more.
      • Myia - Deep Learning framework (pre-alpha).
      • hipCaffe - The HIP port of Caffe. <img height="20" src="img/amd_big.png" alt="Possible to run on AMD GPU">
    • TensorFlow

      • TensorFlow - Computation using data flow graphs for scalable machine learning by Google. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • NeuPy - NeuPy is a Python library for Artificial Neural Networks and Deep Learning (previously: <img height="20" src="img/theano_big.png" alt="Theano compatible">). <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • Elephas - Distributed Deep learning with Keras & Spark. <img height="20" src="img/keras_big.png" alt="Keras compatible">
      • qkeras - A quantization deep learning library. <img height="20" src="img/keras_big.png" alt="Keras compatible">
      • Mesh TensorFlow - Model Parallelism Made Easier. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • TFLearn - Deep learning library featuring a higher-level API for TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • Polyaxon - A platform that helps you build, manage and monitor deep learning models. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • tfdeploy - Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running numpy. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • TensorFlow Fold - Deep learning with dynamic computation graphs in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • tensorlm - Wrapper library for text generation/language models at char and word level with RNN. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • keras-contrib - Keras community contributions. <img height="20" src="img/keras_big.png" alt="Keras compatible">
      • Hyperas - Keras + Hyperopt: A straightforward wrapper for convenient hyperparameter optimization. <img height="20" src="img/keras_big.png" alt="Keras compatible">
      • Hera - Train/evaluate a Keras model, and get metrics streamed to a dashboard in your browser. <img height="20" src="img/keras_big.png" alt="Keras compatible">
      • Spektral - Deep learning on graphs. <img height="20" src="img/keras_big.png" alt="Keras compatible">
      • Ludwig - A toolbox that allows one to train and test deep learning models without the need to write code. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • Sonnet - TensorFlow-based neural network library. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • tensorpack - A Neural Net Training Interface on TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • tensorflow-upstream - TensorFlow ROCm port. <img height="20" src="img/tf_big2.png" alt="TensorFlow based"> <img height="20" src="img/amd_big.png" alt="Possible to run on AMD GPU">
      • TensorLayer - Deep Learning and Reinforcement Learning Library for Researchers and Engineers. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • TensorLight - A high-level framework for TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
    • PyTorch

      • torchvision - Datasets, Transforms, and Models specific to Computer Vision. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • torchtext - Data loaders and abstractions for text and NLP. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • torchaudio - An audio library for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • ignite - High-level library to help with training neural networks in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • Catalyst - High-level utils for PyTorch DL & RL research. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • ChemicalX - A PyTorch-based deep learning library for drug pair scoring. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • pytorch_geometric_temporal - Temporal Extension Library for PyTorch Geometric. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • pytorch-lightning - PyTorch Lightning is just organized PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • skorch - A scikit-learn compatible neural network library that wraps PyTorch. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
    • MXNet

      • Gluon - A clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet). <img height="20" src="img/mxnet_big.png" alt="MXNet based">
      • gluon-cv - Provides implementations of the state-of-the-art deep learning models in computer vision. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
      • gluon-nlp - NLP made easy. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
      • Xfer - Transfer Learning library for Deep Neural Networks. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
      • MXbox - Simple, efficient, and flexible vision toolbox for the mxnet framework. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
      • MXNet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
      • MXNet - HIP Port of MXNet. <img height="20" src="img/mxnet_big.png" alt="MXNet based"> <img height="20" src="img/amd_big.png" alt="Possible to run on AMD GPU">
  • Model Explanation

      • themis-ml - A library that implements fairness-aware machine learning algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • FairML - FairML is a python toolbox auditing the machine learning models for bias. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • lucid - A collection of infrastructure and tools for research in neural network interpretability.
      • model-analysis - Model analysis tools for TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • dalex - moDel Agnostic Language for Exploration and explanation. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/R_big.png" alt="R inspired/ported lib">
      • scikit-plot - An intuitive library to add plotting functionality to scikit-learn objects. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Netron - Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks).
      • Lime - Explaining the predictions of any machine learning classifier. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Shapley - A data-driven framework to quantify the value of classifiers in a machine learning ensemble.
      • aequitas - Bias and Fairness Audit Toolkit.
      • Alibi - Algorithms for monitoring and explaining machine learning models.
      • PDPbox - Partial dependence plot toolbox.
      • anchor - Code for "High-Precision Model-Agnostic Explanations" paper.
      • mxboard - Logging MXNet data for visualization in TensorBoard. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
      • Contrastive Explanation - Contrastive Explanation (Foil Trees). <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • ELI5 - A library for debugging/inspecting machine learning classifiers and explaining their predictions.
      • L2X - Code for replicating the experiments in the paper *Learning to Explain: An Information-Theoretic Perspective on Model Interpretation*.
      • treeinterpreter - Interpreting scikit-learn's decision tree and random forest predictions. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • PyCEbox - Python Individual Conditional Expectation Plot Toolbox.
      • shap - A unified approach to explain the output of any machine learning model. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Skater - Python Library for Model Interpretation.
      • AI Explainability 360 - Interpretability and explainability of data and machine learning models.
      • Auralisation - Auralisation of learned features in CNN (for audio).
      • CapsNet-Visualization - A visualization of the CapsNet layers to better understand how it works.
      • FlashLight - Visualization Tool for your NeuralNetwork.
  • Computer Vision

      • scikit-image - Image Processing SciKit (Toolbox for SciPy).
      • OpenCV - Open Source Computer Vision Library.
      • imgaug - Image augmentation for machine learning experiments.
      • Augmentor - Image augmentation library in Python for machine learning.
      • imgaug_extension - Additional augmentations for imgaug.
  • Computer Audition

      • librosa - Python library for audio and music analysis.
      • Essentia - Library for audio and music analysis, description, and synthesis.
      • madmom - Python audio and music signal processing library.
      • aubio - A library for audio and music analysis.
      • muda - A library for augmenting annotated audio data.
      • LibXtract - A simple, portable, lightweight library of audio feature extraction functions.
      • Yaafe - Audio features extraction.
      • Marsyas - Music Analysis, Retrieval, and Synthesis for Audio Signals.
  • Probabilistic Methods

      • PyMC - Bayesian Stochastic Modelling in Python.
      • GPyTorch - A highly efficient and modular implementation of Gaussian Processes in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • pomegranate - Probabilistic and graphical models for Python. <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
      • pgmpy - A python library for working with Probabilistic Graphical Models.
      • sklearn-bayes - Python package for Bayesian Machine Learning with scikit-learn API. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • sklearn-crfsuite - A scikit-learn-inspired API for CRFsuite. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • PyVarInf - Bayesian Deep Learning methods with Variational Inference for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • MXFusion - Modular Probabilistic Programming on MXNet. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
      • pyhsmm - Bayesian inference in HSMMs and HMMs.
      • PyStan - Bayesian inference using the No-U-Turn sampler (Python interface).
      • emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.
      • PtStat - Probabilistic Programming and Statistical Inference in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • pyro - A flexible, scalable deep probabilistic programming library built on PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • ZhuSuan - Bayesian Deep Learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • InferPy - Deep Probabilistic Modelling Made Easy. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • GPflow - Gaussian processes in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • skpro - Supervised domain-agnostic prediction framework for probabilistic modelling by [The Alan Turing Institute](https://www.turing.ac.uk/). <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • hsmmlearn - A library for hidden semi-Markov models with explicit durations.
  • Computations

      • NumExpr - A fast numerical expression evaluator for NumPy that comes with an integrated computing virtual machine to speed calculations up by avoiding memory allocation for intermediate results.
      • Dask - Parallel computing with task scheduling. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
      • CuPy - NumPy-like API accelerated with CUDA.
      • quaternion - Add built-in support for quaternions to numpy.
      • adaptive - Tools for adaptive and parallel sampling of mathematical functions.
      • numdifftools - Solve automatic numerical differentiation problems in one or more variables.
      • bottleneck - Fast NumPy array functions written in C.
      • scikit-tensor - Python library for multilinear algebra and tensor factorizations.
  • Reinforcement Learning

      • keras-rl - Deep Reinforcement Learning for Keras. <img height="20" src="img/keras_big.png" alt="Keras compatible">
      • OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms.
      • Dopamine - A research framework for fast prototyping of reinforcement learning algorithms.
      • garage - A toolkit for reproducible reinforcement learning research.
      • Stable Baselines - A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.
      • TF-Agents - A library for Reinforcement Learning in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • OpenAI Baselines - High-quality implementations of reinforcement learning algorithms.
      • ChainerRL - A deep reinforcement learning library built on top of Chainer.
      • TensorForce - A TensorFlow library for applied reinforcement learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • TRFL - TensorFlow Reinforcement Learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
      • Coach - Easy experimentation with state-of-the-art Reinforcement Learning algorithms.
      • RLlib - Scalable Reinforcement Learning.
      • Horizon - A platform for Applied Reinforcement Learning.
  • Experimentation

      • mlflow - Open source platform for the machine learning lifecycle.
      • dvc - Data Version Control | Git for Data & Models | ML Experiments Management.
      • envd - 🏕️ machine learning development environment for data science and AI/ML engineering teams.
      • Ax - Adaptive Experimentation Platform. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Sacred - A tool to help you configure, organize, log, and reproduce experiments.
  • Feature Engineering

    • General

      • tsfresh - Automatic extraction of relevant features from time series. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Feature Engine - Feature engineering package with sklearn-like functionality. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • dirty_cat - Machine learning on dirty tabular data (especially: string-based variables for classification and regression). <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • NitroFE - Moving window features. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Feature Forge - A set of tools for creating and testing machine learning features. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • few - A feature engineering wrapper for sklearn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Featuretools - Automated feature engineering.
      • skl-groups - A scikit-learn addon to operate on set/"group"-based features. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • scikit-mdr - A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction. <img height="20" src="img/sklearn_big.png" alt="sklearn">
    • Feature Selection

      • scikit-feature - Feature selection repository in Python.
      • scikit-rebate - A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • zoofs - A feature selection library based on evolutionary algorithms.
      • boruta_py - Implementations of the Boruta all-relevant feature selection method. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • BoostARoota - A fast XGBoost feature selection algorithm. <img height="20" src="img/sklearn_big.png" alt="sklearn">
  • Statistics

      • statsmodels - Statistical modeling and econometrics in Python.
      • Alphalens - Performance analysis of predictive (alpha) stock factors.
      • scikit-posthocs - Pairwise Multiple Comparisons Post-hoc Tests.
      • stockstats - A ``StockDataFrame`` wrapper around ``pandas.DataFrame`` with inline support for stock statistics and indicators.
      • weightedcalcs - A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.
  • Conversion

      • ONNX - Open Neural Network Exchange.
      • sklearn-porter - Transpile trained scikit-learn estimators to C, Java, JavaScript, and others.
      • MMdnn - A set of tools to help users inter-operate among different deep learning frameworks.
  • Data Validation

      • evidently - Evaluate and monitor ML models from validation to production.
      • great_expectations - Always know what to expect from your data.
      • pandera - A lightweight, flexible, and expressive statistical data testing library.
      • deepchecks - Validation & testing of ML models and data during model development, deployment, and production. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • TensorFlow Data Validation - Library for exploring and validating machine learning data.
  • Spatial Analysis

      • PySal - Python Spatial Analysis Library.
      • GeoPandas - Python tools for geographic data. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
  • Distributed Computing

      • PaddlePaddle - PArallel Distributed Deep LEarning.
      • Distributed - Distributed computation in Python.
      • dask-ml - Distributed and parallel machine learning. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Jubatus - Framework and Library for Distributed Online Machine Learning.
      • Veles - Distributed machine learning platform.
      • DMTK - Microsoft Distributed Machine Learning Toolkit.
      • Horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
      • PySpark - Exposes the Spark programming model to Python. <img height="20" src="img/spark_big.png" alt="Apache Spark based">
  • Quantum Computing

      • cirq - A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits.
      • qiskit - An open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules.
      • QML - A Python Toolkit for Quantum Machine Learning.
      • PennyLane - Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations.
  • Genetic Programming

      • DEAP - Distributed Evolutionary Algorithms in Python.
      • gplearn - Genetic Programming in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • monkeys - A strongly-typed genetic programming framework for Python.
      • sklearn-genetic - Genetic feature selection module for scikit-learn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • karoo_gp - A Genetic Programming platform for Python with GPU support. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
  • Evaluation

      • sklearn-evaluation - Model evaluation made easy: plots, tables, and markdown reports. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Metrics - Machine learning evaluation metrics.
      • recmetrics - Library of useful metrics and plots for evaluating recommender systems.
      • AI Fairness 360 - Fairness metrics for datasets and ML models, explanations, and algorithms to mitigate bias in datasets and models.