https://github.com/sorend/awesome-python-machine-learning

Curated list of Awesome Python Machine Learning frameworks, libraries, tools, etc.

# Awesome Python Machine Learning

A curated list of awesome *active* Python machine learning frameworks, libraries, tools, and related resources.

This is a living document. If you have any additions, please do not hesitate to open a pull request or contact me.

To count as an *active* library on this list, a project must have a commit no older than a year.
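
The activity rule can be sketched as a small check. This is only an illustration: the function name `is_active` is hypothetical, "a year" is assumed to mean 365 days, and the last-commit timestamp is assumed to be already known (how it is fetched is out of scope here).

```python
from datetime import datetime, timedelta, timezone

# Assumption: "no older than a year" is read as 365 days.
MAX_AGE = timedelta(days=365)


def is_active(last_commit, now=None, max_age=MAX_AGE):
    """Return True if `last_commit` is no older than `max_age` relative to `now`.

    Both datetimes are expected to be timezone-aware.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_commit <= max_age


# Example with fixed dates so the result does not depend on the current time.
reference = datetime(2024, 1, 1, tzinfo=timezone.utc)
recent = datetime(2023, 6, 1, tzinfo=timezone.utc)
stale = datetime(2022, 6, 1, tzinfo=timezone.utc)
recent_ok = is_active(recent, now=reference)
stale_ok = is_active(stale, now=reference)
```

Passing `now` explicitly keeps the check deterministic, which is convenient when re-validating the whole list in one run.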

For machine learning frameworks in other languages, please see the excellent [awesome-machine-learning](https://github.com/josephmisiti/awesome-machine-learning) list.

# Machine learning libraries
- [Scikit-Learn](https://github.com/scikit-learn/scikit-learn) - A general purpose ML library. Most common algorithms and metrics implemented.
- [Dask-ML](https://github.com/dask/dask-ml) - Dask-ML provides scalable machine learning in Python using Dask alongside popular machine learning libraries like Scikit-Learn.
- [XGBoost](https://github.com/dmlc/xgboost) - XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable.
- [PyTorch](https://github.com/pytorch/pytorch) - Tensors and Dynamic neural networks in Python with strong GPU acceleration.
- [Metric learn](https://github.com/metric-learn/metric-learn) - A ML library for learning metrics.
- [TensorFlow](https://github.com/tensorflow/tensorflow) - TensorFlow is an open source software library for numerical computation using data flow graphs.
- [Keras](https://github.com/keras-team/keras) - Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.
- [Imbalanced-learn](https://github.com/scikit-learn-contrib/imbalanced-learn) - imbalanced-learn is a python package offering a number of re-sampling techniques commonly used in datasets showing strong between-class imbalance.
- [Caffe](https://github.com/BVLC/caffe) - Caffe is a deep learning framework made with expression, speed, and modularity in mind.
- [Annoy](https://github.com/spotify/annoy) - Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point.
- [PySpark](https://github.com/apache/spark/tree/master/python) - Spark is a fast and general cluster computing system for Big Data.
- [Orange](https://github.com/biolab/orange3) - Orange is a component-based data mining software. It includes a range of data visualization, exploration, preprocessing and modeling techniques.
- [TPOT](https://github.com/EpistasisLab/tpot) - Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
- [pgmpy](https://github.com/pgmpy/pgmpy) - pgmpy is a python library for working with Probabilistic Graphical Models.
- [Apache MXNET](https://github.com/apache/incubator-mxnet) - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
- [Shogun](https://github.com/shogun-toolbox/shogun) - The SHOGUN machine learning toolbox. Unified and efficient Machine Learning since 1999.
- [CNTK](https://github.com/Microsoft/CNTK) - The Microsoft Cognitive Toolkit (https://cntk.ai) is a unified deep learning toolkit that describes neural networks as a series of computational steps via a directed graph.
- [PyOD](https://github.com/yzhao062/pyod) - PyOD is a comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data.
- [LightGBM](https://github.com/Microsoft/LightGBM) - LightGBM is a gradient boosting framework that uses tree based learning algorithms.
- [CatBoost](https://github.com/catboost/catboost) - CatBoost is a machine learning method based on gradient boosting over decision trees.
- [auto_ml](https://github.com/ClimbsRocks/auto_ml) - Automated machine learning for production and analytics.
- [Apache Singa](https://github.com/apache/incubator-singa) - Distributed deep learning system.
- [SimpleAI](https://github.com/simpleai-team/simpleai) - This library implements many of the artificial intelligence algorithms described in the book "Artificial Intelligence: A Modern Approach" by Stuart Russell and Peter Norvig.
- [astroML](https://github.com/astroML/astroML) - Machine learning, statistics, and data mining for astronomy and astrophysics.
- [Turi Create](https://github.com/apple/turicreate) - Turi Create simplifies the development of custom machine learning models. You don't have to be a machine learning expert to add recommendations, object detection, image classification, image similarity or activity classification to your app.
- [NuPIC](https://github.com/numenta/nupic) - The Numenta Platform for Intelligent Computing (NuPIC) is a machine intelligence platform that implements the HTM learning algorithms. HTM is a detailed computational theory of the neocortex.
- [Lasagne](https://github.com/Lasagne/Lasagne) - Lasagne is a lightweight library to build and train neural networks in Theano.
- [Chainer](https://github.com/chainer/chainer) - Chainer is a Python-based deep learning framework aiming at flexibility.
- [Prophet](https://github.com/facebook/prophet) - Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
- [Surprise](https://github.com/NicolasHug/Surprise) - Surprise is a Python scikit for building and analyzing recommender systems.
- [nilearn](https://github.com/nilearn/nilearn) - Nilearn is a Python module for fast and easy statistical learning on NeuroImaging data.
- [neuropredict](https://github.com/raamana/neuropredict) - Easy and comprehensive assessment of predictive power, with support for neuroimaging features.
- [pyhsmm](https://github.com/mattjj/pyhsmm) - This is a Python library for approximate unsupervised inference in Bayesian Hidden Markov Models (HMMs) and explicit-duration Hidden semi-Markov Models (HSMMs), focusing on the Bayesian Nonparametric extensions, the HDP-HMM and HDP-HSMM, mostly with weak-limit approximations.
- [SKLL](https://github.com/EducationalTestingService/skll) - This Python package provides command-line utilities to make it easier to run machine learning experiments with scikit-learn.
- [neurolab](https://github.com/zueve/neurolab) - Neurolab is a simple and powerful neural network library for Python. It contains basic neural networks, training algorithms, and a flexible framework for creating and exploring other neural network types.
- [pomegranate](https://github.com/jmschrei/pomegranate) - pomegranate is a package for probabilistic models in Python that is implemented in cython for speed.
- [deap](https://github.com/deap/deap) - DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas.
- [mlxtend](https://github.com/rasbt/mlxtend) - Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks.
- [scikit-fuzzy](https://github.com/scikit-fuzzy/scikit-fuzzy) - scikit-fuzzy is a fuzzy logic toolkit for SciPy.
- [fylearn](https://github.com/sorend/fylearn) - FyLearn is a fuzzy machine learning library, built on top of SciKit-Learn.
- [tflearn](https://github.com/tflearn/tflearn) - TFlearn is a modular and transparent deep learning library built on top of Tensorflow.
- [Regularized Greedy Forest](https://github.com/RGF-team/rgf) - Regularized Greedy Forest (RGF) is a tree ensemble machine learning method
- [fuku-ml](https://github.com/fukuball/fuku-ml) - Simple machine learning library.
- [Edward](https://github.com/blei-lab/edward) - Edward is a Python library for probabilistic modeling, inference, and criticism.
- [stacked_generalization](https://github.com/fukatani/stacked_generalization) - Library for machine learning stacking generalization.
- [modAL](https://github.com/modAL-python/modAL) - modAL is an active learning framework for Python3, designed with modularity, flexibility and extensibility in mind.
- [neonrvm](https://github.com/siavashserver/neonrvm) - neonrvm is an experimental open source machine learning library for performing regression tasks using RVM technique.
- [xLearn](https://github.com/aksnzhy/xlearn) - xLearn is a high performance, easy-to-use, and scalable machine learning package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM), which can be used to solve large-scale machine learning problems.
- [ml-ens](https://github.com/flennerhag/mlens) - A Python library for high performance ensemble learning.
- [mindsdb](https://github.com/mindsdb/mindsdb) - MindsDB's goal is to make it very simple for developers to use the power of artificial neural networks in their projects.
- [Mars](https://github.com/mars-project/mars) - Mars is a tensor-based unified framework for large-scale data computation.
- [Hyperopt-sklearn](https://github.com/hyperopt/hyperopt-sklearn) - Hyper-parameter optimization for sklearn.
- [H2O](https://github.com/h2oai/h2o-3) - H2O is an in-memory platform for distributed, scalable machine learning.
- [seglearn](https://github.com/dmbee/seglearn) - Python module for machine learning time series.
- [pycobra](https://github.com/bhargavvader/pycobra) - Python library implementing ensemble methods for regression and classification, plus visualisation tools including Voronoi tessellations.
- [scikit-multilearn](https://github.com/scikit-multilearn/scikit-multilearn) - A scikit-learn based module for multi-label (and related) classification.
- [auto-sklearn](https://github.com/automl/auto-sklearn) - auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
- [skits](https://github.com/ethanrosenthal/skits) - A library for SciKit-learn-Inspired Time Series models.
- [tsfresh](https://github.com/blue-yonder/tsfresh) - Automatic extraction of relevant features from time series.
- [pyqlearning](https://github.com/chimera0/accel-brain-code/tree/master/Reinforcement-Learning) - pyqlearning is a Python library for implementing Reinforcement Learning and Deep Reinforcement Learning.
- [keras-rl](https://github.com/keras-rl/keras-rl) - Deep Reinforcement Learning for Keras.
- [mushroom-rl](https://github.com/MushroomRL/mushroom-rl) - Python library for Reinforcement Learning experiments.
- [chainerrl](https://github.com/chainer/chainerrl) - ChainerRL is a deep reinforcement learning library built on top of Chainer.
- [tensorforce](https://github.com/tensorforce/tensorforce) - Tensorforce: a TensorFlow library for applied reinforcement learning.
- [Determined](https://github.com/determined-ai/determined) - Deep learning training platform with integrated support for distributed training, hyperparameter tuning, smart GPU scheduling, experiment tracking, and a model registry.
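
Several entries above (XGBoost, LightGBM, CatBoost) are described as gradient boosting over decision trees. As a rough illustration of that idea only, here is a toy sketch in plain Python that boosts one-split "stumps" on a single feature with squared-error loss. It is not how any of those libraries is implemented, and every name in it (`fit_stump`, `fit_boosted`, `predict_boosted`) is hypothetical.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error.

    Assumes `xs` contains at least two distinct values.
    """
    best = None
    pairs = sorted(zip(xs, residuals))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no valid threshold between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [r for x, r in pairs if x <= threshold]
        right = [r for x, r in pairs if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = mean(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(ys, preds):
    return mean((y - p) ** 2 for y, p in zip(ys, preds))


# Toy data: y = x^2 on ten points.
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
model = fit_boosted(xs, ys)
baseline_error = mse(ys, [mean(ys)] * len(ys))
boosted_error = mse(ys, [predict_boosted(model, x) for x in xs])
```

Each round fits the residuals left by the previous rounds, so the training error of the boosted model falls below that of the constant baseline. The real libraries add deeper trees, regularisation, and many performance optimisations on top of this basic loop.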

# Data processing
- [NumPy](https://github.com/numpy/numpy) - NumPy is the fundamental package needed for scientific computing with Python.
- [Pandas](https://github.com/pandas-dev/pandas) - pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive.
- [Modin](https://github.com/modin-project/modin) - Modin: Speed up your Pandas workflows by changing a single line of code.
- [dfply](https://github.com/kieferk/dfply) - The dfply package makes it possible to do R's dplyr-style data manipulation with pipes in python on pandas DataFrames.
- [xlwings](https://github.com/ZoomerAnalytics/xlwings) - xlwings is a BSD-licensed Python library that makes it easy to call Python from Excel and vice versa.
- [pyflux](https://github.com/rjt1990/pyflux) - Open source time series library for Python.
- [petl](https://github.com/petl-developers/petl) - Python package for extracting, transforming, and loading (ETL) tables of data.
- [pypeln](https://github.com/cgarciae/pypeln) - Concurrent data pipelines made easy.
- [botflow](https://github.com/kkyon/botflow) - Python Fast Dataflow programming framework for Data pipeline work.
- [Great Expectations](https://github.com/great-expectations/great_expectations) - Great Expectations is a framework that helps teams save time and promote analytic integrity with a new twist on automated testing: pipeline tests.
- [pandera](https://github.com/cosmicBboy/pandera) - Validating pandas data structures for people seeking correct things.
- [pyjanitor](https://github.com/ericmjl/pyjanitor) - Clean APIs for data cleaning. Python implementation of R package Janitor.
- [PandasSchema](https://github.com/TMiguelT/PandasSchema) - A validation library for Pandas data frames using user-friendly schemas.
- [engarde](https://github.com/TomAugspurger/engarde) - A library for defensive data analysis.
- [sklearn-pandas](https://github.com/scikit-learn-contrib/sklearn-pandas) - Pandas integration with sklearn.
- [Blaze](https://github.com/blaze/blaze) - Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems.
- [scikit-datasets](https://github.com/daviddiazvico/scikit-datasets) - Scikit-learn-compatible datasets.

# Statistics libraries
- [SciPy](https://github.com/scipy/scipy) - SciPy (pronounced "Sigh Pie") is open-source software for mathematics, science, and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more.
- [Statsmodels](https://github.com/statsmodels/statsmodels) - Statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics and estimation and inference for statistical models.
- [pymc3](https://github.com/pymc-devs/pymc3) - PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning focusing on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI) algorithms.
- [sympy](https://github.com/sympy/sympy) - A Python library for symbolic mathematics.
- [pmdarima](https://github.com/tgsmith61591/pmdarima) - A package that brings R's beloved auto.arima to Python, making an even stronger case for why Python > R for data science.
- [scikit-posthocs](https://github.com/maximtrp/scikit-posthocs) - Pairwise multiple comparisons (post hoc) tests in Python.

# Explaining
- [Lime](https://github.com/marcotcr/lime) - Lime: Explaining the predictions of any machine learning classifier.
- [eli5](https://github.com/TeamHG-Memex/eli5) - ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions.
- [SHAP](https://github.com/slundberg/shap) - SHAP (SHapley Additive exPlanations) is a unified approach to explain the output of any machine learning model.
- [LOFO](https://github.com/aerdem4/lofo-importance) - LOFO (Leave One Feature Out) Importance calculates the importances of a set of features.
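
The leave-one-feature-out idea behind the last entry can be sketched without any library: score a model with all features, then again with each feature removed, and treat the drop in score as that feature's importance. The sketch below is an assumption-laden illustration, not the API of the LOFO package; it uses a hypothetical nearest-centroid scorer and training accuracy instead of cross-validation to stay short.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length sequences."""
    return [sum(column) / len(points) for column in zip(*points)]


def nearest_centroid_accuracy(rows, labels, features):
    """Training accuracy of a nearest-centroid classifier using only `features`."""
    projected = [[row[j] for j in features] for row in rows]
    centroids = {
        label: centroid([p for p, l in zip(projected, labels) if l == label])
        for label in set(labels)
    }

    def predict(point):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
        )

    hits = sum(predict(p) == l for p, l in zip(projected, labels))
    return hits / len(rows)


def lofo_importance(rows, labels, score_fn):
    """Importance of feature j = score(all features) - score(all but j)."""
    all_features = list(range(len(rows[0])))
    base_score = score_fn(rows, labels, all_features)
    return {
        j: base_score - score_fn(rows, labels, [k for k in all_features if k != j])
        for j in all_features
    }


# Toy data: feature 0 separates the classes, feature 1 is uninformative.
rows = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (1.0, 0.0), (1.1, 0.1), (0.9, 0.0)]
labels = [0, 0, 0, 1, 1, 1]
importance = lofo_importance(rows, labels, nearest_centroid_accuracy)
```

On this toy data, removing feature 0 lowers the score while removing feature 1 does not, so feature 0 receives the larger importance. In practice the score would come from a held-out or cross-validated metric rather than training accuracy.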

# Visualisation libraries
- [Matplotlib](https://github.com/matplotlib/matplotlib) - Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
- [Seaborn](https://github.com/mwaskom/seaborn) - Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
- [Bokeh](https://github.com/bokeh/bokeh) - Bokeh is an interactive visualization library for Python that enables beautiful and meaningful visual presentation of data in modern web browsers. With Bokeh, you can quickly and easily create interactive plots, dashboards, and data applications.
- [plotly.py](https://github.com/plotly/plotly.py) - plotly.py is an interactive, open-source, and browser-based graphing library for Python.
- [scikit-plot](https://github.com/reiinakano/scikit-plot) - An intuitive library to add plotting functionality to scikit-learn objects.
- [plotnine](https://github.com/has2k1/plotnine) - plotnine is an implementation of a grammar of graphics in Python, based on ggplot2.
- [Cufflinks](https://github.com/santosjorge/cufflinks) - This library binds the power of plotly with the flexibility of pandas for easy plotting.
- [Chartpy](https://github.com/cuemacro/chartpy) - Easy to use Python API wrapper to plot charts with matplotlib, plotly, bokeh and more.
- [Vispy](https://github.com/vispy/vispy) - VisPy is a high-performance interactive 2D/3D data visualization library.
- [pycm](https://github.com/sepandhaghighi/pycm) - Multi-class confusion matrix library in Python.
- [Altair-Catplot](https://github.com/justinbois/altair-catplot) - Utility to generate plots with categorical variables using Altair.
- [pdvega](https://github.com/altair-viz/pdvega) - Interactive plotting for Pandas using Vega-Lite.
- [folium](https://github.com/python-visualization/folium) - Python Data. Leaflet.js Maps.
- [jmpy](https://github.com/beltashazzer/jmpy) - Quick plotting and data visualization of pandas and numpy data.
- [missingno](https://github.com/ResidentMario/missingno) - Missing data visualization module for Python.
- [Yellowbrick](https://github.com/districtdatalabs/yellowbrick) - Visual analysis and diagnostic tools to facilitate machine learning model selection.
- [netron](https://github.com/lutzroeder/netron) - Netron is a viewer for neural network, deep learning and machine learning models.
- [PrettyPandas](https://github.com/HHammond/PrettyPandas) - PrettyPandas is a Pandas DataFrame Styler class that helps you create report quality tables with a simple API.

# Text processing/NLP
- [gensim](https://github.com/rare-technologies/gensim) - Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.

# Tooling
- [Numba](https://github.com/numba/numba) - A Just-In-Time Compiler for Numerical Functions in Python.
- [Jupyter Notebook](https://github.com/jupyter/notebook) - A rich explorative data analysis tool.
- [boto3](https://github.com/boto/boto3) - AWS SDK for Python.
- [PennAI](https://github.com/EpistasisLab/pennai) - PennAI is an easy-to-use data science assistant. It allows researchers without machine learning or coding expertise to run supervised machine learning analysis through a clean web interface.

# Wrappers
- [BigML Python Bindings](https://github.com/bigmlcom/python) - These BigML Python bindings allow you to interact with BigML.io, the API for BigML. You can use them to easily create, retrieve, list, update, and delete BigML resources (i.e., sources, datasets, models, and predictions).
- [python-timbl](https://github.com/proycon/python-timbl) - python-timbl is a Python extension module wrapping the full TiMBL C++ programming interface. With this module, all functionality exposed through the C++ interface is also available to Python scripts. Being able to access the API from Python greatly facilitates prototyping TiMBL-based applications.
- [thampi](https://github.com/scoremedia/thampi) - thampi creates a machine learning prediction server on AWS Lambda.
- [MLPACK](https://github.com/mlpack/mlpack) - mlpack: a scalable C++ machine learning library (with Python bindings)
- [PyStan](https://github.com/stan-dev/pystan) - PyStan provides a Python interface to Stan, a package for Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo.

# Unsorted
- [pm4py](https://github.com/pm4py/pm4py-source) - PM4Py is a Python library that supports state-of-the-art process mining algorithms.
- [Optimus](https://github.com/ironmussa/Optimus) - Optimus is the missing framework to profile, clean, process and do ML in a distributed fashion using Apache Spark (PySpark).
- [impyute](https://github.com/eltonlaw/impyute) - Impyute is a library of missing data imputation algorithms.
- [Stairs](https://github.com/electronick1/stairs) - Framework which helps you make parallel/distributed calculations using data pipelines.
- [fastText](https://github.com/facebookresearch/fastText) - Library for fast text representation and classification.
- [pendulum](https://github.com/sdispater/pendulum) - Python datetimes made easy.
- [loguru](https://github.com/Delgan/loguru) - Python logging made (stupidly) simple.