awesome-python-machine-learning

Curated list of Awesome Python Machine Learning frameworks, libraries, tools, etc.
https://github.com/sorend/awesome-python-machine-learning

  • Uncategorized

    • Uncategorized

      • PySpark - Spark is a fast and general cluster computing system for Big Data.
      • pyqlearning - pyqlearning is a Python library to implement Reinforcement Learning and Deep Reinforcement Learning.
      • deap - DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas.
      • netron - Netron is a viewer for neural network, deep learning and machine learning models.
      • Matplotlib - Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
      • TensorFlow - TensorFlow is an open source software library for numerical computation using data flow graphs.
      • folium - Python Data. Leaflet.js Maps.
      • Blaze - Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems.
      • Chainer - Chainer is a Python-based deep learning framework aiming at flexibility.
      • Keras - Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.
      • Seaborn - Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
      • Lasagne - Lasagne is a lightweight library to build and train neural networks in Theano.
      • Bokeh - Bokeh is an interactive visualization library for Python that enables beautiful and meaningful visual presentation of data in modern web browsers. With Bokeh, you can quickly and easily create interactive plots, dashboards, and data applications.
      • scikit-fuzzy - scikit-fuzzy is a fuzzy logic toolkit for SciPy.
      • CatBoost - CatBoost is a machine learning method based on gradient boosting over decision trees.
      • Scikit-Learn - A general purpose ML library. Most common algorithms and metrics implemented.
      • sympy - A Python library for symbolic mathematics.
      • Caffe - Caffe is a deep learning framework made with expression, speed, and modularity in mind.
      • auto-sklearn - auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
      • keras-rl - Deep Reinforcement Learning for Keras.
      • tensorforce - Tensorforce: a TensorFlow library for applied reinforcement learning.
      • TPOT - Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
      • PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration.
      • scikit-plot - An intuitive library to add plotting functionality to scikit-learn objects.
      • NumPy - NumPy is the fundamental package needed for scientific computing with Python.
      • H2O - H2O is an in-memory platform for distributed, scalable machine learning.
      • Annoy - Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point.
      • Prophet - Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
      • seglearn - Python module for machine learning time series.
      • fastText - Library for fast text representation and classification.
      • pomegranate - pomegranate is a package for probabilistic models in Python that is implemented in cython for speed.
      • tsfresh - Automatic extraction of relevant features from time series.
      • MLPACK - mlpack: a scalable C++ machine learning library (with Python bindings)
      • xLearn - xLearn is a high performance, easy-to-use, and scalable machine learning package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM), which can be used to solve large-scale machine learning problems.
      • Imbalanced-learn - imbalanced-learn is a Python package offering a number of re-sampling techniques commonly used in datasets showing strong between-class imbalance.
      • mlxtend - Mlxtend (machine learning extensions) is a Python library of useful tools for day-to-day data science tasks.
      • PyOD - PyOD is a comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data.
      • SKLL - This Python package provides command-line utilities to make it easier to run machine learning experiments with scikit-learn.
      • SciPy - SciPy (pronounced "Sigh Pie") is open-source software for mathematics, science, and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more.
      • Statsmodels - Statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics and estimation and inference for statistical models.
      • pgmpy - pgmpy is a Python library for working with Probabilistic Graphical Models.
      • Surprise - Surprise is a Python scikit for building and analyzing recommender systems.
      • Numba - A Just-In-Time Compiler for Numerical Functions in Python.
      • plotly.py - plotly.py is an interactive, open-source, and browser-based graphing library for Python.
      • boto3 - AWS SDK for Python.
      • Shogun - The SHOGUN machine learning toolbox. Unified and efficient Machine Learning since 1999.
      • missingno - Missing data visualization module for Python.
      • Turi Create - Turi Create simplifies the development of custom machine learning models. You don't have to be a machine learning expert to add recommendations, object detection, image classification, image similarity or activity classification to your app.
      • XGBoost - XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable.
      • Lime - Lime: Explaining the predictions of any machine learning classifier.
      • Pandas - pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive.
      • loguru - Python logging made (stupidly) simple.
      • sklearn-pandas - Pandas integration with sklearn.
      • tflearn - TFLearn is a modular and transparent deep learning library built on top of TensorFlow.
      • mindsdb - MindsDB's goal is to make it very simple for developers to use the power of artificial neural networks in their projects.
      • Modin - Modin: Speed up your Pandas workflows by changing a single line of code.
      • plotnine - plotnine is an implementation of a grammar of graphics in Python; it is based on ggplot2.
      • dfply - The dfply package makes it possible to do R's dplyr-style data manipulation with pipes in Python on pandas DataFrames.
      • Jupyter Notebook - A rich explorative data analysis tool.
      • Mars - Mars is a tensor-based unified framework for large-scale data computation.
      • modAL - modAL is an active learning framework for Python3, designed with modularity, flexibility and extensibility in mind.
      • Great Expectations - Great Expectations is a framework that helps teams save time and promote analytic integrity with a new twist on automated testing: pipeline tests.
      • Regularized Greedy Forest - Regularized Greedy Forest (RGF) is a tree ensemble machine learning method.
      • Determined - Deep learning training platform with integrated support for distributed training, hyperparameter tuning, smart GPU scheduling, experiment tracking, and a model registry.
      • Edward - Edward is a Python library for probabilistic modeling, inference, and criticism.
      • neonrvm - neonrvm is an experimental open source machine learning library for performing regression tasks using the RVM technique.
      • Orange - Orange is component-based data mining software. It includes a range of data visualization, exploration, preprocessing and modeling techniques.
      • scikit-multilearn - A scikit-learn based module for multi-label et al. classification.
      • scikit-posthocs - Pairwise multiple comparisons (post hoc) tests in Python.
      • Cufflinks - This library binds the power of plotly with the flexibility of pandas for easy plotting.
      • nilearn - Nilearn is a Python module for fast and easy statistical learning on NeuroImaging data.
      • petl - Python Extract Transform and Load Tables of Data
      • chainerrl - ChainerRL is a deep reinforcement learning library built on top of Chainer.
      • mushroom-rl - Python library for Reinforcement Learning experiments.
      • Vispy - VisPy is a high-performance interactive 2D/3D data visualization library.
      • impyute - Impyute is a library of missing data imputation algorithms.
      • ml-ens - A Python library for high performance ensemble learning.
      • pypeln - Concurrent data pipelines made easy.
      • auto_ml - Automated machine learning for production and analytics.
      • Hyperopt-sklearn - Hyper-parameter optimization for sklearn.
      • CNTK - The Microsoft Cognitive Toolkit (https://cntk.ai) is a unified deep learning toolkit that describes neural networks as a series of computational steps via a directed graph.
      • pdvega - Interactive plotting for Pandas using Vega-Lite.
      • eli5 - ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions.
      • LOFO - LOFO (Leave One Feature Out) Importance calculates the importances of a set of features.
      • Dask - Dask-ML provides scalable machine learning in Python using Dask alongside popular machine learning libraries like Scikit-Learn.
      • LightGBM - LightGBM is a gradient boosting framework that uses tree based learning algorithms.
      • SimpleAI - This library implements many of the artificial intelligence algorithms described in the book "Artificial Intelligence: A Modern Approach" by Stuart Russell and Peter Norvig.
      • astroML - Machine learning, statistics, and data mining for astronomy and astrophysics.
      • neuropredict - Easy and comprehensive assessment of predictive power, with support for neuroimaging features.
      • pyhsmm - This is a Python library for approximate unsupervised inference in Bayesian Hidden Markov Models (HMMs) and explicit-duration Hidden semi-Markov Models (HSMMs), focusing on the Bayesian Nonparametric extensions, the HDP-HMM and HDP-HSMM, mostly with weak-limit approximations.
      • neurolab - Neurolab is a simple and powerful Neural Network Library for Python. It contains basic neural networks, training algorithms, and a flexible framework to create and explore other neural network types.
      • fylearn - FyLearn is a fuzzy machine learning library, built on top of SciKit-Learn.
      • fuku-ml - Simple machine learning library.
      • stacked_generalization - Library for machine learning stacking generalization.
      • pycobra - Python library implementing ensemble methods for regression and classification, plus visualisation tools including Voronoi tessellations.
      • skits - A library for SciKit-learn-Inspired Time Series models.
      • pyflux - Open source time series library for Python.
      • botflow - Python Fast Dataflow programming framework for Data pipeline work.
      • pandera - Validating pandas data structures for people seeking correct things.
      • PandasSchema - A validation library for Pandas data frames using user-friendly schemas.
      • engarde - A library for defensive data analysis.
      • scikit-datasets - Scikit-learn-compatible datasets.
      • Chartpy - Easy to use Python API wrapper to plot charts with matplotlib, plotly, bokeh and more.
      • pycm - Multi-class confusion matrix library in Python.
      • Altair-Catplot - Utility to generate plots with categorical variables using Altair.
      • jmpy - Quick plotting and data visualization of pandas and numpy data.
      • Yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection.
      • PrettyPandas - PrettyPandas is a Pandas DataFrame Styler class that helps you create report quality tables with a simple API.
      • PennAI - PennAI is an easy-to-use data science assistant. It allows researchers without machine learning or coding expertise to run supervised machine learning analysis through a clean web interface.
      • BigML Python Bindings - These BigML Python bindings allow you to interact with BigML.io, the API for BigML. You can use them to easily create, retrieve, list, update, and delete BigML resources (i.e., sources, datasets, models, and predictions).
      • python-timbl - python-timbl is a Python extension module wrapping the full TiMBL C++ programming interface. With this module, all functionality exposed through the C++ interface is also available to Python scripts. Being able to access the API from Python greatly facilitates prototyping TiMBL-based applications.
      • thampi - thampi creates a machine learning prediction server on AWS Lambda.
      • PyStan - PyStan provides a Python interface to Stan, a package for Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo.
      • Stairs - Framework which helps you make parallel/distributed calculations using data pipelines.
      • pm4py - PM4Py is a Python library that supports state-of-the-art process mining algorithms.
      • NuPIC - The Numenta Platform for Intelligent Computing (NuPIC) is a machine intelligence platform that implements the HTM learning algorithms. HTM is a detailed computational theory of the neocortex.
      • xlwings - xlwings is a BSD-licensed Python library that makes it easy to call Python from Excel and vice versa.
      • pmdarima - A package that brings R's beloved auto.arima to Python, making an even stronger case for why Python > R for data science.
      • SHAP - SHAP (SHapley Additive exPlanations) is a unified approach to explain the output of any machine learning model.
      • pendulum - Python datetimes made easy.
      • Apache MXNET - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
      • Optimus - Optimus is the missing framework to profile, clean, process and do ML in a distributed fashion using Apache Spark (PySpark).
      • pymc3 - PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning focusing on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI) algorithms.
      • Apache Singa - Distributed deep learning system.