awesome-python-data-science
From gitlab
https://github.com/jacob98415/awesome-python-data-science
-
Deployment
- fastapi - Modern, fast (high-performance) web framework for building APIs with Python.
- gradio - Create UIs for your machine learning model in Python in 3 minutes.
- binder - Enables sharing and executing Jupyter Notebooks.
- streamlit - Makes it easy to deploy a machine learning model.
- datapane - A collection of APIs to turn scripts and notebooks into interactive reports.
-
-
Visualization
-
Map
- folium - Makes it easy to visualize data on an interactive OpenStreetMap map.
-
General Purposes
- Matplotlib - Plotting with Python.
- seaborn - Statistical data visualization using matplotlib.
- prettyplotlib - Painlessly create beautiful matplotlib plots.
- python-ternary - Ternary plotting library for Python with matplotlib.
- missingno - Missing data visualization module for Python.
- physt - Improved histograms.
- chartify - Python library that makes it easy for data scientists to create charts.
-
Interactive plots
- animatplot - A python package for animating plots built on matplotlib.
- Bokeh - Interactive Web Plotting for Python.
- bqplot - Plotting library for IPython/Jupyter notebooks
- pyecharts - Python interface to [Echarts](https://github.com/apache/echarts), a charting and visualization library, for interactive plotting. <img height="20" src="img/pyecharts.png" alt="pyecharts"> <img height="20" src="img/echarts.png" alt="echarts">
- Altair - Declarative statistical visualization library for Python. Many data transformations can be expressed directly in the code that creates the graph.
-
Automatic Plotting
-
-
Natural Language Processing
-
NLP
- gensim - Topic Modelling for Humans.
- NLTK - Modules, data sets, and tutorials supporting research and development in Natural Language Processing.
- CLTK - The Classical Language Toolkit.
- pyMorfologik - Python binding for <a href="https://github.com/morfologik/morfologik-stemming">Morfologik</a>.
- skift - Scikit-learn wrappers for Python fastText. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Phonemizer - Simple text-to-phonemes converter for multiple languages.
- flair - Very simple framework for state-of-the-art NLP.
- spaCy - Industrial-Strength Natural Language Processing.
-
-
Data Validation
- deepchecks - Validation & testing of ML models and data during model development, deployment, and production. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- evidently - Evaluate and monitor ML models from validation to production.
- TensorFlow Data Validation - Library for exploring and validating machine learning data.
- great_expectations - Always know what to expect from your data.
- pandera - A lightweight, flexible, and expressive statistical data testing library.
-
-
Evaluation
- recmetrics - Library of useful metrics and plots for evaluating recommender systems.
- Metrics - Machine learning evaluation metrics.
- sklearn-evaluation - Model evaluation made easy: plots, tables, and markdown reports. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- AI Fairness 360 - Fairness metrics for datasets and ML models, explanations, and algorithms to mitigate bias in datasets and models.
-
-
Data Manipulation
-
Synthetic Data
- ydata-synthetic - A package to generate synthetic tabular and time-series data leveraging the state-of-the-art generative models. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
-
Data Frames
- pandas - Powerful Python data analysis toolkit.
- pandas_profiling - Create HTML profiling reports from pandas DataFrame objects
- cuDF - GPU DataFrame Library. <img height="20" src="img/pandas_big.png" alt="pandas compatible"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- blaze - NumPy and pandas interface to Big Data. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- pandasql - Allows you to query pandas DataFrames using SQL syntax. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- xpandas - Universal 1d/2d data containers with Transformers functionality for data analysis by [The Alan Turing Institute](https://www.turing.ac.uk/).
- pysparkling - A pure Python implementation of Apache Spark's RDD and DStream interfaces. <img height="20" src="img/spark_big.png" alt="Apache Spark based">
- Arctic - High-performance datastore for time series and tick data.
- datatable - Data.table for Python. <img height="20" src="img/R_big.png" alt="R inspired/ported lib">
- koalas - pandas API on Apache Spark. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- modin - Speed up your pandas workflows by changing a single line of code. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- swifter - A package that efficiently applies any function to a pandas dataframe or series in the fastest available manner.
- pandas_flavor - A package that allows writing your own flavor of Pandas easily.
- pandas-log - A package that allows providing feedback about basic pandas operations and finds both business logic and performance issues.
- vaex - Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second.
- xarray - Xarray combines the best features of NumPy and pandas for multidimensional data selection by supplementing numerical axis labels with named dimensions for more intuitive, concise, and less error-prone indexing routines.
- sk-transformer - A collection of various pandas & scikit-learn compatible transformers for all kinds of preprocessing and feature engineering steps <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- polars - A fast multi-threaded, hybrid-out-of-core DataFrame library.
- pandas-gbq - pandas integration with Google BigQuery. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
-
Pipelines
- pandas-ply - Functional data manipulation for pandas. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- Dplython - Dplyr for Python. <img height="20" src="img/R_big.png" alt="R inspired/ported lib">
- sklearn-pandas - pandas integration with sklearn. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- meza - A Python toolkit for processing tabular data.
- Prodmodel - Build system for data science pipelines.
- dopanda - Hints and tips for using pandas in an analysis environment. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- Hamilton - A microframework for dataframe generation that applies Directed Acyclic Graphs specified by a flow of lazily evaluated Python functions.
- SSPipe - Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch.
- Dataset - Helps you conveniently work with random or sequential batches of your data and define data processing.
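
Several entries above (pandas-ply, Dplython, SSPipe) chain transformations in a pipe style. As a minimal sketch of the general idea only, and not the API of any listed library, a pipe operator can be built in plain Python by overloading the reflected `|` operator. The `Pipe` class and the sample `rows` data are hypothetical names introduced for illustration.

```python
from functools import partial


class Pipe:
    """Hypothetical helper: `value | Pipe(func)` evaluates `func(value)`.

    Extra keyword arguments are bound to `func` up front. Extra positional
    arguments are also bound up front, so the piped value is passed last.
    """

    def __init__(self, func, *args, **kwargs):
        self.func = partial(func, *args, **kwargs) if (args or kwargs) else func

    def __ror__(self, value):
        # Called for `value | self` when `value` does not handle `|` itself.
        return self.func(value)


# Illustrative data, not taken from any library's documentation.
rows = [
    {"name": "a", "score": 3},
    {"name": "b", "score": 7},
    {"name": "c", "score": 5},
]

top = (
    rows
    | Pipe(lambda rs: [r for r in rs if r["score"] > 4])    # filter rows
    | Pipe(sorted, key=lambda r: r["score"], reverse=True)  # order by score
    | Pipe(lambda rs: [r["name"] for r in rs])              # project a column
)
print(top)
```

Each stage reads top to bottom in execution order, which is the main appeal of the style. Real libraries add more on top of this, such as placeholders for the piped value and handling for objects that already define `|`.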
-
Data-centric AI
-
-
Feature Engineering
-
General
- Featuretools - Automated feature engineering.
- Feature Engine - Feature engineering package with sklearn-like functionality. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Feature Forge - A set of tools for creating and testing machine learning features. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- few - A feature engineering wrapper for sklearn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- scikit-mdr - A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- tsfresh - Automatic extraction of relevant features from time series. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- dirty_cat - Machine learning on dirty tabular data (especially: string-based variables for classification and regression). <img height="20" src="img/sklearn_big.png" alt="sklearn">
- NitroFE - Moving window features. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- skl-groups - A scikit-learn addon to operate on set/"group"-based features. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
Feature Selection
- scikit-feature - Feature selection repository in Python.
- boruta_py - Implementations of the Boruta all-relevant feature selection method. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- BoostARoota - A fast xgboost feature selection algorithm. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- scikit-rebate - A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- zoofs - A feature selection library based on evolutionary algorithms.
-
-
Computations
- Dask - Parallel computing with task scheduling. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- CuPy - NumPy-like API accelerated with CUDA.
- scikit-tensor - Python library for multilinear algebra and tensor factorizations.
- numdifftools - Solve automatic numerical differentiation problems in one or more variables.
- quaternion - Add built-in support for quaternions to numpy.
- adaptive - Tools for adaptive and parallel sampling of mathematical functions.
- NumExpr - A fast numerical expression evaluator for NumPy that comes with an integrated computing virtual machine to speed calculations up by avoiding memory allocation for intermediate results.
- numpy - The fundamental package needed for scientific computing with Python.
-
-
Machine Learning
-
Ensemble Methods
- ML-Ensemble - High performance ensemble learning. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Stacking - Simple and useful stacking library written in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- stacked_generalization - Library for machine learning stacking generalization. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- vecstack - Python package for stacking (machine learning technique). <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
General Purpose Machine Learning
- scikit-learn - Machine learning in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- xLearn - High Performance, Easy-to-use, and Scalable Machine Learning Package.
- cuML - RAPIDS Machine Learning Library. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- Sparkit-learn - PySpark + scikit-learn = Sparkit-learn. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/spark_big.png" alt="Apache Spark based">
- mlpack - A scalable C++ machine learning library (Python bindings).
- dlib - Toolkit for making real-world machine learning and data analysis applications in C++ (Python bindings).
- MLxtend - Extension and helper modules for Python's data analysis and machine learning libraries. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- hyperlearn - 50%+ Faster, 50%+ less RAM usage, GPU support re-written Sklearn, Statsmodels. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- Reproducible Experiment Platform (REP) - Machine Learning toolbox for Humans. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- scikit-multilearn - Multi-label classification for python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- seqlearn - Sequence classification toolkit for Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- pystruct - Simple structured learning framework for Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- sklearn-expertsys - Highly interpretable classifiers for scikit learn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- RuleFit - Implementation of the RuleFit algorithm. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- pyGAM - Generalized Additive Models in Python.
- Karate Club - An unsupervised machine learning library for graph-structured data.
- Little Ball of Fur - A library for sampling graph structured data.
- causalml - Uplift modeling and causal inference with machine learning algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Shogun - Machine learning toolbox.
- metric-learn - Metric learning algorithms in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
Automated Machine Learning
- auto-sklearn - An automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- MLBox - A powerful Automated Machine Learning python library.
- AutoGluon - AutoML for Image, Text, Tabular, Time-Series, and MultiModal Data.
- TPOT - Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
Imbalanced Datasets
- imbalanced-learn - Module to perform under-sampling and over-sampling with various techniques. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- imbalanced-algorithms - Python-based implementations of algorithms for learning on imbalanced data. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/tf_big2.png" alt="TensorFlow">
-
Random Forests
- rpforest - A forest of random projection trees. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- sklearn-random-bits-forest - Wrapper of the Random Bits Forest program written by Wang et al. (2016). <img height="20" src="img/sklearn_big.png" alt="sklearn">
- rgf_python - Python Wrapper of Regularized Greedy Forest. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
Extreme Learning Machine
- Python-ELM - Extreme Learning Machine implementation in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Python Extreme Learning Machine (ELM) - A machine learning technique used for classification/regression tasks.
- hpelm - High-performance implementation of Extreme Learning Machines (fast randomized neural networks). <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
-
Kernel Methods
- pyFM - Factorization machines in python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- fastFM - A library for Factorization Machines. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- tffm - TensorFlow implementation of an arbitrary order Factorization Machine. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- liquidSVM - An implementation of SVMs.
- scikit-rvm - Relevance Vector Machine implementation using the scikit-learn API. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- ThunderSVM - A fast SVM Library on GPUs and CPUs. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
-
Gradient Boosting
- XGBoost - Scalable, Portable, and Distributed Gradient Boosting. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- LightGBM - A fast, distributed, high-performance gradient boosting. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- CatBoost - An open-source gradient boosting on decision trees library. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- ThunderGBM - Fast GBDTs and Random Forests on GPUs. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
-
-
Model Explanation
- dalex - moDel Agnostic Language for Exploration and explanation. <img height="20" src="img/sklearn_big.png" alt="sklearn"><img height="20" src="img/R_big.png" alt="R inspired/ported lib">
- Shapley - A data-driven framework to quantify the value of classifiers in a machine learning ensemble.
- Alibi - Algorithms for monitoring and explaining machine learning models.
- anchor - Code for "High-Precision Model-Agnostic Explanations" paper.
- aequitas - Bias and Fairness Audit Toolkit.
- Contrastive Explanation - Contrastive Explanation (Foil Trees). <img height="20" src="img/sklearn_big.png" alt="sklearn">
- yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- scikit-plot - An intuitive library to add plotting functionality to scikit-learn objects. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- shap - A unified approach to explain the output of any machine learning model. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- ELI5 - A library for debugging/inspecting machine learning classifiers and explaining their predictions.
- Lime - Explaining the predictions of any machine learning classifier. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- FairML - FairML is a python toolbox auditing the machine learning models for bias. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- L2X - Code for replicating the experiments in the paper *Learning to Explain: An Information-Theoretic Perspective on Model Interpretation*.
- PDPbox - Partial dependence plot toolbox.
- PyCEbox - Python Individual Conditional Expectation Plot Toolbox.
- model-analysis - Model analysis tools for TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- themis-ml - A library that implements fairness-aware machine learning algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- treeinterpreter - Interpreting scikit-learn's decision tree and random forest predictions. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Auralisation - Auralisation of learned features in CNN (for audio).
- CapsNet-Visualization - A visualization of the CapsNet layers to better understand how it works.
- lucid - A collection of infrastructure and tools for research in neural network interpretability.
- Netron - Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks).
- mxboard - Logging MXNet data for visualization in TensorBoard. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
- Skater - Python Library for Model Interpretation.
- AI Explainability 360 - Interpretability and explainability of data and machine learning models.
- FlashLight - Visualization Tool for your NeuralNetwork.
-
-
Reinforcement Learning
- OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms.
- garage - A toolkit for reproducible reinforcement learning research.
- OpenAI Baselines - High-quality implementations of reinforcement learning algorithms.
- Stable Baselines - A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.
- TF-Agents - A library for Reinforcement Learning in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- TensorForce - A TensorFlow library for applied reinforcement learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- TRFL - TensorFlow Reinforcement Learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Dopamine - A research framework for fast prototyping of reinforcement learning algorithms.
- keras-rl - Deep Reinforcement Learning for Keras. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- ChainerRL - A deep reinforcement learning library built on top of Chainer.
- Coach - Easy experimentation with state-of-the-art Reinforcement Learning algorithms.
- RLlib - Scalable Reinforcement Learning.
- Horizon - A platform for Applied Reinforcement Learning.
-
-
Probabilistic Methods
- pomegranate - Probabilistic and graphical models for Python. <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- PyMC - Bayesian Stochastic Modelling in Python.
- InferPy - Deep Probabilistic Modelling Made Easy. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- PyStan - Bayesian inference using the No-U-Turn sampler (Python interface).
- sklearn-bayes - Python package for Bayesian Machine Learning with scikit-learn API. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- pgmpy - A python library for working with Probabilistic Graphical Models.
- PtStat - Probabilistic Programming and Statistical Inference in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- PyVarInf - Bayesian Deep Learning methods with Variational Inference for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.
- hsmmlearn - A library for hidden semi-Markov models with explicit durations.
- pyhsmm - Bayesian inference in HSMMs and HMMs.
- GPyTorch - A highly efficient and modular implementation of Gaussian Processes in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- MXFusion - Modular Probabilistic Programming on MXNet. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
- sklearn-crfsuite - A scikit-learn-inspired API for CRFsuite. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- ZhuSuan - Bayesian Deep Learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- GPflow - Gaussian processes in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- skpro - Supervised domain-agnostic prediction framework for probabilistic modelling by [The Alan Turing Institute](https://www.turing.ac.uk/). <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Genetic Programming
- gplearn - Genetic Programming in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- DEAP - Distributed Evolutionary Algorithms in Python.
- karoo_gp - A Genetic Programming platform for Python with GPU support. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- monkeys - A strongly-typed genetic programming framework for Python.
- sklearn-genetic - Genetic feature selection module for scikit-learn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Optimization
- Optuna - A hyperparameter optimization framework.
- Spearmint - Bayesian optimization.
- BoTorch - Bayesian optimization in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- scikit-opt - Heuristic Algorithms for optimization.
- sklearn-genetic-opt - Hyperparameters tuning and feature selection using evolutionary algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- SMAC3 - Sequential Model-based Algorithm Configuration.
- Optunity - A library containing various optimizers for hyperparameter tuning.
- hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
- hyperopt-sklearn - Hyper-parameter optimization for sklearn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- sklearn-deap - Use evolutionary algorithms instead of gridsearch in scikit-learn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Bayesian Optimization - A Python implementation of global optimization with Gaussian processes.
- SafeOpt - Safe Bayesian Optimization.
- scikit-optimize - Sequential model-based optimization with a `scipy.optimize` interface.
- Solid - A comprehensive gradient-free optimization framework written in Python.
- PySwarms - A research toolkit for particle swarm optimization in Python.
- Platypus - A Free and Open Source Python Library for Multiobjective Optimization.
- GPflowOpt - Bayesian Optimization using GPflow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Talos - Hyperparameter Optimization for Keras Models.
- nlopt - Library for nonlinear optimization (global and local, constrained or unconstrained).
- OR-Tools - An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi.
- sigopt_sklearn - SigOpt wrappers for scikit-learn methods. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Time Series
- sktime - A unified framework for machine learning with time series. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- darts - A python library for easy manipulation and forecasting of time series.
- statsforecast - Lightning fast forecasting with statistical and econometric models.
- mlforecast - Scalable machine learning-based time series forecasting.
- neuralforecast - Scalable neural network-based time series forecasting.
- tslearn - Machine learning toolkit dedicated to time-series data. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- tick - Module for statistical learning, with a particular emphasis on time-dependent modeling. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- greykite - A flexible, intuitive, and fast forecasting library.
- Prophet - Automatic Forecasting Procedure.
- PyFlux - Open source time series library for Python.
- bayesloop - Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.
- luminol - Anomaly Detection and Correlation library.
- maya - Makes it easy to parse datetime strings and change timezones.
- Chaos Genius - ML-powered analytics engine for outlier/anomaly detection and root cause analysis.
- dateutil - Powerful extensions to the standard datetime module.
-
-
Deep Learning
-
PyTorch
- PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- pytorch-lightning - PyTorch Lightning is just organized PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- torchvision - Datasets, Transforms, and Models specific to Computer Vision. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- torchtext - Data loaders and abstractions for text and NLP. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- torchaudio - An audio library for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- ignite - High-level library to help with training neural networks in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- Catalyst - High-level utils for PyTorch DL & RL research. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- pytorch_geometric_temporal - Temporal Extension Library for PyTorch Geometric. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- ChemicalX - A PyTorch-based deep learning library for drug pair scoring. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
-
TensorFlow
- TensorFlow - Computation using data flow graphs for scalable machine learning by Google. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- TFLearn - Deep learning library featuring a higher-level API for TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Sonnet - TensorFlow-based neural network library. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- tensorpack - A Neural Net Training Interface on TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Polyaxon - A platform that helps you build, manage and monitor deep learning models. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- NeuPy - NeuPy is a Python library for Artificial Neural Networks and Deep Learning (previously: <img height="20" src="img/theano_big.png" alt="Theano compatible">). <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- tfdeploy - Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running numpy. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- tensorflow-upstream - TensorFlow ROCm port. <img height="20" src="img/tf_big2.png" alt="TensorFlow"> <img height="20" src="img/amd_big.png" alt="Possible to run on AMD GPU">
- TensorFlow Fold - Deep learning with dynamic computation graphs in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- tensorlm - Wrapper library for text generation/language models at char and word level with RNN. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Mesh TensorFlow - Model Parallelism Made Easier. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Ludwig - A toolbox that allows one to train and test deep learning models without the need to write code. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- keras-contrib - Keras community contributions. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- Hyperas - Keras + Hyperopt: A straightforward wrapper for convenient hyperparameter optimization. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- Elephas - Distributed Deep learning with Keras & Spark. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- Hera - Train/evaluate a Keras model, and get metrics streamed to a dashboard in your browser. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- Spektral - Deep learning on graphs. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- qkeras - A quantization deep learning library. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- TensorLayer - Deep Learning and Reinforcement Learning Library for Researchers and Engineers. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
-
MXNet
- MXNet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
- Gluon - A clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet). <img height="20" src="img/mxnet_big.png" alt="MXNet based">
- MXbox - Simple, efficient, and flexible vision toolbox for the mxnet framework. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
- gluon-cv - Provides implementations of the state-of-the-art deep learning models in computer vision. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
- gluon-nlp - NLP made easy. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
- Xfer - Transfer Learning library for Deep Neural Networks. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
- MXNet - HIP Port of MXNet. <img height="20" src="img/mxnet_big.png" alt="MXNet based"> <img height="20" src="img/amd_big.png" alt="Possible to run on AMD GPU">
-
Others
- jax - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more.
- Tangent - Source-to-Source Debuggable Derivatives in Pure Python.
- autograd - Efficiently computes derivatives of numpy code.
- nnabla - Neural Network Libraries by Sony.
- Caffe - A fast open framework for deep learning.
- hipCaffe - The HIP port of Caffe. <img height="20" src="img/amd_big.png" alt="Possible to run on AMD GPU">
- DISCONTINUED PROJECTS
-
-
Web Scraping
-
Others
- Pattern - Web mining module with tools for querying web services such as Google, Twitter, and Wikipedia. Also has NLP, machine learning algorithms, and visualization.
- twitterscraper
- BeautifulSoup
- Selenium
-
-
Computer Audition
- librosa - Python library for audio and music analysis.
- Yaafe - Audio features extraction.
- aubio - A library for audio and music analysis.
- Essentia - Library for audio and music analysis, description, and synthesis.
- LibXtract - A simple, portable, lightweight library of audio feature extraction functions.
- Marsyas - Music Analysis, Retrieval, and Synthesis for Audio Signals.
- muda - A library for augmenting annotated audio data.
- madmom - Python audio and music signal processing library.
-
-
Computer Vision
-
NLP
- OpenCV - Open Source Computer Vision Library.
- scikit-image - Image Processing SciKit (Toolbox for SciPy).
- imgaug - Image augmentation for machine learning experiments.
- Augmentor - Image augmentation library in Python for machine learning.
- imgaug_extension - Additional augmentations for imgaug.
- albumentations - Fast image augmentation library and easy-to-use wrapper around other libraries.
-
-
Statistics
-
NLP
- statsmodels - Statistical modeling and econometrics in Python.
- stockstats - Supplies a `StockDataFrame` wrapper around `pandas.DataFrame` with inline support for stock statistics and indicators.
- weightedcalcs - A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.
- scikit-posthocs - Pairwise Multiple Comparisons Post-hoc Tests.
- Alphalens - Performance analysis of predictive (alpha) stock factors.
-
-
Distributed Computing
-
NLP
- Horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. <img height="20" src="img/tf_big2.png" alt="TensorFlow based">
- Veles - Distributed machine learning platform.
- Jubatus - Framework and Library for Distributed Online Machine Learning.
- DMTK - Microsoft Distributed Machine Learning Toolkit.
- PaddlePaddle - PArallel Distributed Deep LEarning.
- dask-ml - Distributed and parallel machine learning. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Distributed - Distributed computation in Python.
- PySpark - Exposes the Spark programming model to Python. <img height="20" src="img/spark_big.png" alt="Apache Spark based">
-
-
Experimentation
-
NLP
- mlflow - Open source platform for the machine learning lifecycle.
- dvc - Data Version Control | Git for Data & Models | ML Experiments Management.
- envd - 🏕️ machine learning development environment for data science and AI/ML engineering teams.
- Sacred - A tool to help you configure, organize, log, and reproduce experiments.
- Ax - Adaptive Experimentation Platform. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Neptune - A lightweight ML experiment tracking, results visualization, and management tool.
-
-
Spatial Analysis
-
Quantum Computing
-
NLP
- qiskit - Qiskit is an open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules.
- cirq - A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits.
- QML - A Python Toolkit for Quantum Machine Learning.
- PennyLane - Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations.
-
-
Conversion
-
NLP
- sklearn-porter - Transpile trained scikit-learn estimators to C, Java, JavaScript, and others.
- ONNX - Open Neural Network Exchange.
- MMdnn - A set of tools to help users inter-operate among different deep learning frameworks.
-