awesome-python-data-science
Probably the best curated list of data science software in Python.
https://github.com/krzjoa/awesome-python-data-science
-
Web Scraping
-
- BeautifulSoup
- Selenium
- Pattern - Scraping for well-established websites such as Google, Twitter, and Wikipedia. Also has NLP, machine learning algorithms, and visualization.
- twitterscraper
-
-
Natural Language Processing
-
Others
- spaCy - Industrial-Strength Natural Language Processing.
- gensim - Topic Modelling for Humans.
- NLTK - Modules, data sets, and tutorials supporting research and development in Natural Language Processing.
- torchtext - Data loaders and abstractions for text and NLP. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- CLTK - The Classical Language Toolkit.
- skift - Scikit-learn wrappers for Python fastText. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Phonemizer - Simple text-to-phonemes converter for multiple languages.
- flair - Very simple framework for state-of-the-art NLP.
- pyMorfologik - Python binding for <a href="https://github.com/morfologik/morfologik-stemming">Morfologik</a>.
-
-
Computations
-
- numpy - The fundamental package needed for scientific computing with Python.
- NumExpr - A fast numerical expression evaluator for NumPy that comes with an integrated computing virtual machine to speed calculations up by avoiding memory allocation for intermediate results.
- Dask - Parallel computing with task scheduling. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- CuPy - NumPy-like API accelerated with CUDA.
- quaternion - Add built-in support for quaternions to numpy.
- adaptive - Tools for adaptive and parallel sampling of mathematical functions.
- numdifftools - Solve automatic numerical differentiation problems in one or more variables.
- scikit-tensor - Python library for multilinear algebra and tensor factorizations.
- bottleneck - Fast NumPy array functions written in C.
-
-
Machine Learning
-
General Purpose Machine Learning
- SciPy - Fundamental algorithms for scientific computing in Python
- scikit-learn - Machine learning in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- dlib - Toolkit for making real-world machine learning and data analysis applications in C++ (Python bindings).
- mlpack - A scalable C++ machine learning library (Python bindings).
- xLearn - High Performance, Easy-to-use, and Scalable Machine Learning Package.
- MLxtend - Extension and helper modules for Python's data analysis and machine learning libraries. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Shogun - Machine learning toolbox.
- cuML - RAPIDS Machine Learning Library. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- PyCaret - An open-source, low-code machine learning library in Python. <img height="20" src="img/R_big.png" alt="R inspired lib">
- causalml - Uplift modeling and causal inference with machine learning algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- scikit-multilearn - Multi-label classification for python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- sklearn-expertsys - Highly interpretable classifiers for scikit learn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- seqlearn - Sequence classification toolkit for Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- pystruct - Simple structured learning framework for Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Sparkit-learn - PySpark + scikit-learn = Sparkit-learn. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/spark_big.png" alt="Apache Spark based">
- RuleFit - Implementation of the RuleFit algorithm. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- pyGAM - Generalized Additive Models in Python.
- Reproducible Experiment Platform (REP) - Machine Learning toolbox for Humans. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- hyperlearn - 50%+ Faster, 50%+ less RAM usage, GPU support re-written Sklearn, Statsmodels. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- metric-learn - Metric learning algorithms in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
Ensemble Methods
- ML-Ensemble - High performance ensemble learning. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- stacked_generalization - Library for machine learning stacking generalization. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- vecstack - Python package for stacking (machine learning technique). <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Stacking - Simple and useful stacking library written in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
Gradient Boosting
- CatBoost - An open-source gradient boosting on decision trees library. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- XGBoost - Scalable, Portable, and Distributed Gradient Boosting. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- ThunderGBM - Fast GBDTs and Random Forests on GPUs. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- NGBoost - Natural Gradient Boosting for Probabilistic Prediction.
- LightGBM - A fast, distributed, high-performance gradient boosting. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- TensorFlow Decision Forests - A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras. <img height="20" src="img/keras_big.png" alt="keras"> <img height="20" src="img/tf_big2.png" alt="TensorFlow">
-
Imbalanced Datasets
- imbalanced-learn - Module to perform under-sampling and over-sampling with various techniques. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- imbalanced-algorithms - Python-based implementations of algorithms for learning on imbalanced data. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/tf_big2.png" alt="TensorFlow">
-
Kernel Methods
- liquidSVM - An implementation of SVMs.
- ThunderSVM - A fast SVM Library on GPUs and CPUs. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- pyFM - Factorization machines in python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- fastFM - A library for Factorization Machines. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- tffm - TensorFlow implementation of an arbitrary order Factorization Machine. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- scikit-rvm - Relevance Vector Machine implementation using the scikit-learn API. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
Random Forests
- rpforest - A forest of random projection trees. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- sklearn-random-bits-forest - Wrapper of the Random Bits Forest program written by Wang et al. (2016). <img height="20" src="img/sklearn_big.png" alt="sklearn">
- rgf_python - Python Wrapper of Regularized Greedy Forest. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Deep Learning
-
TensorFlow
- Keras - A high-level neural networks API running on top of TensorFlow. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- TensorFlow - Computation using data flow graphs for scalable machine learning by Google. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Elephas - Distributed Deep learning with Keras & Spark. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- qkeras - A quantization deep learning library. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- Mesh TensorFlow - Model Parallelism Made Easier. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- TFLearn - Deep learning library featuring a higher-level API for TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Polyaxon - A platform that helps you build, manage and monitor deep learning models. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- tfdeploy - Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running numpy. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- TensorFlow Fold - Deep learning with dynamic computation graphs in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- keras-contrib - Keras community contributions. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- Hyperas - Keras + Hyperopt: A straightforward wrapper for convenient hyperparameter optimization. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- Sonnet - TensorFlow-based neural network library. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- tensorpack - A Neural Net Training Interface on TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- tensorflow-upstream - TensorFlow ROCm port. <img height="20" src="img/tf_big2.png" alt="TensorFlow"> <img height="20" src="img/amd_big.png" alt="Possible to run on AMD GPU">
- Ludwig - A toolbox that allows one to train and test deep learning models without the need to write code. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- TensorLight - A high-level framework for TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
-
Others
- autograd - Efficiently computes derivatives of numpy code.
- Caffe - A fast open framework for deep learning.
- transformers - State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible"> <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- nnabla - Neural Network Libraries by Sony.
- Tangent - Source-to-Source Debuggable Derivatives in Pure Python.
-
PyTorch
- PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- ignite - High-level library to help with training neural networks in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- Catalyst - High-level utils for PyTorch DL & RL research. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- ChemicalX - A PyTorch-based deep learning library for drug pair scoring. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- pytorch-lightning - PyTorch Lightning is just organized PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
-
JAX
-
-
Time Series
-
Others
- dateutil - Powerful extensions to the standard datetime module
- luminol - Anomaly Detection and Correlation library.
- Prophet - Automatic Forecasting Procedure.
- PyFlux - Open source time series library for Python.
- Chaos Genius - ML powered analytics engine for outlier/anomaly detection and root cause analysis
- darts - A python library for easy manipulation and forecasting of time series.
- greykite - A flexible, intuitive, and fast forecasting library.
- statsforecast - Lightning fast forecasting with statistical and econometric models.
- mlforecast - Scalable machine learning-based time series forecasting.
- neuralforecast - Scalable, user-friendly neural forecasting algorithms.
- bayesloop - Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.
- tick - Module for statistical learning, with a particular emphasis on time-dependent modeling. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- skforecast - Time series forecasting with machine learning models
- tslearn - Machine learning toolkit dedicated to time-series data. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- maya - Makes it easy to parse date strings and change timezones.
- sktime - A unified framework for machine learning with time series. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Probabilistic Graphical Models
-
Others
- pyAgrum - A GRaphical Universal Modeler.
- pomegranate - Probabilistic and graphical models for Python. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- pgmpy - A python library for working with Probabilistic Graphical Models.
-
-
Probabilistic Methods
-
Others
- ZhuSuan - Bayesian Deep Learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- PyMC - Bayesian Stochastic Modelling in Python.
- GPyTorch - A highly efficient and modular implementation of Gaussian Processes in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- sklearn-bayes - Python package for Bayesian Machine Learning with scikit-learn API. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- sklearn-crfsuite - A scikit-learn-inspired API for CRFsuite. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- PyVarInf - Bayesian Deep Learning methods with Variational Inference for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- pyhsmm - Bayesian inference in HSMMs and HMMs.
- PyStan - Bayesian inference using the No-U-Turn sampler (Python interface).
- emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.
- InferPy - Deep Probabilistic Modelling Made Easy. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- hsmmlearn - A library for hidden semi-Markov models with explicit durations.
- pyro - A flexible, scalable deep probabilistic programming library built on PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
-
-
Optimization
-
Others
- OR-Tools - An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi.
- hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
- nlopt - Library for nonlinear optimization (global and local, constrained or unconstrained).
- Optuna - A hyperparameter optimization framework.
- sklearn-deap - Use evolutionary algorithms instead of gridsearch in scikit-learn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- scikit-opt - Heuristic Algorithms for optimization.
- Talos - Hyperparameter Optimization for Keras Models.
- SMAC3 - Sequential Model-based Algorithm Configuration.
- hyperopt-sklearn - Hyper-parameter optimization for sklearn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- scikit-optimize - Sequential model-based optimization with a `scipy.optimize` interface.
- Spearmint - Bayesian optimization.
- pymoo - Multi-objective Optimization in Python.
- Optunity - A library containing various optimizers for hyperparameter tuning.
- PySwarms - A research toolkit for particle swarm optimization in Python.
- Solid - A comprehensive gradient-free optimization framework written in Python.
- Bayesian Optimization - A Python implementation of global optimization with Gaussian processes.
- sklearn-genetic-opt - Hyperparameters tuning and feature selection using evolutionary algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- SafeOpt - Safe Bayesian Optimization.
- Platypus - A Free and Open Source Python Library for Multiobjective Optimization.
- GPflowOpt - Bayesian Optimization using GPflow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- pycma - Python implementation of CMA-ES.
- BoTorch - Bayesian optimization in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
-
-
Visualization
-
Interactive plots
- plotly - A Python library that makes interactive and publication-quality graphs.
- Altair - Declarative statistical visualization library for Python. Many data transformations can be done within the code that creates the graph.
- Bokeh - Interactive Web Plotting for Python.
- animatplot - A python package for animating plots built on matplotlib.
- bqplot - Plotting library for IPython/Jupyter notebooks
- pyecharts - A Python library for interactive charts, ported from [Echarts](https://github.com/apache/echarts), a charting and visualization library. <img height="20" src="img/pyecharts.png" alt="pyecharts"> <img height="20" src="img/echarts.png" alt="echarts">
-
Map
-
General Purposes
- Matplotlib - Plotting with Python.
- seaborn - Statistical data visualization using matplotlib.
- prettyplotlib - Painlessly create beautiful matplotlib plots.
- python-ternary - Ternary plotting library for Python with matplotlib.
- missingno - Missing data visualization module for Python.
- physt - Improved histograms.
-
Automatic Plotting
-
NLP
-
-
Deployment
-
- fastapi - A modern, fast (high-performance) web framework for building APIs with Python.
- streamlit - Makes it easy to deploy machine learning models.
- datapane - A collection of APIs to turn scripts and notebooks into interactive reports.
- binder - Enables sharing and executing Jupyter Notebooks.
- gradio - Create UIs for your machine learning model in Python in 3 minutes.
- Vizro - A toolkit for creating modular data visualization applications.
- streamsync - No-code in the front, Python in the back. An open-source framework for creating data apps.
-
-
Data Manipulation
-
Data Frames
- pandas - Powerful Python data analysis toolkit.
- blaze - NumPy and pandas interface to Big Data. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- vaex - Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second.
- polars - A fast multi-threaded, hybrid-out-of-core DataFrame library.
- datatable - Data.table for Python. <img height="20" src="img/R_big.png" alt="R inspired/ported lib">
- modin - Speed up your pandas workflows by changing a single line of code. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- swifter - A package that efficiently applies any function to a pandas dataframe or series in the fastest available manner.
- cuDF - GPU DataFrame Library. <img height="20" src="img/pandas_big.png" alt="pandas compatible"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- xarray - Xarray combines the best features of NumPy and pandas for multidimensional data selection by supplementing numerical axis labels with named dimensions for more intuitive, concise, and less error-prone indexing routines.
- xpandas - Universal 1d/2d data containers with Transformers functionality for data analysis by [The Alan Turing Institute](https://www.turing.ac.uk/).
- pandas-log - A package that allows providing feedback about basic pandas operations and finds both business logic and performance issues.
- pandasql - Allows you to query pandas DataFrames using SQL syntax. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- pysparkling - A pure Python implementation of Apache Spark's RDD and DStream interfaces. <img height="20" src="img/spark_big.png" alt="Apache Spark based">
- Arctic - High-performance datastore for time series and tick data.
- pandas-gbq - pandas Google Big Query. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
-
Pipelines
- SSPipe - Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch.
- sklearn-pandas - pandas integration with sklearn. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- Hamilton - A microframework for dataframe generation that applies Directed Acyclic Graphs specified by a flow of lazily evaluated Python functions.
- dopanda - Hints and tips for using pandas in an analysis environment. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- meza - A Python toolkit for processing tabular data.
- pandas-ply - Functional data manipulation for pandas. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- Dplython - Dplyr for Python. <img height="20" src="img/R_big.png" alt="R inspired/ported lib">
- Prodmodel - Build system for data science pipelines.
- pdpipe - Easy pipelines for pandas DataFrames.
- Dataset - Helps you conveniently work with random or sequential batches of your data and define data processing.
-
Data-centric AI
-
Synthetic Data
- ydata-synthetic - A package to generate synthetic tabular and time-series data leveraging the state-of-the-art generative models. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
-
-
Experimentation
-
- Neptune - A lightweight ML experiment tracking, results visualization, and management tool.
- mlflow - Open source platform for the machine learning lifecycle.
- envd - 🏕️ machine learning development environment for data science and AI/ML engineering teams.
- Ax - Adaptive Experimentation Platform. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Sacred - A tool to help you configure, organize, log, and reproduce experiments.
- dvc - Data Version Control | Git for Data & Models | ML Experiments Management.
-
-
Model Explanation
-
Others
- themis-ml - A library that implements fairness-aware machine learning algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- FairML - A Python toolbox for auditing machine learning models for bias. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- lucid - A collection of infrastructure and tools for research in neural network interpretability.
- model-analysis - Model analysis tools for TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- dalex - moDel Agnostic Language for Exploration and explanation. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/R_big.png" alt="R inspired/ported lib">
- scikit-plot - An intuitive library to add plotting functionality to scikit-learn objects. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Netron - Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks).
- Lime - Explaining the predictions of any machine learning classifier. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Shapley - A data-driven framework to quantify the value of classifiers in a machine learning ensemble.
- aequitas - Bias and Fairness Audit Toolkit.
- Alibi - Algorithms for monitoring and explaining machine learning models.
- PDPbox - Partial dependence plot toolbox.
- anchor - Code for "High-Precision Model-Agnostic Explanations" paper.
- Contrastive Explanation - Contrastive Explanation (Foil Trees). <img height="20" src="img/sklearn_big.png" alt="sklearn">
- ELI5 - A library for debugging/inspecting machine learning classifiers and explaining their predictions.
- L2X - Code for replicating the experiments in the paper *Learning to Explain: An Information-Theoretic Perspective on Model Interpretation*.
- treeinterpreter - Interpreting scikit-learn's decision tree and random forest predictions. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- PyCEbox - Python Individual Conditional Expectation Plot Toolbox.
- Auralisation - Auralisation of learned features in CNN (for audio).
- CapsNet-Visualization - A visualization of the CapsNet layers to better understand how it works.
- Skater - Python Library for Model Interpretation.
- AI Explainability 360 - Interpretability and explainability of data and machine learning models.
- tensorboard-pytorch - Tensorboard for PyTorch (and chainer, mxnet, numpy, ...).
- shap - A unified approach to explain the output of any machine learning model. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Computer Vision
-
Others
- scikit-image - Image Processing SciKit (Toolbox for SciPy).
- OpenCV - Open Source Computer Vision Library.
- LAVIS - A One-stop Library for Language-Vision Intelligence.
- torchvision - Datasets, Transforms, and Models specific to Computer Vision. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- PyTorch3D - PyTorch3D is FAIR's library of reusable components for deep learning with 3D data. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- Decord - An efficient video loader for deep learning with smart shuffling that's super easy to digest.
- imgaug - Image augmentation for machine learning experiments.
- Augmentor - Image augmentation library in Python for machine learning.
- KerasCV - Industry-strength Computer Vision workflows with Keras. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- MMEngine - OpenMMLab Foundational Library for Training Deep Learning Models. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- imgaug_extension - Additional augmentations for imgaug.
- albumentations - Fast image augmentation library and easy-to-use wrapper around other libraries.
-
-
Computer Audition
-
Others
- librosa - Python library for audio and music analysis.
- Essentia - Library for audio and music analysis, description, and synthesis.
- madmom - Python audio and music signal processing library.
- aubio - A library for audio and music analysis.
- torchaudio - An audio library for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- muda - A library for augmenting annotated audio data.
- LibXtract - A simple, portable, lightweight library of audio feature extraction functions.
- Yaafe - Audio features extraction.
- Marsyas - Music Analysis, Retrieval, and Synthesis for Audio Signals.
-
-
Learning-to-Rank & Recommender Systems
-
Others
- LightFM - A Python implementation of LightFM, a hybrid recommendation algorithm.
- Surprise - A Python scikit for building and analyzing recommender systems.
- RecBole - A unified, comprehensive and efficient recommendation library. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- TensorFlow Recommenders - A library for building recommender system models using TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow"> <img height="20" src="img/keras_big.png" alt="Keras compatible">
- TensorFlow Ranking - Learning to Rank in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- allRank - allRank is a framework for training learning-to-rank neural models based on PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- Spotlight - Deep recommender models using PyTorch.
-
-
Automated Machine Learning
-
Others
- auto-sklearn - An AutoML toolkit and a drop-in replacement for a scikit-learn estimator. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- AutoKeras - AutoML library for deep learning. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- Auto-PyTorch - Automatic architecture search and hyperparameter optimization for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- MLBox - A powerful Automated Machine Learning python library.
- AutoGluon - AutoML for Image, Text, Tabular, Time-Series, and MultiModal Data.
-
-
Reinforcement Learning
-
Others
- keras-rl - Deep Reinforcement Learning for Keras. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- Dopamine - A research framework for fast prototyping of reinforcement learning algorithms.
- garage - A toolkit for reproducible reinforcement learning research.
- Stable Baselines3 - A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.
- Gymnasium - An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly [Gym](https://github.com/openai/gym)).
- cleanrl - High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG).
- TF-Agents - A library for Reinforcement Learning in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Acme - A library of reinforcement learning components and agents.
- rlpyt - Reinforcement Learning in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- DI-engine - OpenDILab Decision AI Engine. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- SKRL - Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Isaac Orbit and Omniverse Isaac Gym. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- Imitation - Clean PyTorch implementations of imitation and reward learning algorithms. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- PettingZoo - An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities.
- d3rlpy - An offline deep reinforcement learning library.
- EnvPool - C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
- Catalyst-RL - PyTorch framework for RL research. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- TensorForce - A TensorFlow library for applied reinforcement learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- TRFL - TensorFlow Reinforcement Learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- MAgent2 - An engine for high performance multi-agent environments with very large numbers of agents, along with a set of reference environments.
- Shimmy - An API conversion tool for popular external reinforcement learning environments.
- Machin - A reinforcement learning library designed for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- Tianshou - An elegant PyTorch deep reinforcement learning library. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- Horizon - A platform for Applied Reinforcement Learning.
-
-
Feature Engineering
-
General
- tsfresh - Automatic extraction of relevant features from time series. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Feature Engine - Feature engineering package with sklearn-like functionality. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- dirty_cat - Machine learning on dirty tabular data (especially: string-based variables for classification and regression). <img height="20" src="img/sklearn_big.png" alt="sklearn">
- NitroFE - Moving window features. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Feature Forge - A set of tools for creating and testing machine learning features. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- sk-transformer - A collection of various pandas & scikit-learn compatible transformers for all kinds of preprocessing and feature engineering steps. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- few - A feature engineering wrapper for sklearn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Featuretools - Automated feature engineering.
- scikit-mdr - A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- OpenFE - Automated feature generation with expert-level performance.
-
Feature Selection
- scikit-feature - Feature selection repository in Python.
- scikit-rebate - A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- zoofs - A feature selection library based on evolutionary algorithms.
- boruta_py - Implementations of the Boruta all-relevant feature selection method. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- BoostARoota - A fast xgboost feature selection algorithm. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Statistics
-
NLP
- statsmodels - Statistical modeling and econometrics in Python.
- Alphalens - Performance analysis of predictive (alpha) stock factors.
- scikit-posthocs - Pairwise Multiple Comparisons Post-hoc Tests.
- stockstats - A ``StockDataFrame`` wrapper around ``pandas.DataFrame`` with inline support for stock statistics and indicators.
- weightedcalcs - A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.
- Pandas Profiling - Create HTML profiling reports from pandas DataFrame objects. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
-
-
Conversion
-
Synthetic Data
- ONNX - Open Neural Network Exchange.
- sklearn-porter - Transpile trained scikit-learn estimators to C, Java, JavaScript, and others.
- MMdnn - A set of tools to help users inter-operate among different deep learning frameworks.
- treelite - Universal model exchange and serialization format for decision tree forests.
-
-
Data Validation
-
Synthetic Data
- evidently - Evaluate and monitor ML models from validation to production.
- great_expectations - Always know what to expect from your data.
- pandera - A lightweight, flexible, and expressive statistical data testing library.
- deepchecks - Validation & testing of ML models and data during model development, deployment, and production. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- TensorFlow Data Validation - Library for exploring and validating machine learning data.
- DataComPy - A library to compare Pandas, Polars, and Spark data frames. It provides stats and lets users adjust for match accuracy.
-
-
Spatial Analysis
-
Distributed Computing
-
Synthetic Data
- PaddlePaddle - PArallel Distributed Deep LEarning.
- Distributed - Distributed computation in Python.
- dask-ml - Distributed and parallel machine learning. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Jubatus - Framework and Library for Distributed Online Machine Learning.
- Veles - Distributed machine learning platform.
- DMTK - Microsoft Distributed Machine Learning Toolkit.
- Horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
-
-
Graph Machine Learning
-
Others
- dgl - Python package built to ease deep learning on graphs, on top of existing DL frameworks. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible"> <img height="20" src="img/tf_big2.png" alt="TensorFlow"> <img height="20" src="img/mxnet_big.png" alt="MXNet based">
- StellarGraph - Machine Learning on Graphs. <img height="20" src="img/tf_big2.png" alt="TensorFlow"> <img height="20" src="img/keras_big.png" alt="Keras compatible">
- Karate Club - An unsupervised machine learning library for graph-structured data.
- Spektral - Deep learning on graphs. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- pytorch_geometric_temporal - Temporal Extension Library for PyTorch Geometric. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- Little Ball of Fur - A library for sampling graph structured data.
- Graph Nets - Build Graph Nets in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- TensorFlow GNN - A library to build Graph Neural Networks on the TensorFlow platform. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Auto Graph Learning - An autoML framework & toolkit for machine learning on graphs.
- PyTorch-BigGraph - Generate embeddings from large-scale graph-structured data. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- Jraph - A Graph Neural Network Library in JAX.
- PyTorch Geometric Signed Directed - A signed/directed graph neural network extension library for PyTorch Geometric. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- GreatX - A graph reliability toolbox based on PyTorch and PyTorch Geometric (PyG). <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- pytorch_geometric - Geometric Deep Learning Extension Library for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
-
-
Quantum Computing
-
Synthetic Data
- cirq - A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits.
- qiskit - An open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules.
- QML - A Python Toolkit for Quantum Machine Learning.
-
-
Genetic Programming
-
Others
- DEAP - Distributed Evolutionary Algorithms in Python.
- gplearn - Genetic Programming in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- PyGAD - Genetic Algorithm in Python. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible"> <img height="20" src="img/keras_big.png" alt="keras">
- monkeys - A strongly-typed genetic programming framework for Python.
- sklearn-genetic - Genetic feature selection module for scikit-learn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- karoo_gp - A Genetic Programming platform for Python with GPU support. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
-
-
Evaluation
-
Synthetic Data
- sklearn-evaluation - Model evaluation made easy: plots, tables, and markdown reports. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Metrics - Machine learning evaluation metrics.
- recmetrics - Library of useful metrics and plots for evaluating recommender systems.
- AI Fairness 360 - Fairness metrics for datasets and ML models, explanations, and algorithms to mitigate bias in datasets and models.
-