awesome-python-data-science
Probably the best curated list of data science software in Python.
https://github.com/krzjoa/awesome-python-data-science
-
Natural Language Processing
-
Others
- gensim - Topic Modelling for Humans.
- torchtext - Data loaders and abstractions for text and NLP. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- NLTK - Modules, data sets, and tutorials supporting research and development in Natural Language Processing.
- CLTK - The Classical Language Toolkit.
- pyMorfologik - Python binding for <a href="https://github.com/morfologik/morfologik-stemming">Morfologik</a>.
- skift - Scikit-learn wrappers for Python fastText. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Phonemizer - Simple text-to-phonemes converter for multiple languages.
- flair - Very simple framework for state-of-the-art NLP.
- spaCy - Industrial-Strength Natural Language Processing.
-
-
Machine Learning
-
General Purpose Machine Learning
- scikit-learn - Machine learning in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- PyCaret - An open-source, low-code machine learning library in Python. <img height="20" src="img/R_big.png" alt="R inspired lib">
- Shogun - Machine learning toolbox.
- xLearn - High Performance, Easy-to-use, and Scalable Machine Learning Package.
- cuML - RAPIDS Machine Learning Library. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- Sparkit-learn - PySpark + scikit-learn = Sparkit-learn. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/spark_big.png" alt="Apache Spark based">
- mlpack - A scalable C++ machine learning library (Python bindings).
- dlib - Toolkit for making real-world machine learning and data analysis applications in C++ (Python bindings).
- MLxtend - Extension and helper modules for Python's data analysis and machine learning libraries. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- hyperlearn - 50%+ Faster, 50%+ less RAM usage, GPU support re-written Sklearn, Statsmodels. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- Reproducible Experiment Platform (REP) - Machine Learning toolbox for Humans. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- scikit-multilearn - Multi-label classification for Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- seqlearn - Sequence classification toolkit for Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- pystruct - Simple structured learning framework for Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- sklearn-expertsys - Highly interpretable classifiers for scikit learn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- RuleFit - Implementation of the RuleFit algorithm. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- pyGAM - Generalized Additive Models in Python.
- causalml - Uplift modeling and causal inference with machine learning algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
Ensemble Methods
- ML-Ensemble - High performance ensemble learning. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Stacking - Simple and useful stacking library written in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- stacked_generalization - Library for machine learning stacking generalization. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- vecstack - Python package for stacking (machine learning technique). <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
Gradient Boosting
- XGBoost - Scalable, Portable, and Distributed Gradient Boosting. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- LightGBM - A fast, distributed, high-performance gradient boosting. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- CatBoost - An open-source gradient boosting on decision trees library. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- ThunderGBM - Fast GBDTs and Random Forests on GPUs. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- NGBoost - Natural Gradient Boosting for Probabilistic Prediction.
- TensorFlow Decision Forests - A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras. <img height="20" src="img/keras_big.png" alt="keras"> <img height="20" src="img/tf_big2.png" alt="TensorFlow">
-
Imbalanced Datasets
- imbalanced-learn - Module to perform under-sampling and over-sampling with various techniques. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- imbalanced-algorithms - Python-based implementations of algorithms for learning on imbalanced data. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/tf_big2.png" alt="TensorFlow">
-
Random Forests
- rpforest - A forest of random projection trees. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- sklearn-random-bits-forest - Wrapper of the Random Bits Forest program written by Wang et al. (2016). <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
Kernel Methods
- pyFM - Factorization machines in python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- fastFM - A library for Factorization Machines. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- tffm - TensorFlow implementation of an arbitrary order Factorization Machine. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- liquidSVM - An implementation of SVMs.
- scikit-rvm - Relevance Vector Machine implementation using the scikit-learn API. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- ThunderSVM - A fast SVM Library on GPUs and CPUs. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
-
-
Deep Learning
-
TensorFlow
- Keras - A high-level neural networks API running on top of TensorFlow. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- TensorFlow - Computation using data flow graphs for scalable machine learning by Google. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- TFLearn - Deep learning library featuring a higher-level API for TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Sonnet - TensorFlow-based neural network library. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- tensorpack - A Neural Net Training Interface on TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Polyaxon - A platform that helps you build, manage and monitor deep learning models. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- tfdeploy - Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running numpy. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- tensorflow-upstream - TensorFlow ROCm port. <img height="20" src="img/tf_big2.png" alt="TensorFlow"> <img height="20" src="img/amd_big.png" alt="Possible to run on AMD GPU">
- TensorFlow Fold - Deep learning with dynamic computation graphs in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Mesh TensorFlow - Model Parallelism Made Easier. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Ludwig - A toolbox that allows one to train and test deep learning models without the need to write code. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- keras-contrib - Keras community contributions. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- Hyperas - Keras + Hyperopt: A straightforward wrapper for convenient hyperparameter optimization. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- Elephas - Distributed Deep learning with Keras & Spark. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- qkeras - A quantization deep learning library. <img height="20" src="img/keras_big.png" alt="Keras compatible">
-
PyTorch
- PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- pytorch-lightning - PyTorch Lightning is just organized PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- ignite - High-level library to help with training neural networks in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- Catalyst - High-level utils for PyTorch DL & RL research. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- ChemicalX - A PyTorch-based deep learning library for drug pair scoring. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
-
JAX
-
Others
- transformers - State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible"> <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Tangent - Source-to-Source Debuggable Derivatives in Pure Python.
- autograd - Efficiently computes derivatives of numpy code.
- Caffe - A fast open framework for deep learning.
- nnabla - Neural Network Libraries by Sony.
-
-
Probabilistic Graphical Models
-
Others
- pyAgrum - A GRaphical Universal Modeler.
- pomegranate - Probabilistic and graphical models for Python. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- pgmpy - A python library for working with Probabilistic Graphical Models.
-
-
Probabilistic Methods
-
Others
- ZhuSuan - Bayesian Deep Learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- PyMC - Bayesian Stochastic Modelling in Python.
- InferPy - Deep Probabilistic Modelling Made Easy. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- PyStan - Bayesian inference using the No-U-Turn sampler (Python interface).
- sklearn-bayes - Python package for Bayesian Machine Learning with scikit-learn API. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- PyVarInf - Bayesian Deep Learning methods with Variational Inference for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.
- hsmmlearn - A library for hidden semi-Markov models with explicit durations.
- pyhsmm - Bayesian inference in HSMMs and HMMs.
- GPyTorch - A highly efficient and modular implementation of Gaussian Processes in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- sklearn-crfsuite - A scikit-learn-inspired API for CRFsuite. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Visualization
-
Interactive plots
- plotly - A Python library that makes interactive and publication-quality graphs.
- animatplot - A python package for animating plots built on matplotlib.
- Bokeh - Interactive Web Plotting for Python.
- bqplot - Plotting library for IPython/Jupyter notebooks.
- pyecharts - A Python port of [Echarts](https://github.com/apache/echarts), a charting and visualization library, for interactive plotting. <img height="20" src="img/pyecharts.png" alt="pyecharts"> <img height="20" src="img/echarts.png" alt="echarts">
- Altair - Declarative statistical visualization library for Python. Supports many data transformations within the code that creates the graph.
-
General Purposes
- Matplotlib - Plotting with Python.
- seaborn - Statistical data visualization using matplotlib.
- prettyplotlib - Painlessly create beautiful matplotlib plots.
- python-ternary - Ternary plotting library for Python with matplotlib.
- missingno - Missing data visualization module for Python.
- physt - Improved histograms.
-
Automatic Plotting
-
NLP
-
Map
- folium - Makes it easy to visualize data on an interactive OpenStreetMap map.
-
-
Deployment
-
Others
- streamlit - Makes it easy to deploy machine learning models.
- datapane - A collection of APIs to turn scripts and notebooks into interactive reports.
- gradio - Create UIs for your machine learning model in Python in 3 minutes.
- Vizro - A toolkit for creating modular data visualization applications.
- fastapi - A modern, fast (high-performance) web framework for building APIs with Python.
- binder - Enables sharing and executing Jupyter Notebooks.
-
-
Data Manipulation
-
Data Frames
- pandas - Powerful Python data analysis toolkit.
- polars - A fast multi-threaded, hybrid-out-of-core DataFrame library.
- Arctic - High-performance datastore for time series and tick data.
- datatable - Data.table for Python. <img height="20" src="img/R_big.png" alt="R inspired/ported lib">
- cuDF - GPU DataFrame Library. <img height="20" src="img/pandas_big.png" alt="pandas compatible"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- blaze - NumPy and pandas interface to Big Data. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- pandasql - Allows you to query pandas DataFrames using SQL syntax. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- xpandas - Universal 1d/2d data containers with Transformers functionality for data analysis by [The Alan Turing Institute](https://www.turing.ac.uk/).
- pysparkling - A pure Python implementation of Apache Spark's RDD and DStream interfaces. <img height="20" src="img/spark_big.png" alt="Apache Spark based">
- modin - Speed up your pandas workflows by changing a single line of code. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- swifter - A package that efficiently applies any function to a pandas dataframe or series in the fastest available manner.
- pandas-log - A package that allows providing feedback about basic pandas operations and finds both business logic and performance issues.
- vaex - Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second.
- xarray - Xarray combines the best features of NumPy and pandas for multidimensional data selection by supplementing numerical axis labels with named dimensions for more intuitive, concise, and less error-prone indexing routines.
-
Pipelines
- pandas-ply - Functional data manipulation for pandas. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- Dplython - Dplyr for Python. <img height="20" src="img/R_big.png" alt="R inspired/ported lib">
- sklearn-pandas - pandas integration with sklearn. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- meza - A Python toolkit for processing tabular data.
- Prodmodel - Build system for data science pipelines.
- dopanda - Hints and tips for using pandas in an analysis environment. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- Hamilton - A microframework for dataframe generation that applies Directed Acyclic Graphs specified by a flow of lazily evaluated Python functions.
- SSPipe - Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch.
-
Data-centric AI
-
Synthetic Data
- ydata-synthetic - A package to generate synthetic tabular and time-series data leveraging the state-of-the-art generative models. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
-
-
Experimentation
-
Others
- Neptune - A lightweight ML experiment tracking, results visualization, and management tool.
- mlflow - Open source platform for the machine learning lifecycle.
- dvc - Data Version Control | Git for Data & Models | ML Experiments Management.
- envd - 🏕️ machine learning development environment for data science and AI/ML engineering teams.
- Sacred - A tool to help you configure, organize, log, and reproduce experiments.
- Ax - Adaptive Experimentation Platform. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Computations
-
Others
- numpy - The fundamental package needed for scientific computing with Python.
- Dask - Parallel computing with task scheduling. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- CuPy - NumPy-like API accelerated with CUDA.
- scikit-tensor - Python library for multilinear algebra and tensor factorizations.
- numdifftools - Solve automatic numerical differentiation problems in one or more variables.
- quaternion - Add built-in support for quaternions to numpy.
- adaptive - Tools for adaptive and parallel sampling of mathematical functions.
- NumExpr - A fast numerical expression evaluator for NumPy that comes with an integrated computing virtual machine to speed calculations up by avoiding memory allocation for intermediate results.
-
-
Automated Machine Learning
-
Others
- auto-sklearn - An AutoML toolkit and a drop-in replacement for a scikit-learn estimator. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Auto-PyTorch - Automatic architecture search and hyperparameter optimization for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- AutoKeras - AutoML library for deep learning. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- AutoGluon - AutoML for Image, Text, Tabular, Time-Series, and MultiModal Data.
- MLBox - A powerful Automated Machine Learning python library.
-
-
Computer Audition
-
Others
- torchaudio - An audio library for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- librosa - Python library for audio and music analysis.
- Yaafe - Audio features extraction.
- aubio - A library for audio and music analysis.
- Essentia - Library for audio and music analysis, description, and synthesis.
- LibXtract - A simple, portable, lightweight library of audio feature extraction functions.
- Marsyas - Music Analysis, Retrieval, and Synthesis for Audio Signals.
- muda - A library for augmenting annotated audio data.
- madmom - Python audio and music signal processing library.
-
-
Computer Vision
-
Others
- torchvision - Datasets, Transforms, and Models specific to Computer Vision. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- PyTorch3D - PyTorch3D is FAIR's library of reusable components for deep learning with 3D data. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- KerasCV - Industry-strength Computer Vision workflows with Keras. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- OpenCV - Open Source Computer Vision Library.
- Decord - An efficient video loader for deep learning with smart shuffling that's super easy to digest.
- MMEngine - OpenMMLab Foundational Library for Training Deep Learning Models. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- scikit-image - Image Processing SciKit (Toolbox for SciPy).
- imgaug - Image augmentation for machine learning experiments.
- Augmentor - Image augmentation library in Python for machine learning.
- LAVIS - A One-stop Library for Language-Vision Intelligence.
-
-
Time Series
-
Others
- sktime - A unified framework for machine learning with time series. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- skforecast - Time series forecasting with machine learning models.
- darts - A python library for easy manipulation and forecasting of time series.
- statsforecast - Lightning fast forecasting with statistical and econometric models.
- mlforecast - Scalable machine learning-based time series forecasting.
- neuralforecast - Scalable machine learning-based time series forecasting.
- tslearn - Machine learning toolkit dedicated to time-series data. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- tick - Module for statistical learning, with a particular emphasis on time-dependent modeling. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- greykite - A flexible, intuitive, and fast forecasting library.
- Prophet - Automatic Forecasting Procedure.
- PyFlux - Open source time series library for Python.
- bayesloop - Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.
- luminol - Anomaly Detection and Correlation library.
- maya - Makes it very easy to parse a string and change timezones.
- Chaos Genius - ML-powered analytics engine for outlier/anomaly detection and root cause analysis.
- dateutil - Powerful extensions to the standard datetime module.
-
-
Reinforcement Learning
-
Others
- Gymnasium - An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly [Gym](https://github.com/openai/gym)).
- PettingZoo - An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities.
- MAgent2 - An engine for high performance multi-agent environments with very large numbers of agents, along with a set of reference environments.
- Stable Baselines3 - A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.
- Shimmy - An API conversion tool for popular external reinforcement learning environments.
- EnvPool - C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
- Acme - A library of reinforcement learning components and agents.
- Catalyst-RL - PyTorch framework for RL research. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- d3rlpy - An offline deep reinforcement learning library.
- DI-engine - OpenDILab Decision AI Engine. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- TF-Agents - A library for Reinforcement Learning in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- TensorForce - A TensorFlow library for applied reinforcement learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- TRFL - TensorFlow Reinforcement Learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Dopamine - A research framework for fast prototyping of reinforcement learning algorithms.
- keras-rl - Deep Reinforcement Learning for Keras. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- garage - A toolkit for reproducible reinforcement learning research.
- rlpyt - Reinforcement Learning in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- cleanrl - High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG).
- Machin - A reinforcement learning library designed for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- SKRL - Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Isaac Orbit and Omniverse Isaac Gym. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- Imitation - Clean PyTorch implementations of imitation and reward learning algorithms. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
-
-
Graph Machine Learning
-
Others
- pytorch_geometric_temporal - Temporal Extension Library for PyTorch Geometric. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- PyTorch Geometric Signed Directed - A signed/directed graph neural network extension library for PyTorch Geometric. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- dgl - Python package built to ease deep learning on graph, on top of existing DL frameworks. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible"> <img height="20" src="img/tf_big2.png" alt="TensorFlow"> <img height="20" src="img/mxnet_big.png" alt="MXNet based">
- Spektral - Deep learning on graphs. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- StellarGraph - Machine Learning on Graphs. <img height="20" src="img/tf_big2.png" alt="TensorFlow"> <img height="20" src="img/keras_big.png" alt="Keras compatible">
- Graph Nets - Build Graph Nets in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- TensorFlow GNN - A library to build Graph Neural Networks on the TensorFlow platform. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Auto Graph Learning - An autoML framework & toolkit for machine learning on graphs.
- PyTorch-BigGraph - Generate embeddings from large-scale graph-structured data. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- Karate Club - An unsupervised machine learning library for graph-structured data.
- Little Ball of Fur - A library for sampling graph structured data.
- GreatX - A graph reliability toolbox based on PyTorch and PyTorch Geometric (PyG). <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- Jraph - A Graph Neural Network Library in JAX.
-
-
Learning-to-Rank & Recommender Systems
-
Others
- LightFM - A Python implementation of LightFM, a hybrid recommendation algorithm.
- Spotlight - Deep recommender models using PyTorch.
- Surprise - A Python scikit for building and analyzing recommender systems.
- RecBole - A unified, comprehensive and efficient recommendation library. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- allRank - allRank is a framework for training learning-to-rank neural models based on PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- TensorFlow Recommenders - A library for building recommender system models using TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow"> <img height="20" src="img/keras_big.png" alt="Keras compatible">
- TensorFlow Ranking - Learning to Rank in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
-
-
Model Explanation
-
Others
- dalex - moDel Agnostic Language for Exploration and explanation. <img height="20" src="img/sklearn_big.png" alt="sklearn"><img height="20" src="img/R_big.png" alt="R inspired/ported lib">
- Shapley - A data-driven framework to quantify the value of classifiers in a machine learning ensemble.
- Alibi - Algorithms for monitoring and explaining machine learning models.
- anchor - Code for "High-Precision Model-Agnostic Explanations" paper.
- aequitas - Bias and Fairness Audit Toolkit.
- Contrastive Explanation - Contrastive Explanation (Foil Trees). <img height="20" src="img/sklearn_big.png" alt="sklearn">
- yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- scikit-plot - An intuitive library to add plotting functionality to scikit-learn objects. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- shap - A unified approach to explain the output of any machine learning model. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- ELI5 - A library for debugging/inspecting machine learning classifiers and explaining their predictions.
- Lime - Explaining the predictions of any machine learning classifier. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- FairML - FairML is a python toolbox auditing the machine learning models for bias. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- L2X - Code for replicating the experiments in the paper *Learning to Explain: An Information-Theoretic Perspective on Model Interpretation*.
- PDPbox - Partial dependence plot toolbox.
- PyCEbox - Python Individual Conditional Expectation Plot Toolbox.
- model-analysis - Model analysis tools for TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- themis-ml - A library that implements fairness-aware machine learning algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- treeinterpreter - Interpreting scikit-learn's decision tree and random forest predictions. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Auralisation - Auralisation of learned features in CNN (for audio).
- CapsNet-Visualization - A visualization of the CapsNet layers to better understand how it works.
- lucid - A collection of infrastructure and tools for research in neural network interpretability.
- Netron - Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks).
-
-
Genetic Programming
-
Others
- gplearn - Genetic Programming in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- PyGAD - Genetic Algorithm in Python. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible"> <img height="20" src="img/keras_big.png" alt="keras">
- DEAP - Distributed Evolutionary Algorithms in Python.
- karoo_gp - A Genetic Programming platform for Python with GPU support. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- monkeys - A strongly-typed genetic programming framework for Python.
- sklearn-genetic - Genetic feature selection module for scikit-learn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Optimization
-
Others
- Optuna - A hyperparameter optimization framework.
- pymoo - Multi-objective Optimization in Python.
- pycma - Python implementation of CMA-ES.
- Spearmint - Bayesian optimization.
- BoTorch - Bayesian optimization in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- scikit-opt - Heuristic Algorithms for optimization.
- sklearn-genetic-opt - Hyperparameters tuning and feature selection using evolutionary algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- SMAC3 - Sequential Model-based Algorithm Configuration.
- Optunity - A library containing various optimizers for hyperparameter tuning.
- hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
- hyperopt-sklearn - Hyper-parameter optimization for sklearn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- sklearn-deap - Use evolutionary algorithms instead of gridsearch in scikit-learn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Bayesian Optimization - A Python implementation of global optimization with gaussian processes.
- SafeOpt - Safe Bayesian Optimization.
- scikit-optimize - Sequential model-based optimization with a `scipy.optimize` interface.
- Solid - A comprehensive gradient-free optimization framework written in Python.
- PySwarms - A research toolkit for particle swarm optimization in Python.
- Platypus - A Free and Open Source Python Library for Multiobjective Optimization.
- GPflowOpt - Bayesian Optimization using GPflow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Talos - Hyperparameter Optimization for Keras Models.
- nlopt - Library for nonlinear optimization (global and local, constrained or unconstrained).
- OR-Tools - An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi.
-
-
Feature Engineering
-
General
- Featuretools - Automated feature engineering.
- Feature Engine - Feature engineering package with sklearn-like functionality. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- OpenFE - Automated feature generation with expert-level performance.
- Feature Forge - A set of tools for creating and testing machine learning features. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- few - A feature engineering wrapper for sklearn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- scikit-mdr - A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- tsfresh - Automatic extraction of relevant features from time series. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- dirty_cat - Machine learning on dirty tabular data (especially: string-based variables for classification and regression). <img height="20" src="img/sklearn_big.png" alt="sklearn">
- NitroFE - Moving window features. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- sk-transformer - A collection of various pandas & scikit-learn compatible transformers for all kinds of preprocessing and feature engineering steps. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
-
Feature Selection
- scikit-feature - Feature selection repository in Python.
- boruta_py - Implementations of the Boruta all-relevant feature selection method. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- BoostARoota - A fast xgboost feature selection algorithm. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- scikit-rebate - A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- zoofs - A feature selection library based on evolutionary algorithms.
-
-
Statistics
-
Others
- Pandas Profiling - Create HTML profiling reports from pandas DataFrame objects. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- statsmodels - Statistical modeling and econometrics in Python.
- stockstats - Supplies a `StockDataFrame` wrapper based on `pandas.DataFrame` with inline stock statistics/indicators support.
- weightedcalcs - A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.
- scikit-posthocs - Pairwise Multiple Comparisons Post-hoc Tests.
- Alphalens - Performance analysis of predictive (alpha) stock factors.
-
-
Distributed Computing
-
Others
- Horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Veles - Distributed machine learning platform.
- Jubatus - Framework and Library for Distributed Online Machine Learning.
- DMTK - Microsoft Distributed Machine Learning Toolkit.
- PaddlePaddle - PArallel Distributed Deep LEarning.
- dask-ml - Distributed and parallel machine learning. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Distributed - Distributed computation in Python.
-
-
Data Validation
-
Others
- great_expectations - Always know what to expect from your data.
- pandera - A lightweight, flexible, and expressive statistical data testing library.
- deepchecks - Validation & testing of ML models and data during model development, deployment, and production. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- evidently - Evaluate and monitor ML models from validation to production.
- TensorFlow Data Validation - Library for exploring and validating machine learning data.
- DataComPy - A library to compare Pandas, Polars, and Spark data frames. It provides stats and lets users adjust for match accuracy.
-
-
Evaluation
-
Others
- recmetrics - Library of useful metrics and plots for evaluating recommender systems.
- Metrics - Machine learning evaluation metrics.
- sklearn-evaluation - Model evaluation made easy: plots, tables, and markdown reports. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- AI Fairness 360 - Fairness metrics for datasets and ML models, explanations, and algorithms to mitigate bias in datasets and models.
-
-
Web Scraping
-
Others
- Pattern - Web scraping for well-established websites such as Google, Twitter, and Wikipedia. Also has NLP, machine learning algorithms, and visualization.
- twitterscraper
- BeautifulSoup
- Selenium
-
-
Spatial Analysis
-
Quantum Computing
-
Others
- qiskit - Qiskit is an open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules.
- cirq - A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits.
- QML - A Python Toolkit for Quantum Machine Learning.
-
-
Conversion
-
Others
- sklearn-porter - Transpile trained scikit-learn estimators to C, Java, JavaScript, and others.
- ONNX - Open Neural Network Exchange.
- MMdnn - A set of tools to help users inter-operate among different deep learning frameworks.
- treelite - Universal model exchange and serialization format for decision tree forests.
-