awesome-python-data-science

Probably the best curated list of data science software in Python.
https://github.com/krzjoa/awesome-python-data-science

  • Machine Learning

    • General Purpose Machine Learning

      • pystruct - Simple structured learning framework for Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • sklearn-expertsys - Highly interpretable classifiers for scikit learn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • RuleFit - Implementation of the RuleFit algorithm. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • pyGAM - Generalized Additive Models in Python.
      • causalml - Uplift modeling and causal inference with machine learning algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • SciPy - Fundamental algorithms for scientific computing in Python.
      • metric-learn - Metric learning algorithms in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
    • Gradient Boosting

      • XGBoost - Scalable, Portable, and Distributed Gradient Boosting. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
      • LightGBM - A fast, distributed, high-performance gradient boosting framework. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
      • CatBoost - An open-source gradient boosting on decision trees library. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
      • ThunderGBM - Fast GBDTs and Random Forests on GPUs. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
      • NGBoost - Natural Gradient Boosting for Probabilistic Prediction.
      • TensorFlow Decision Forests - A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras. <img height="20" src="img/keras_big.png" alt="keras"> <img height="20" src="img/tf_big2.png" alt="TensorFlow">
    • Imbalanced Datasets

      • imbalanced-learn - Module to perform under-sampling and over-sampling with various techniques. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • imbalanced-algorithms - Python-based implementations of algorithms for learning on imbalanced data. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/tf_big2.png" alt="TensorFlow">
    • Kernel Methods

      • pyFM - Factorization machines in python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • fastFM - A library for Factorization Machines. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • tffm - TensorFlow implementation of an arbitrary order Factorization Machine. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/tf_big2.png" alt="TensorFlow">
      • liquidSVM - An implementation of SVMs.
      • scikit-rvm - Relevance Vector Machine implementation using the scikit-learn API. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • ThunderSVM - A fast SVM Library on GPUs and CPUs. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
    • Random Forests

      • rpforest - A forest of random projection trees. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • sklearn-random-bits-forest - Wrapper of the Random Bits Forest program written by Wang et al. (2016). <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • rgf_python - Python Wrapper of Regularized Greedy Forest. <img height="20" src="img/sklearn_big.png" alt="sklearn">
  • Model Explanation

    • Others

      • dalex - moDel Agnostic Language for Exploration and explanation. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/R_big.png" alt="R inspired/ported lib">
      • Shapley - A data-driven framework to quantify the value of classifiers in a machine learning ensemble.
      • Alibi - Algorithms for monitoring and explaining machine learning models.
      • anchor - Code for "High-Precision Model-Agnostic Explanations" paper.
      • aequitas - Bias and Fairness Audit Toolkit.
      • Contrastive Explanation - Contrastive Explanation (Foil Trees). <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • scikit-plot - An intuitive library to add plotting functionality to scikit-learn objects. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • ELI5 - A library for debugging/inspecting machine learning classifiers and explaining their predictions.
      • Lime - Explaining the predictions of any machine learning classifier. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • FairML - A Python toolbox for auditing machine learning models for bias. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • L2X - Code for replicating the experiments in the paper *Learning to Explain: An Information-Theoretic Perspective on Model Interpretation*.
      • PDPbox - Partial dependence plot toolbox.
      • PyCEbox - Python Individual Conditional Expectation Plot Toolbox.
      • model-analysis - Model analysis tools for TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
      • themis-ml - A library that implements fairness-aware machine learning algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • treeinterpreter - Interpreting scikit-learn's decision tree and random forest predictions. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Auralisation - Auralisation of learned features in CNN (for audio).
      • CapsNet-Visualization - A visualization of the CapsNet layers to better understand how it works.
      • lucid - A collection of infrastructure and tools for research in neural network interpretability.
      • Netron - Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks).
      • Skater - Python Library for Model Interpretation.
      • AI Explainability 360 - Interpretability and explainability of data and machine learning models.
      • tensorboard-pytorch - Tensorboard for PyTorch (and chainer, mxnet, numpy, ...).
      • shap - A unified approach to explain the output of any machine learning model. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • InterpretML - InterpretML implements the Explainable Boosting Machine (EBM), a modern, fully interpretable machine learning model based on Generalized Additive Models (GAMs). This open-source package also provides visualization tools for EBMs, other glass-box models, and black-box explanations. <img height="20" src="img/sklearn_big.png" alt="sklearn">
  • Natural Language Processing

    • Others

      • spaCy - Industrial-Strength Natural Language Processing.
      • gensim - Topic Modelling for Humans.
      • torchtext - Data loaders and abstractions for text and NLP. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • NLTK - Modules, data sets, and tutorials supporting research and development in Natural Language Processing.
      • CLTK - The Classical Language Toolkit.
      • pyMorfologik - Python binding for <a href="https://github.com/morfologik/morfologik-stemming">Morfologik</a>.
      • skift - Scikit-learn wrappers for Python fastText. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Phonemizer - Simple text-to-phonemes converter for multiple languages.
      • flair - Very simple framework for state-of-the-art NLP.
  • Optimization

    • Others

      • OR-Tools - An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi.
      • Optuna - A hyperparameter optimization framework.
      • pymoo - Multi-objective Optimization in Python.
      • pycma - Python implementation of CMA-ES.
      • Spearmint - Bayesian optimization.
      • scikit-opt - Heuristic Algorithms for optimization.
      • sklearn-genetic-opt - Hyperparameters tuning and feature selection using evolutionary algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • SMAC3 - Sequential Model-based Algorithm Configuration.
      • Optunity - A library containing various optimizers for hyperparameter tuning.
      • hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
      • hyperopt-sklearn - Hyper-parameter optimization for sklearn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • sklearn-deap - Use evolutionary algorithms instead of gridsearch in scikit-learn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • Bayesian Optimization - A Python implementation of global optimization with gaussian processes.
      • SafeOpt - Safe Bayesian Optimization.
      • scikit-optimize - Sequential model-based optimization with a `scipy.optimize` interface.
      • Solid - A comprehensive gradient-free optimization framework written in Python.
      • PySwarms - A research toolkit for particle swarm optimization in Python.
      • Platypus - A Free and Open Source Python Library for Multiobjective Optimization.
      • GPflowOpt - Bayesian Optimization using GPflow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
      • Talos - Hyperparameter Optimization for Keras Models.
      • nlopt - Library for nonlinear optimization (global and local, constrained or unconstrained).
      • BoTorch - Bayesian optimization in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
  • Probabilistic Graphical Models

    • Others

      • pyAgrum - A GRaphical Universal Modeler.
      • pomegranate - Probabilistic and graphical models for Python. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • pgmpy - A python library for working with Probabilistic Graphical Models.
  • Probabilistic Methods

    • Others

      • ZhuSuan - Bayesian Deep Learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
      • PyMC - Bayesian Stochastic Modelling in Python.
      • InferPy - Deep Probabilistic Modelling Made Easy. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
      • PyStan - Bayesian inference using the No-U-Turn sampler (Python interface).
      • sklearn-bayes - Python package for Bayesian Machine Learning with scikit-learn API. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • PyVarInf - Bayesian Deep Learning methods with Variational Inference for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.
      • hsmmlearn - A library for hidden semi-Markov models with explicit durations.
      • pyhsmm - Bayesian inference in HSMMs and HMMs.
      • GPyTorch - A highly efficient and modular implementation of Gaussian Processes in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • sklearn-crfsuite - A scikit-learn-inspired API for CRFsuite. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • pyro - A flexible, scalable deep probabilistic programming library built on PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • skpro - Supervised domain-agnostic prediction framework for probabilistic modelling by [The Alan Turing Institute](https://www.turing.ac.uk/). <img height="20" src="img/sklearn_big.png" alt="sklearn">
  • Quantum Computing

    • Others

      • qiskit - Qiskit is an open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules.
      • cirq - A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits.
      • QML - A Python Toolkit for Quantum Machine Learning.
  • Reinforcement Learning

    • Others

      • Gymnasium - An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly [Gym](https://github.com/openai/gym)).
      • PettingZoo - An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities.
      • MAgent2 - An engine for high performance multi-agent environments with very large numbers of agents, along with a set of reference environments.
      • Stable Baselines3 - A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.
      • Shimmy - An API conversion tool for popular external reinforcement learning environments.
      • EnvPool - C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
      • Acme - A library of reinforcement learning components and agents.
      • Catalyst-RL - PyTorch framework for RL research. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • d3rlpy - An offline deep reinforcement learning library.
      • DI-engine - OpenDILab Decision AI Engine. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • TF-Agents - A library for Reinforcement Learning in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
      • TensorForce - A TensorFlow library for applied reinforcement learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
      • TRFL - TensorFlow Reinforcement Learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
      • Dopamine - A research framework for fast prototyping of reinforcement learning algorithms.
      • keras-rl - Deep Reinforcement Learning for Keras. <img height="20" src="img/keras_big.png" alt="Keras compatible">
      • garage - A toolkit for reproducible reinforcement learning research.
      • rlpyt - Reinforcement Learning in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • cleanrl - High-quality single-file implementations of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG).
      • Machin - A reinforcement learning library designed for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • SKRL - Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Isaac Orbit and Omniverse Isaac Gym. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • Imitation - Clean PyTorch implementations of imitation and reward learning algorithms. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • Tianshou - An elegant PyTorch deep reinforcement learning library. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
      • Horizon - A platform for Applied Reinforcement Learning.
      • RLlib - Scalable Reinforcement Learning.
  • Spatial Analysis

    • Others

      • GeoPandas - Python tools for geographic data. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
      • PySal - Python Spatial Analysis Library.
  • Statistics

    • Others

      • statsmodels - Statistical modeling and econometrics in Python.
      • stockstats - Supplies a wrapper ``StockDataFrame`` around ``pandas.DataFrame`` with inline stock statistics/indicators support.
      • weightedcalcs - A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.
      • scikit-posthocs - Pairwise Multiple Comparisons Post-hoc Tests.
      • Alphalens - Performance analysis of predictive (alpha) stock factors.
      • Pandas Profiling - Create HTML profiling reports from pandas DataFrame objects. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
  • Time Series

    • Others

      • dateutil - Powerful extensions to the standard datetime module.
      • skforecast - Time series forecasting with machine learning models.
      • darts - A python library for easy manipulation and forecasting of time series.
      • statsforecast - Lightning fast forecasting with statistical and econometric models.
      • mlforecast - Scalable machine learning-based time series forecasting.
      • neuralforecast - Scalable and user-friendly neural forecasting algorithms.
      • tslearn - Machine learning toolkit dedicated to time-series data. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • tick - Module for statistical learning, with a particular emphasis on time-dependent modeling. <img height="20" src="img/sklearn_big.png" alt="sklearn">
      • greykite - A flexible, intuitive, and fast forecasting library.
      • Prophet - Automatic Forecasting Procedure.
      • PyFlux - Open source time series library for Python.
      • bayesloop - Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.
      • luminol - Anomaly Detection and Correlation library.
      • maya - Makes it easy to parse datetime strings and change timezones.
      • Chaos Genius - ML-powered analytics engine for outlier/anomaly detection and root cause analysis.
      • sktime - A unified framework for machine learning with time series. <img height="20" src="img/sklearn_big.png" alt="sklearn">
  • Uncategorized

    • Uncategorized

      • TabGAN - Synthetic tabular data generation using GANs, Diffusion Models, and LLMs. <img height="16" width="16" src="https://github.com/krzjoa/awesome-python-data-science/raw/master/img/sklearn_big.png" alt="sklearn">
  • Visualization

    • Automatic Plotting

    • General Purposes

      • Matplotlib - Plotting with Python.
      • seaborn - Statistical data visualization using matplotlib.
      • prettyplotlib - Painlessly create beautiful matplotlib plots.
      • python-ternary - Ternary plotting library for Python with matplotlib.
      • missingno - Missing data visualization module for Python.
      • physt - Improved histograms.
    • Interactive plots

      • plotly - A Python library that makes interactive and publication-quality graphs.
      • Altair - Declarative statistical visualization library for Python. Many data transformations can be expressed directly in the chart code.
      • animatplot - A python package for animating plots built on matplotlib.
      • Bokeh - Interactive Web Plotting for Python.
      • bqplot - Plotting library for IPython/Jupyter notebooks.
      • pyecharts - A Python interactive charting library built on [Echarts](https://github.com/apache/echarts), a charting and visualization library. <img height="20" src="img/pyecharts.png" alt="pyecharts"> <img height="20" src="img/echarts.png" alt="echarts">
    • Map

      • folium - Makes it easy to visualize data on an interactive OpenStreetMap map.
      • geemap - Python package for interactive mapping with Google Earth Engine (GEE).
    • NLP

  • Web Scraping