awesome-python-data-science
From gitlab
https://github.com/jacob98415/awesome-python-data-science
-
Model Explanation
- ELI5 - A library for debugging/inspecting machine learning classifiers and explaining their predictions.
- Lime - Explaining the predictions of any machine learning classifier. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- FairML - A Python toolbox for auditing machine learning models for bias. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- L2X - Code for replicating the experiments in the paper *Learning to Explain: An Information-Theoretic Perspective on Model Interpretation*.
- PDPbox - Partial dependence plot toolbox.
- PyCEbox - Python Individual Conditional Expectation Plot Toolbox.
- model-analysis - Model analysis tools for TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- themis-ml - A library that implements fairness-aware machine learning algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- treeinterpreter - Interpreting scikit-learn's decision tree and random forest predictions. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Auralisation - Auralisation of learned features in CNN (for audio).
- CapsNet-Visualization - A visualization of the CapsNet layers to better understand how it works.
- lucid - A collection of infrastructure and tools for research in neural network interpretability.
- Netron - Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks).
- mxboard - Logging MXNet data for visualization in TensorBoard. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
- Skater - Python Library for Model Interpretation.
- AI Explainability 360 - Interpretability and explainability of data and machine learning models.
- tensorboard-pytorch - Tensorboard for PyTorch (and chainer, mxnet, numpy, ...).
- shap - A unified approach to explain the output of any machine learning model. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Natural Language Processing
-
NLP
- spaCy - Industrial-Strength Natural Language Processing.
- gensim - Topic Modelling for Humans.
- NLTK - Modules, data sets, and tutorials supporting research and development in Natural Language Processing.
- CLTK - The Classical Language Toolkit.
- pyMorfologik - Python binding for <a href="https://github.com/morfologik/morfologik-stemming">Morfologik</a>.
- skift - Scikit-learn wrappers for Python fastText. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Phonemizer - Simple text-to-phonemes converter for multiple languages.
- flair - Very simple framework for state-of-the-art NLP.
-
-
Optimization
- OR-Tools - An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi.
- Optuna - A hyperparameter optimization framework.
- Spearmint - Bayesian optimization.
- scikit-opt - Heuristic Algorithms for optimization.
- sklearn-genetic-opt - Hyperparameter tuning and feature selection using evolutionary algorithms. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- SMAC3 - Sequential Model-based Algorithm Configuration.
- Optunity - A library containing various optimizers for hyperparameter tuning.
- hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
- hyperopt-sklearn - Hyper-parameter optimization for sklearn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- sklearn-deap - Use evolutionary algorithms instead of gridsearch in scikit-learn. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Bayesian Optimization - A Python implementation of global optimization with Gaussian processes.
- SafeOpt - Safe Bayesian Optimization.
- scikit-optimize - Sequential model-based optimization with a `scipy.optimize` interface.
- Solid - A comprehensive gradient-free optimization framework written in Python.
- PySwarms - A research toolkit for particle swarm optimization in Python.
- Platypus - A Free and Open Source Python Library for Multiobjective Optimization.
- GPflowOpt - Bayesian Optimization using GPflow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Talos - Hyperparameter Optimization for Keras Models.
- nlopt - Library for nonlinear optimization (global and local, constrained or unconstrained).
- BoTorch - Bayesian optimization in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
-
-
Probabilistic Methods
- pomegranate - Probabilistic and graphical models for Python. <img height="20" src="img/gpu_big.png" alt="GPU accelerated">
- PyMC - Bayesian Stochastic Modelling in Python.
- InferPy - Deep Probabilistic Modelling Made Easy. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- PyStan - Bayesian inference using the No-U-Turn sampler (Python interface).
- sklearn-bayes - Python package for Bayesian Machine Learning with scikit-learn API. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- pgmpy - A Python library for working with Probabilistic Graphical Models.
- PtStat - Probabilistic Programming and Statistical Inference in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- PyVarInf - Bayesian Deep Learning methods with Variational Inference for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.
- hsmmlearn - A library for hidden semi-Markov models with explicit durations.
- pyhsmm - Bayesian inference in HSMMs and HMMs.
- GPyTorch - A highly efficient and modular implementation of Gaussian Processes in PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- MXFusion - Modular Probabilistic Programming on MXNet. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
- sklearn-crfsuite - A scikit-learn-inspired API for CRFsuite. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- pyro - A flexible, scalable deep probabilistic programming library built on PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- skpro - Supervised domain-agnostic prediction framework for probabilistic modelling by [The Alan Turing Institute](https://www.turing.ac.uk/). <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Quantum Computing
- qiskit - An open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules.
- cirq - A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits.
- QML - A Python Toolkit for Quantum Machine Learning.
-
-
Reinforcement Learning
- OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms.
- garage - A toolkit for reproducible reinforcement learning research.
- OpenAI Baselines - High-quality implementations of reinforcement learning algorithms.
- Stable Baselines - A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.
- TF-Agents - A library for Reinforcement Learning in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- TensorForce - A TensorFlow library for applied reinforcement learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- TRFL - TensorFlow Reinforcement Learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Dopamine - A research framework for fast prototyping of reinforcement learning algorithms.
- keras-rl - Deep Reinforcement Learning for Keras. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- ChainerRL - A deep reinforcement learning library built on top of Chainer.
- Coach - Easy experimentation with state-of-the-art Reinforcement Learning algorithms.
- Horizon - A platform for Applied Reinforcement Learning.
- RLlib - Scalable Reinforcement Learning.
-
-
Spatial Analysis
-
Statistics
- statsmodels - Statistical modeling and econometrics in Python.
- stockstats - A ``StockDataFrame`` wrapper around ``pandas.DataFrame`` with inline support for stock statistics and indicators.
- weightedcalcs - A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.
- scikit-posthocs - Pairwise Multiple Comparisons Post-hoc Tests.
- Alphalens - Performance analysis of predictive (alpha) stock factors.
-
-
Time Series
- dateutil - Powerful extensions to the standard datetime module.
- darts - A Python library for easy manipulation and forecasting of time series.
- statsforecast - Lightning fast forecasting with statistical and econometric models.
- mlforecast - Scalable machine learning-based time series forecasting.
- neuralforecast - Scalable, user-friendly neural forecasting algorithms for time series.
- tslearn - Machine learning toolkit dedicated to time-series data. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- tick - Module for statistical learning, with a particular emphasis on time-dependent modeling. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- greykite - A flexible, intuitive, and fast forecasting library.
- Prophet - Automatic Forecasting Procedure.
- PyFlux - Open source time series library for Python.
- bayesloop - Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.
- luminol - Anomaly Detection and Correlation library.
- maya - Makes it easy to parse date strings and change time zones.
- Chaos Genius - ML-powered analytics engine for outlier/anomaly detection and root cause analysis.
- sktime - A unified framework for machine learning with time series. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Visualization
-
Automatic Plotting
-
General Purposes
- Matplotlib - Plotting with Python.
- seaborn - Statistical data visualization using matplotlib.
- prettyplotlib - Painlessly create beautiful matplotlib plots.
- python-ternary - Ternary plotting library for Python with matplotlib.
- missingno - Missing data visualization module for Python.
- physt - Improved histograms.
-
Interactive plots
- Altair - Declarative statistical visualization library for Python. Many data transformations can be done within the code that creates the graph.
- animatplot - A Python package for animating plots built on matplotlib.
- Bokeh - Interactive Web Plotting for Python.
- bqplot - Plotting library for IPython/Jupyter notebooks.
- pyecharts - A Python interactive visualization library for generating [Echarts](https://github.com/apache/echarts) charts; Echarts is a charting and visualization library. <img height="20" src="img/pyecharts.png" alt="pyecharts"> <img height="20" src="img/echarts.png" alt="echarts">
-
Map
-
-
Web Scraping
-
Others
- BeautifulSoup
- Selenium
- Pattern - Web mining for established websites such as Google, Twitter, and Wikipedia. Also has NLP, machine learning algorithms, and visualization.
- twitterscraper
-