Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-python-data-science
Probably the best curated list of data science software in Python.
https://github.com/krzjoa/awesome-python-data-science
Last synced: 5 days ago
-
Natural Language Processing
-
Machine Learning
-
General Purpose Machine Learning
- scikit-learn - Machine learning in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- modAL - Modular active learning framework for Python3. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- hyperlearn - Rewritten scikit-learn and Statsmodels routines; claims 50%+ faster, 50%+ less RAM usage, and GPU support. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- metric-learn - Metric learning algorithms in Python. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
Ensemble Methods
- ML-Ensemble - High performance ensemble learning. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
Random Forests
- rgf_python - Python Wrapper of Regularized Greedy Forest. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Deep Learning
-
TensorFlow
- Keras - A high-level neural networks API running on top of TensorFlow. <img height="20" src="img/keras_big.png" alt="Keras compatible">
- TensorLayer - Deep learning and reinforcement learning library for researchers and engineers. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Sonnet - TensorFlow-based neural network library. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- tensorpack - A neural net training interface on TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- TensorLight - A high-level framework for TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Ludwig - A toolbox for training and testing deep learning models without writing code. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
-
MXNet
- MXNet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler. <img height="20" src="img/mxnet_big.png" alt="MXNet based">
-
PyTorch
- pytorch-lightning - PyTorch Lightning is just organized PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- skorch - A scikit-learn compatible neural network library that wraps PyTorch. <img height="20" src="img/sklearn_big.png" alt="sklearn"> <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
-
-
Computer Vision
-
Others
- imgaug_extension - Additional augmentations for imgaug.
- albumentations - Fast image augmentation library and easy-to-use wrapper around other libraries.
-
-
Time Series
-
Others
- dateutil - Powerful extensions to the standard datetime module.
- sktime - A unified framework for machine learning with time series. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- tslearn - Machine learning toolkit dedicated to time-series data. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Reinforcement Learning
-
Others
- RLlib - Scalable Reinforcement Learning.
- TensorForce - A TensorFlow library for applied reinforcement learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- TRFL - TensorFlow Reinforcement Learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- Horizon - A platform for Applied Reinforcement Learning.
-
-
Probabilistic Graphical Models
-
Others
- pyAgrum - A GRaphical Universal Modeler.
-
-
Probabilistic Methods
-
Others
- ZhuSuan - Bayesian Deep Learning. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- GPflow - Gaussian processes in TensorFlow. <img height="20" src="img/tf_big2.png" alt="TensorFlow">
- pyro - A flexible, scalable deep probabilistic programming library built on PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
- skpro - Supervised domain-agnostic prediction framework for probabilistic modelling by [The Alan Turing Institute](https://www.turing.ac.uk/). <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Model Explanation
-
Others
- Skater - Python Library for Model Interpretation.
- FlashLight - Visualization Tool for your NeuralNetwork.
- shap - A unified approach to explain the output of any machine learning model. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- AI Explainability 360 - Interpretability and explainability of data and machine learning models.
- tensorboard-pytorch - Tensorboard for PyTorch (and chainer, mxnet, numpy, ...).
-
-
Optimization
-
Others
- OR-Tools - An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi.
- sigopt_sklearn - SigOpt wrappers for scikit-learn methods. <img height="20" src="img/sklearn_big.png" alt="sklearn">
- Bayesian Optimization - A Python implementation of global optimization with Gaussian processes.
- POT - Python Optimal Transport library.
-
-
Visualization
-
Interactive plots
-
Map
-
Automatic Plotting
- HoloViews - Stop plotting your data - annotate your data and let it visualize itself.
-
-
Deployment
-
Others
- fastapi - A modern, fast (high-performance) web framework for building APIs with Python.
- streamlit - Makes it easy to deploy machine learning models.
- datapane - A collection of APIs to turn scripts and notebooks into interactive reports.
- binder - Enables sharing and executing Jupyter Notebooks.
-
-
Data Manipulation
-
Data Frames
- pandas - Powerful Python data analysis toolkit.
- Arctic - High-performance datastore for time series and tick data.
- pandas-gbq - pandas Google Big Query. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
- pandas_profiling - Create HTML profiling reports from pandas DataFrame objects.
-
Pipelines
- SSPipe - Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch.
- pdpipe - Easy pipelines for pandas DataFrames.
- Dataset - Helps you conveniently work with random or sequential batches of your data and define data processing.
- pyjanitor - Clean APIs for data cleaning. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
-
-
Distributed Computing
-
Experimentation
-
Computations
-
Others
- numpy - The fundamental package needed for scientific computing with Python.
- bottleneck - Fast NumPy array functions written in C.
-
-
Web Scraping
-
Synthetic Data
-
-
Automated Machine Learning
-
Graph Machine Learning
-
Others
- pytorch_geometric - Geometric Deep Learning Extension Library for PyTorch. <img height="20" src="img/pytorch_big2.png" alt="PyTorch based/compatible">
-
-
Feature Engineering
-
General
- Featuretools - Automated feature engineering.
- skl-groups - A scikit-learn addon to operate on set/"group"-based features. <img height="20" src="img/sklearn_big.png" alt="sklearn">
-
-
Statistics
-
Others
- pandas_summary - Extension to pandas dataframes describe function. <img height="20" src="img/pandas_big.png" alt="pandas compatible">
-
-
Evaluation
-
Others
- AI Fairness 360 - Fairness metrics for datasets and ML models, explanations, and algorithms to mitigate bias in datasets and models.
-
-
Quantum Computing
-
Others
- PennyLane - Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations.
-
Categories
- Deep Learning: 11
- Data Manipulation: 8
- Machine Learning: 7
- Visualization: 6
- Deployment: 6
- Model Explanation: 5
- Probabilistic Methods: 5
- Optimization: 4
- Reinforcement Learning: 4
- Computations: 4
- Time Series: 3
- Natural Language Processing: 3
- Web Scraping: 3
- Experimentation: 2
- Distributed Computing: 2
- Computer Vision: 2
- Feature Engineering: 2
- Automated Machine Learning: 2
- License: 1
- Graph Machine Learning: 1
- Probabilistic Graphical Models: 1
- Quantum Computing: 1
- Statistics: 1
- Evaluation: 1