Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

awesome-python-data-science

Probably the best curated list of data science software in Python.
https://github.com/krzjoa/awesome-python-data-science

  • Natural Language Processing

    • Others

      • gensim - Topic Modelling for Humans.
      • spaCy - Industrial-Strength Natural Language Processing.
      • flair - Very simple framework for state-of-the-art NLP.
  • Machine Learning

    • General Purpose Machine Learning

      • scikit-learn - Machine learning in Python.
      • modAL - Modular active learning framework for Python3.
      • hyperlearn - 50%+ Faster, 50%+ less RAM usage, GPU support re-written Sklearn, Statsmodels.
      • metric-learn - Metric learning algorithms in Python.
    • Ensemble Methods

      • ML-Ensemble - High performance ensemble learning.
    • Random Forests

      • rgf_python - Python Wrapper of Regularized Greedy Forest.
  • Deep Learning

    • TensorFlow

      • Keras - A high-level neural networks API running on top of TensorFlow.
      • TensorLayer - Deep Learning and Reinforcement Learning Library for Researchers and Engineers.
      • Sonnet - TensorFlow-based neural network library.
      • tensorpack - A Neural Net Training Interface on TensorFlow.
      • TensorLight - A high-level framework for TensorFlow.
      • Ludwig - A toolbox that allows one to train and test deep learning models without the need to write code.
    • MXNet

      • MXNet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler.
    • PyTorch

      • pytorch-lightning - PyTorch Lightning is just organized PyTorch.
      • skorch - A scikit-learn compatible neural network library that wraps PyTorch.
  • Computer Vision

    • Others

      • imgaug_extension - Additional augmentations for imgaug.
      • albumentations - Fast image augmentation library and easy-to-use wrapper around other libraries.
  • Time Series

    • Others

      • dateutil - Powerful extensions to the standard datetime module.
      • sktime - A unified framework for machine learning with time series.
      • tslearn - Machine learning toolkit dedicated to time-series data.
  • Reinforcement Learning

    • Others

      • RLlib - Scalable Reinforcement Learning.
      • TensorForce - A TensorFlow library for applied reinforcement learning.
      • TRFL - TensorFlow Reinforcement Learning.
      • Horizon - A platform for Applied Reinforcement Learning.
  • Probabilistic Graphical Models

    • Others

      • pyAgrum - A GRaphical Universal Modeler.
  • Probabilistic Methods

    • Others

      • ZhuSuan - Bayesian Deep Learning.
      • GPflow - Gaussian processes in TensorFlow.
      • pyro - A flexible, scalable deep probabilistic programming library built on PyTorch.
      • skpro - Supervised domain-agnostic prediction framework for probabilistic modelling by The Alan Turing Institute (https://www.turing.ac.uk/).
  • Model Explanation

    • Others

      • Skater - Python Library for Model Interpretation.
      • FlashLight - Visualization Tool for your NeuralNetwork.
      • shap - A unified approach to explain the output of any machine learning model.
      • AI Explainability 360 - Interpretability and explainability of data and machine learning models.
      • tensorboard-pytorch - Tensorboard for PyTorch (and chainer, mxnet, numpy, ...).
  • Optimization

    • Others

      • OR-Tools - An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi.
      • sigopt_sklearn - SigOpt wrappers for scikit-learn methods.
      • Bayesian Optimization - A Python implementation of global optimization with gaussian processes.
      • POT - Python Optimal Transport library.
  • Visualization

    • Interactive plots

      • plotly - A Python library that makes interactive and publication-quality graphs.
      • Altair - Declarative statistical visualization library for Python. Supports many data transformations directly in the code that creates a graph.
    • Map

      • folium - Makes it easy to visualize data on an interactive OpenStreetMap map.
      • geemap - Python package for interactive mapping with Google Earth Engine (GEE).
    • Automatic Plotting

      • HoloViews - Stop plotting your data - annotate your data and let it visualize itself.
  • Deployment

    • NLP

      • fastapi - A modern, fast (high-performance) web framework for building APIs with Python.
      • streamlit - Makes it easy to deploy machine learning models.
      • datapane - A collection of APIs to turn scripts and notebooks into interactive reports.
      • binder - Enables sharing and executing Jupyter Notebooks.
  • Data Manipulation

    • Data Frames

      • pandas - Powerful Python data analysis toolkit.
      • Arctic - High-performance datastore for time series and tick data.
      • pandas-gbq - pandas Google Big Query.
      • pandas_profiling - Create HTML profiling reports from pandas DataFrame objects.
    • Pipelines

      • SSPipe - Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch.
      • pdpipe - Easy pipelines for pandas DataFrames.
      • Dataset - Helps you conveniently work with random or sequential batches of your data and define data processing.
      • pyjanitor - Clean APIs for data cleaning.
  • Distributed Computing

    • Synthetic Data

      • PySpark - Exposes the Spark programming model to Python.
      • Horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
  • Experimentation

    • Synthetic Data

      • Neptune - A lightweight ML experiment tracking, results visualization, and management tool.
  • Computations

    • Synthetic Data

      • numpy - The fundamental package needed for scientific computing with Python.
      • bottleneck - Fast NumPy array functions written in C.
  • Web Scraping

  • Automated Machine Learning

    • Others

      • AutoGluon - AutoML for Image, Text, Tabular, Time-Series, and MultiModal Data.
      • TPOT - AutoML tool that optimizes machine learning pipelines using genetic programming.
  • Graph Machine Learning

    • Others

      • pytorch_geometric - Geometric Deep Learning Extension Library for PyTorch.
  • Feature Engineering

    • General

      • Featuretools - Automated feature engineering.
      • skl-groups - A scikit-learn addon to operate on set/"group"-based features.
  • Statistics

    • NLP

      • pandas_summary - Extension to the pandas DataFrame describe function.
  • Evaluation

    • Synthetic Data

      • AI Fairness 360 - Fairness metrics for datasets and ML models, explanations, and algorithms to mitigate bias in datasets and models.
  • Quantum Computing

    • Synthetic Data

      • PennyLane - Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations.