An open API service indexing awesome lists of open source software.

awesome-python-data-science

Probably the best curated list of data science software in Python.
https://github.com/krzjoa/awesome-python-data-science


  • Natural Language Processing

    • Others

      • gensim - Topic Modelling for Humans.
      • spaCy - Industrial-Strength Natural Language Processing.
  • Machine Learning

    • General Purpose Machine Learning

      • scikit-learn - Machine learning in Python.
    • Ensemble Methods

      • ML-Ensemble - High performance ensemble learning (scikit-learn compatible).
  • Deep Learning

    • TensorFlow

      • Keras - A high-level neural networks API running on top of TensorFlow.
  • Time Series

    • Others

      • dateutil - Powerful extensions to the standard datetime module.
  • Probabilistic Graphical Models

    • Others

      • pyAgrum - A GRaphical Universal Modeler.
  • Probabilistic Methods

    • Others

      • ZhuSuan - Bayesian Deep Learning (built on TensorFlow).
  • Optimization

    • Others

      • OR-Tools - An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi.
  • Visualization

    • Interactive plots

      • plotly - A Python library that makes interactive and publication-quality graphs.
      • Altair - Declarative statistical visualization library for Python. Makes it easy to apply data transformations in code when building a chart.
    • Map

      • folium - Makes it easy to visualize data on an interactive OpenStreetMap map.
  • Deployment

    • Others

      • fastapi - A modern, fast (high-performance) web framework for building APIs with Python.
      • streamlit - Makes it easy to deploy machine learning models.
      • datapane - A collection of APIs to turn scripts and notebooks into interactive reports.
      • binder - Enables sharing and executing Jupyter Notebooks.
  • Data Manipulation

    • Data Frames

      • pandas - Powerful Python data analysis toolkit.
    • Pipelines

      • SSPipe - Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch.
  • Experimentation

    • Others

      • Neptune - A lightweight ML experiment tracking, results visualization, and management tool.
  • Computations

    • Others

      • numpy - The fundamental package needed for scientific computing with Python.
  • Web Scraping