awesome-python-data-science

A curated list of Python libraries used for data science.
https://github.com/thomasjpfan/awesome-python-data-science

  • Feature Extraction

    • Time Series

      • prophet - Tool for producing high quality forecasts.
      • tsfresh - Automatic extraction of relevant features from time series.
      • tslearn - Machine learning toolkit dedicated to time-series data.
      • pyts - A Python package for time series transformation and classification.
      • luminaire - ML driven solutions for monitoring time series data.
      • NeuralProphet - A Neural Network based Time-Series model, inspired by Facebook Prophet and AR-Net, built on PyTorch.
      • sktime - A scikit-learn compatible Python toolbox for learning with time series data.
  • Machine Learning Frameworks

    • XGBoost - Scalable, Portable and Distributed Gradient Boosting.
    • scikit-learn - Machine learning.
    • statsmodels - Statistical modeling and econometrics.
    • SymPy - A computer algebra system.
    • dask-ml - Distributed and parallel machine learning.
    • imbalanced-learn - Perform under-sampling and over-sampling.
    • lightning - Large-scale linear models.
    • scikit-optimize - Sequential model-based optimization with a `scipy.optimize` interface.
    • BayesianOptimization - Global optimization with Gaussian processes.
    • gplearn - Genetic Programming.
    • python-glmnet - glmnet package for fitting generalized linear models.
    • hmmlearn - Hidden Markov Models.
    • vecstack - Stacking (machine learning technique).
    • deap - Evolutionary computation framework.
    • civisml-extensions - scikit-learn-compatible estimators from Civis Analytics.
    • hyperopt-sklearn - Hyper-parameter optimization for sklearn.
    • scikit-survival - Survival analysis built on top of scikit-learn.
    • dstoolbox - Tools that make working with scikit-learn and pandas easier.
    • modin - Unify the way you interact with your data.
    • pyomo - Python Optimization MOdels.
    • BAMBI - BAyesian Model-Building Interface.
    • combo - A Python Toolbox for Machine Learning Model Combination.
    • fastai - The fast.ai deep learning library, lessons, and tutorials.
    • pycaret - Low-code machine learning library in Python.
    • river - River is a Python library for online machine learning.
    • pyro - Deep universal probabilistic programming with PyTorch.
    • PyMC - Probabilistic Programming.
  • Misc

    • Ranking/Recommender

      • mmh3 - MurmurHash3, a set of fast and robust hash functions.
      • pipeline - Standard Runtime For Every Real-Time Machine Learning.
      • crayon - A language-agnostic interface to TensorBoard.
      • faiss - A library for efficient similarity search and clustering of dense vectors.
  • Outlier Detection

    • PyOD - Versatile Python library for detecting anomalies in multivariate data.
    • DeepOD - Deep learning-based outlier/anomaly detection.
  • Profiling

      • memory_profiler - Monitoring memory usage of a Python program.
      • mem_usage_ui - Measuring and graphing memory usage of local processes.
      • viztracer - VizTracer is a low-overhead logging/debugging/profiling tool that can trace and visualize your Python code execution.
      • py-spy - Sampling profiler for Python programs.
      • line_profiler - Line-by-line profiling.
      • filprofiler - Fil, a memory profiler designed for data processing applications.
      • scalene - High-performance CPU and memory profiler for Python.
      • python-flamegraph - Statistical profiler which outputs in a format suitable for FlameGraph.
  • Python Tools

      • devpi - PyPI server and packaging/testing/release tool.
      • sacred - Reproduce computational experiments.
      • Typer - Build CLIs with type hints.
      • neurtu - A Python package for parametric benchmarks.
      • pyprojroot - Finding project directories in Python.
      • datasette - An open source multi-tool for exploring and publishing data.
      • delorean - Time Travel Made Easy.
      • pip-tools - Keeps dependencies up to date.
      • click - CLI package.
      • sacredboard - Dashboard for sacred.
      • magic-wormhole - Get things from one computer to another, safely.
  • Scientific

    • Pandas - A library providing high-performance, easy-to-use data structures and data analysis tools.
    • Numba - NumPy aware dynamic Python compiler using LLVM.
    • blaze - NumPy and Pandas for databases.
    • PyDy - Multibody Dynamics.
    • nilearn - NeuroImaging.
    • patsy - Describing statistical models using symbolic formulas.
    • numexpr - Fast numerical array expression evaluator.
    • dask - Parallel computing with task scheduling.
    • or-tools - Google's Operations Research tools. Classical CS algorithms.
    • cvxpy - Python-embedded modeling language for convex optimization problems.
    • NumPy - A fundamental package for scientific computing with Python.
    • astropy - Astronomy and astrophysics.
  • Trading

      • Clairvoyant - Identify and monitor social/historical cues.
      • zipline - Algorithmic Trading Library.
  • Visualization

    • PyGWalker - Turns pandas and polars dataframes into a Tableau-like user interface for visual exploration.
    • Great Tables - Absolutely Delightful Table-making in Python.
    • diagrams - Diagrams lets you draw the cloud system architecture in Python code.
    • bokeh - Interactive web plotting.
    • dash - Interactive web plotting.
    • altair - Declarative statistical visualization.
    • folium - Leaflet.js Maps.
    • geoplot - High-level geospatial data visualization.
    • mplleaflet - Converts Matplotlib plots into interactive Leaflet web maps.
    • matplotlib-venn - Area-weighted venn-diagrams.
    • pyLDAvis - Interactive topic model visualization.
    • cufflinks - Productivity Tools for Plotly + Pandas.
    • scatterText - Visualizations of how language differs among document types.
    • plotnine - ggplot for Python.
    • mizani - Scales package for graphics.
    • PtitPrince - Raincloud plots.
    • dtreeviz - Decision tree visualization and model interpretation.
    • ipyvolume - 3D plotting for Python in the Jupyter notebook based on IPython widgets using WebGL.
    • matplotlib - 2D plotting.
    • seaborn - Visualization library.
    • bqplot - Plotting library for IPython/Jupyter Notebooks.