awesome-list

A list of useful stuff in Machine Learning, Computer Graphics, Software Development, ...
https://github.com/johnhany/awesome-list

  • Data Processing

    • Data Pre-processing & Loading

      • lazynlp - Library to scrape and clean web pages to create massive datasets.
      • Google Images Download - Python Script to download hundreds of images from 'Google Images'.
      • Label Studio - A multi-type data labeling and annotation tool with standardized output format.
    • Data Representation

      • pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.
      • cuDF - GPU DataFrame Library.
      • Polars - Fast multi-threaded DataFrame library in Rust, Python and Node.js.
      • Modin - Scale your Pandas workflows by changing a single line of code.
      • Vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second.
      • PyTables - A Python package to manage extremely large amounts of data.
      • Pandaral.lel - A simple and efficient tool to parallelize Pandas operations on all available CPUs.
      • swifter - A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner.
      • datatable - A Python package for manipulating 2-dimensional tabular data structures.
      • xarray - N-D labeled arrays and datasets in Python.
      • Zarr - An implementation of chunked, compressed, N-dimensional arrays for Python.
      • Python Sorted Containers - Python Sorted Container Types: Sorted List, Sorted Dict, and Sorted Set.
      • Pyrsistent - Persistent/Immutable/Functional data structures for Python.
      • immutables - A high-performance immutable mapping type for Python.
      • Texthero - A Python toolkit for working with text-based datasets, based on Pandas.
      • ftfy - Fixes mojibake and other glitches in Unicode text.
      • Box - Python dictionaries with advanced dot notation access.
      • bidict - The bidirectional mapping library for Python.
      • anytree - Python tree data library.
      • pydantic - Data parsing and validation using Python type hints.
      • stockstats - Provides a ``StockDataFrame`` wrapper based on ``pandas.DataFrame`` with inline support for stock statistics and indicators.
      • DocArray - A library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc.
    • Data Similarity

      • jellyfish - A library for approximate & phonetic matching of strings.
      • TextDistance - Python library for comparing distance between two or more sequences by many algorithms.
      • Qdrant - A vector similarity search engine for text, image and categorical data in Rust.
      • image-match - A simple package for finding approximate image matches from a corpus.
  • Data Visualization

    • Data Management

      • Matplotlib - A comprehensive library for creating static, animated, and interactive visualizations in Python.
      • Seaborn - A high-level interface for drawing statistical graphics, based on Matplotlib.
      • Bokeh - Interactive Data Visualization in the browser, from Python.
      • Plotly.js - Open-source JavaScript charting library behind Plotly and Dash.
      • Plotly.py - An interactive, open-source, and browser-based graphing library for Python, based on Plotly.js.
      • ggplot2 - An implementation of the Grammar of Graphics in R.
      • ggpy - ggplot port for Python.
      • Datapane - An open-source framework to create data science reports in Python.
      • Visdom - A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Torch and NumPy.
      • TabPy - Execute Python code on the fly and display results in Tableau visualizations.
      • Streamlit - The fastest way to build data apps in Python.
      • HyperTools - A Python toolbox for gaining geometric insights into high-dimensional data, based on Matplotlib and Seaborn.
      • Dash - Analytical Web Apps for Python, R, Julia and Jupyter, based on Plotly.js.
      • mpld3 - An interactive Matplotlib visualization tool in browser, based on D3.
      • Vega - A visualization grammar, a declarative format for creating, saving, and sharing interactive visualization designs.
      • Vega-Lite - Provides a higher-level grammar for visual analysis that generates complete Vega specifications.
      • PyQtGraph - Fast data visualization and GUI tools for scientific / engineering applications.
      • VisPy - A high-performance interactive 2D/3D data visualization library, with OpenGL support.
      • PyVista - 3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK).
      • Potree - WebGL point cloud viewer for large datasets.
      • Holoviews - An open-source Python library designed to make data analysis and visualization seamless and simple.
      • Graphviz - Python interface for Graphviz to create and render graphs.
      • PyGraphistry - A Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer.
      • Apache ECharts - A powerful, interactive charting and data visualization library for the browser.
      • pyecharts - A Python visualization interface for Apache ECharts.
      • word_cloud - A little word cloud generator in Python.
      • Datashader - A data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data.
      • plotnine - An implementation of the Grammar of Graphics in Python, based on ggplot2.
      • bqplot - An implementation of the Grammar of Graphics for IPython/Jupyter notebooks.
      • D-Tale - A visualization tool for Pandas DataFrames, with IPython notebook support.
      • missingno - A Python visualization tool for missing data.
      • HiPlot - A lightweight interactive visualization tool to help AI researchers discover correlations and patterns in high-dimensional data.
      • Sweetviz - Visualize and compare datasets, target values and associations, with one line of code.
      • Netron - Visualizer for neural network, deep learning, and machine learning models.
      • livelossplot - Live training loss plot in Jupyter Notebook for Keras, PyTorch and others.
      • Diagrams - Lets you draw the cloud system architecture in Python code.
      • SandDance - Visually explore, understand, and present your data.
      • ML Visuals - Contains figures and templates which you can reuse and customize to improve your scientific writing.
      • Scattertext - A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot.
      • TensorSpace.js - Neural network 3D visualization framework for building interactive and intuitive models in browsers, with support for pre-trained deep learning models from TensorFlow, Keras, and TensorFlow.js.
      • Netscope - Neural network visualizer.
      • draw_convnet - Python script for illustrating Convolutional Neural Network (ConvNet).
      • PlotNeuralNet - LaTeX code for making neural network diagrams.
      • Vega-Altair - A declarative statistical visualization library for Python, based on Vega-Lite.
  • Debugging & Profiling & Tracing

    • For C++/C

      • x64dbg - An open-source x64/x32 debugger for Windows.
      • ORBIT - A standalone C/C++ profiler for Windows and Linux.
      • BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more.
      • osquery - SQL powered operating system instrumentation, monitoring, and analytics.
      • Tracy - A real time, nanosecond resolution, remote telemetry, hybrid frame and sampling profiler for games and other applications.
      • Coz - Finding Code that Counts with Causal Profiling.
      • timemory - Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python.
      • gputop - A GPU profiling tool.
    • For Go

      • gops - A tool to list and diagnose Go processes currently running on your system.
      • pprof - A tool for visualization and analysis of profiling data.
      • JD-GUI - A standalone Java Decompiler GUI.
    • For Python

      • PySnooper - Never use print for debugging again.
      • py-spy - A sampling profiler for Python programs.
      • Scalene - A high-performance, high-precision CPU, GPU, and memory profiler for Python.
      • pyinstrument - Call stack profiler for Python.
      • vprof - A Python package providing rich and interactive visualizations for various Python program characteristics such as running time and memory usage.
      • GPUtil - A Python module for getting the GPU status from NVIDIA GPUs programmatically using nvidia-smi.
      • Wily - A Python application for tracking, reporting on timing and complexity in Python code.
      • Radon - Various code metrics for Python code.
      • ps_mem - A utility to accurately report the in core memory usage for a program.
      • Pyroscope - An open source continuous profiling platform.
  • Deep Learning Framework

    • Anomaly Detection & Others

      • Anomalib - An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
      • Gradio - An open-source Python library that is used to build machine learning and data science demos and web applications.
      • Traingenerator - Generates custom template code for PyTorch & sklearn, using a simple web UI built with Streamlit.
      • Fairlearn - A Python package to assess and improve fairness of machine learning models.
      • AI Fairness 360 - A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
    • Auto ML & Hyperparameter Optimization

      • NNI - An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
      • AutoKeras - AutoML library for deep learning.
      • KerasTuner - An easy-to-use, scalable hyperparameter optimization framework that solves the pain points of hyperparameter search.
      • Talos - Hyperparameter Optimization for TensorFlow, Keras and PyTorch.
      • Distiller - Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research.
      • Hyperas - A very simple wrapper for convenient hyperparameter optimization for Keras.
      • Model Search - A framework that implements AutoML algorithms for model architecture search at scale.
    • Deployment & Distribution

      • Hummingbird - A library for compiling trained traditional ML models into tensor computations.
      • OpenVINO - An open-source toolkit for optimizing and deploying AI inference.
      • open_model_zoo - Pre-trained Deep Learning models and demos (high quality and extremely fast).
      • Kubeflow - Machine Learning Toolkit for Kubernetes.
      • m2cgen - Transform ML models into native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies.
      • FairScale - A PyTorch extension library for high performance and large scale training.
      • ColossalAI - Provides a collection of parallel components and user-friendly tools to kickstart distributed training and inference in a few lines.
      • Ray - A unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for accelerating ML workloads.
      • BentoML - BentoML is compatible across machine learning frameworks and standardizes ML model packaging and management for your team.
      • cortex - Production infrastructure for machine learning at scale.
      • Horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
      • Angel - A Flexible and Powerful Parameter Server for large-scale machine learning.
      • Elephas - Distributed Deep learning with Keras & Spark.
      • MLeap - Allows data scientists and engineers to deploy machine learning pipelines from Spark and Scikit-learn to a portable format and execution engine.
      • ZenML - Build portable, production-ready MLOps pipelines.
      • Optimus - An opinionated python library to easily load, process, plot and create ML models that run over pandas, Dask, cuDF, dask-cuDF, Vaex or Spark.
      • ONNX - Open standard for machine learning interoperability.
      • TensorRT - A C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
      • Compute Library - A set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
      • Apache TVM - Open deep learning compiler stack for CPUs, GPUs and specialized accelerators.
      • Triton Inference Server - The Triton Inference Server provides an optimized cloud and edge inferencing solution.
      • Core ML Tools - Contains supporting tools for Core ML model conversion, editing, and validation.
      • Petastorm - Enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format.
      • Hivemind - Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
      • Mesh Transformer JAX - Model parallel transformers in JAX and Haiku.
      • ncnn - A high-performance neural network inference framework optimized for the mobile platform.
      • Turi Create - A machine learning library for deployment on macOS/iOS.
      • Apache SINGA - A distributed deep learning platform.
      • BytePS - A high performance and generic framework for distributed DNN training.
      • MMdnn - MMdnn is a set of tools to help users inter-operate among different deep learning frameworks.
      • Nebullvm - An open-source tool designed to speed up AI inference in just a few lines of code.
      • DeepSpeed - An easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for Deep Learning Training and Inference.
      • BigDL - Building Large-Scale AI Applications for Distributed Big Data.
      • Analytics Zoo - Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray.
      • MediaPipe - Cross-platform, customizable ML solutions for live and streaming media.
    • High-Level DL APIs

      • tf_numpy - A subset of the NumPy API implemented in TensorFlow
      • PyTorch - An open source deep learning framework by Facebook, with GPU and dynamic graph support.
      • TorchVision - Datasets, Transforms and Models specific to Computer Vision for PyTorch
      • TorchText - Data loaders and abstractions for text and NLP for PyTorch
      • TorchAudio - Data manipulation and transformation for audio signal processing for PyTorch
      • TorchRec - A PyTorch domain library built to provide common sparsity & parallelism primitives needed for large-scale recommender systems (RecSys).
      • TorchServe - Serve, optimize and scale PyTorch models in production
      • TorchHub - Model zoo for PyTorch
      • Ignite - High-level library to help with training and evaluating neural networks for PyTorch
      • Captum - A model interpretability and understanding library for PyTorch
      • Glow - Compiler for Neural Network hardware accelerators
      • TorchArrow - Common and composable data structures built on PyTorch Tensor for efficient batch data representation and processing in PyTorch model authoring
      • PyTorchVideo - A deep learning library for video understanding research, based on PyTorch
      • tensorboardX - TensorBoard for PyTorch (and Chainer, MXNet, NumPy, ...)
      • Apex - Tools for easy mixed precision and distributed training in PyTorch
      • HuggingFace Accelerate - A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision
      • PyTorch Metric Learning - The easiest way to use deep metric learning in your application. Modular, flexible, and extensible, written in PyTorch
      • Auto-PyTorch - Automatic architecture search and hyperparameter optimization for PyTorch
      • torch-optimizer - Collection of optimizers for PyTorch compatible with optim module
      • PyTorch Sparse - PyTorch Extension Library of Optimized Autograd Sparse Matrix Operations
      • PyTorch Scatter - PyTorch Extension Library of Optimized Scatter Operations
      • Torch-Struct - A library of tested, GPU implementations of core structured prediction algorithms for deep learning applications
      • torchinfo - View model summaries in PyTorch
      • Torchshow - Visualize PyTorch tensors with a single line of code
      • torch2trt - An easy to use PyTorch to TensorRT converter
      • Kaolin - A PyTorch Library for Accelerating 3D Deep Learning Research
      • higher - A PyTorch library allowing users to obtain higher order gradients over losses spanning training loops rather than individual training steps
      • TensorFlow - An open source deep learning framework by Google, with GPU support.
      • TensorBoard - TensorFlow's Visualization Toolkit
      • TensorFlow Text - A collection of text related classes and ops for TensorFlow
      • TensorFlow Recommenders - A library for building recommender system models using TensorFlow.
      • TensorFlow Ranking - A library for Learning-to-Rank (LTR) techniques on the TensorFlow platform.
      • TensorFlow Serving - A flexible, high-performance serving system for machine learning models based on TensorFlow
      • TFX - An end-to-end platform for deploying production ML pipelines.
      • TFDS - A collection of datasets ready to use with TensorFlow and JAX
      • TensorFlow Addons - Useful extra functionality for TensorFlow 2.x maintained by SIG-addons
      • TensorFlow Transform - A library for preprocessing data with TensorFlow
      • TensorFlow Model Garden - Models and examples built with TensorFlow
      • TensorFlow Hub - A library for transfer learning by reusing parts of TensorFlow models
      • TensorFlow.js - A WebGL accelerated JavaScript library for training and deploying ML models based on TensorFlow
      • TensorFlow Probability - Probabilistic reasoning and statistical analysis in TensorFlow
      • TensorFlow Model Optimization Toolkit - A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning
      • TensorFlow Model Analysis - A library for evaluating TensorFlow models
      • Trax - Deep Learning with Clear Code and Speed
      • Lattice - Lattice methods in TensorFlow
      • TensorFlowOnSpark - Brings TensorFlow programs to Apache Spark clusters
      • Tensor2Tensor - Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research
      • PaddlePaddle - An open source deep learning framework by Baidu, with GPU support.
      • PaddleOCR - Multilingual OCR toolkits based on PaddlePaddle
      • PaddleDetection - Object detection toolkit based on PaddlePaddle
      • PaddleSeg - Image segmentation toolkit based on PaddlePaddle
      • PaddleClas - Visual classification and recognition toolkit based on PaddlePaddle
      • PaddleGAN - Generative Adversarial Networks toolkit based on PaddlePaddle
      • PaddleVideo - Video understanding toolkit based on PaddlePaddle
      • PaddleRec - Recommendation algorithm based on PaddlePaddle
      • PaddleNLP - Natural language processing toolkit based on PaddlePaddle
      • PaddleSpeech - Speech Recognition/Translation toolkit based on PaddlePaddle
      • PGL - An efficient and flexible graph learning framework based on PaddlePaddle