awesome-list
A list of useful stuff in Machine Learning, Computer Graphics, Software Development, ...
https://github.com/johnhany/awesome-list
Data Processing
Data Pre-processing & Loading
- lazynlp - Library to scrape and clean web pages to create massive datasets.
- Google Images Download - Python Script to download hundreds of images from 'Google Images'.
- Label Studio - A multi-type data labeling and annotation tool with standardized output format.
Data Representation
- pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.
- cuDF - GPU DataFrame Library.
- Polars - Fast multi-threaded DataFrame library in Rust, Python and Node.js.
- Modin - Scale your Pandas workflows by changing a single line of code.
- Vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second.
- PyTables - A Python package to manage extremely large amounts of data.
- Pandaral.lel - A simple and efficient tool to parallelize Pandas operations on all available CPUs.
- swifter - A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner.
- datatable - A Python package for manipulating 2-dimensional tabular data structures.
- xarray - N-D labeled arrays and datasets in Python.
- Zarr - An implementation of chunked, compressed, N-dimensional arrays for Python.
- Python Sorted Containers - Python Sorted Container Types: Sorted List, Sorted Dict, and Sorted Set.
- Pyrsistent - Persistent/Immutable/Functional data structures for Python.
- immutables - A high-performance immutable mapping type for Python.
- Texthero - A Python toolkit for working with text-based datasets, based on Pandas.
- ftfy - Fixes mojibake and other glitches in Unicode text.
- Box - Python dictionaries with advanced dot notation access.
- bidict - The bidirectional mapping library for Python.
- anytree - Python tree data library.
- pydantic - Data parsing and validation using Python type hints.
- stockstats - Supplies a wrapper ``StockDataFrame`` based on ``pandas.DataFrame`` with inline support for stock statistics and indicators.
- DocArray - A library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc.
Data Similarity
- jellyfish - A library for approximate & phonetic matching of strings.
- TextDistance - Python library for comparing distance between two or more sequences by many algorithms.
- Qdrant - A vector similarity search engine for text, image and categorical data in Rust.
- image-match - A simple package for finding approximate image matches from a corpus.
Data Visualization
Data Management
- Matplotlib - A comprehensive library for creating static, animated, and interactive visualizations in Python.
- Seaborn - A high-level interface for drawing statistical graphics, based on Matplotlib.
- Bokeh - Interactive Data Visualization in the browser, from Python.
- Plotly.js - Open-source JavaScript charting library behind Plotly and Dash.
- Plotly.py - An interactive, open-source, and browser-based graphing library for Python, based on Plotly.js.
- ggplot2 - An implementation of the Grammar of Graphics in R.
- ggpy - A ggplot port for Python.
- Datapane - An open-source framework to create data science reports in Python.
- Visdom - A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Torch and Numpy.
- TabPy - Execute Python code on the fly and display results in Tableau visualizations.
- Streamlit - The fastest way to build data apps in Python.
- HyperTools - A Python toolbox for gaining geometric insights into high-dimensional data, based on Matplotlib and Seaborn.
- Dash - Analytical Web Apps for Python, R, Julia and Jupyter, based on Plotly.js.
- mpld3 - An interactive Matplotlib visualization tool in browser, based on D3.
- Vega - A visualization grammar, a declarative format for creating, saving, and sharing interactive visualization designs.
- Vega-Lite - Provides a higher-level grammar for visual analysis that generates complete Vega specifications.
- PyQtGraph - Fast data visualization and GUI tools for scientific / engineering applications.
- VisPy - A high-performance interactive 2D/3D data visualization library, with OpenGL support.
- PyVista - 3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK).
- Potree - WebGL point cloud viewer for large datasets.
- Holoviews - An open-source Python library designed to make data analysis and visualization seamless and simple.
- Graphviz - Python interface for Graphviz to create and render graphs.
- PyGraphistry - A Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer.
- Apache ECharts - A powerful, interactive charting and data visualization library for browser.
- pyecharts - A Python visualization interface for Apache ECharts.
- word_cloud - A little word cloud generator in Python.
- Datashader - A data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data.
- plotnine - An implementation of the Grammar of Graphics in Python, based on ggplot2.
- bqplot - An implementation of the Grammar of Graphics for IPython/Jupyter notebooks.
- D-Tale - A visualization tool for Pandas DataFrames, with IPython notebook support.
- missingno - A Python visualization tool for missing data.
- HiPlot - A lightweight interactive visualization tool to help AI researchers discover correlations and patterns in high-dimensional data.
- Sweetviz - Visualize and compare datasets, target values and associations, with one line of code.
- Netron - Visualizer for neural network, deep learning, and machine learning models.
- livelossplot - Live training loss plot in Jupyter Notebook for Keras, PyTorch and others.
- Diagrams - Lets you draw the cloud system architecture in Python code.
- SandDance - Visually explore, understand, and present your data.
- ML Visuals - Contains figures and templates which you can reuse and customize to improve your scientific writing.
- Scattertext - A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot.
- TensorSpace.js - A neural network 3D visualization framework for building interactive and intuitive models in browsers, with support for pre-trained deep learning models from TensorFlow, Keras, and TensorFlow.js.
- Netscope - Neural network visualizer.
- draw_convnet - Python script for illustrating Convolutional Neural Network (ConvNet).
- PlotNeuralNet - LaTeX code for drawing neural network diagrams.
- Vega-Altair - A declarative statistical visualization library for Python, based on Vega-Lite.
Debugging & Profiling & Tracing
For C++/C
- x64dbg - An open-source x64/x32 debugger for Windows.
- ORBIT - A standalone C/C++ profiler for Windows and Linux.
- BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more.
- osquery - SQL powered operating system instrumentation, monitoring, and analytics.
- Tracy - A real time, nanosecond resolution, remote telemetry, hybrid frame and sampling profiler for games and other applications.
- Coz - Finding Code that Counts with Causal Profiling.
- timemory - Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python.
- gputop - A GPU profiling tool.
For Go
For Python
- PySnooper - Never use print for debugging again.
- py-spy - A sampling profiler for Python programs.
- Scalene - A high-performance, high-precision CPU, GPU, and memory profiler for Python.
- pyinstrument - Call stack profiler for Python.
- vprof - A Python package providing rich and interactive visualizations for various Python program characteristics such as running time and memory usage.
- GPUtil - A Python module for getting the GPU status from NVIDIA GPUs programmatically using nvidia-smi.
- Wily - A Python application for tracking, reporting on timing and complexity in Python code.
- Radon - Various code metrics for Python code.
- ps_mem - A utility to accurately report the in core memory usage for a program.
- Pyroscope - Pyroscope is an open source continuous profiling platform.
Deep Learning Framework
Anomaly Detection & Others
- Anomalib - An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
- Gradio - An open-source Python library that is used to build machine learning and data science demos and web applications.
- Traingenerator - Generates custom template code for PyTorch & sklearn, using a simple web UI built with Streamlit.
- Fairlearn - A Python package to assess and improve fairness of machine learning models.
- AI Fairness 360 - A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
Auto ML & Hyperparameter Optimization
- NNI - An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
- AutoKeras - AutoML library for deep learning.
- KerasTuner - An easy-to-use, scalable hyperparameter optimization framework that solves the pain points of hyperparameter search.
- Talos - Hyperparameter Optimization for TensorFlow, Keras and PyTorch.
- Distiller - Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research.
- Hyperas - A very simple wrapper for convenient hyperparameter optimization for Keras.
- Model Search - A framework that implements AutoML algorithms for model architecture search at scale.
Deployment & Distribution
- Hummingbird - A library for compiling trained traditional ML models into tensor computations.
- OpenVINO - An open-source toolkit for optimizing and deploying AI inference.
- open_model_zoo - Pre-trained Deep Learning models and demos (high quality and extremely fast).
- Kubeflow - Machine Learning Toolkit for Kubernetes.
- m2cgen - Transforms ML models into native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies.
- FairScale - A PyTorch extension library for high performance and large scale training.
- ColossalAI - Provides a collection of parallel components and user-friendly tools to kickstart distributed training and inference in a few lines.
- Ray - A unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for accelerating ML workloads.
- BentoML - BentoML is compatible across machine learning frameworks and standardizes ML model packaging and management for your team.
- cortex - Production infrastructure for machine learning at scale.
- Horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
- Angel - A Flexible and Powerful Parameter Server for large-scale machine learning.
- Elephas - Distributed Deep learning with Keras & Spark.
- MLeap - Allows data scientists and engineers to deploy machine learning pipelines from Spark and Scikit-learn to a portable format and execution engine.
- ZenML - Build portable, production-ready MLOps pipelines.
- Optimus - An opinionated python library to easily load, process, plot and create ML models that run over pandas, Dask, cuDF, dask-cuDF, Vaex or Spark.
- ONNX - Open standard for machine learning interoperability.
- TensorRT - A C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
- Compute Library - A set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
- Apache TVM - Open deep learning compiler stack for CPUs, GPUs and specialized accelerators.
- Triton Inference Server - The Triton Inference Server provides an optimized cloud and edge inferencing solution.
- Core ML Tools - Contains supporting tools for Core ML model conversion, editing, and validation.
- Petastorm - Enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format.
- Hivemind - Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
- Mesh Transformer JAX - Model parallel transformers in JAX and Haiku.
- ncnn - A high-performance neural network inference framework optimized for the mobile platform.
- Turi Create - A machine learning library for deployment on macOS/iOS.
- Apache SINGA - A distributed deep learning platform.
- BytePS - A high performance and generic framework for distributed DNN training.
- MMdnn - MMdnn is a set of tools to help users inter-operate among different deep learning frameworks.
- Nebullvm - An open-source tool designed to speed up AI inference in just a few lines of code.
- DeepSpeed - An easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for Deep Learning Training and Inference.
- BigDL - Building Large-Scale AI Applications for Distributed Big Data.
- Analytics Zoo - Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray.
- MediaPipe - Cross-platform, customizable ML solutions for live and streaming media.
High-Level DL APIs
- tf_numpy - A subset of the NumPy API implemented in TensorFlow
- PyTorch - An open source deep learning framework by Facebook, with GPU and dynamic graph support.
- TorchVision - Datasets, Transforms and Models specific to Computer Vision for PyTorch
- TorchText - Data loaders and abstractions for text and NLP for PyTorch
- TorchAudio - Data manipulation and transformation for audio signal processing for PyTorch
- TorchRec - A PyTorch domain library built to provide common sparsity & parallelism primitives needed for large-scale recommender systems (RecSys).
- TorchServe - Serve, optimize and scale PyTorch models in production
- TorchHub - Model zoo for PyTorch
- Ignite - High-level library to help with training and evaluating neural networks for PyTorch
- Captum - A model interpretability and understanding library for PyTorch
- Glow - Compiler for Neural Network hardware accelerators
- TorchArrow - Common and composable data structures built on PyTorch Tensor for efficient batch data representation and processing in PyTorch model authoring
- PyTorchVideo - A deep learning library for video understanding research, based on PyTorch
- tensorboardX - TensorBoard for PyTorch (and Chainer, MXNet, NumPy, ...)
- Apex - Tools for easy mixed precision and distributed training in PyTorch
- HuggingFace Accelerate - A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision
- PyTorch Metric Learning - The easiest way to use deep metric learning in your application. Modular, flexible, and extensible, written in PyTorch
- Auto-PyTorch - Automatic architecture search and hyperparameter optimization for PyTorch
- torch-optimizer - Collection of optimizers for PyTorch compatible with optim module
- PyTorch Sparse - PyTorch Extension Library of Optimized Autograd Sparse Matrix Operations
- PyTorch Scatter - PyTorch Extension Library of Optimized Scatter Operations
- Torch-Struct - A library of tested, GPU implementations of core structured prediction algorithms for deep learning applications
- torchinfo - View model summaries in PyTorch
- Torchshow - Visualize PyTorch tensors with a single line of code
- torch2trt - An easy to use PyTorch to TensorRT converter
- Kaolin - A PyTorch Library for Accelerating 3D Deep Learning Research
- higher - A PyTorch library allowing users to obtain higher-order gradients over losses spanning training loops rather than individual training steps
- TensorFlow - An open source deep learning framework by Google, with GPU support.
- TensorBoard - TensorFlow's Visualization Toolkit
- TensorFlow Text - A collection of text related classes and ops for TensorFlow
- TensorFlow Recommenders - A library for building recommender system models using TensorFlow.
- TensorFlow Ranking - A library for Learning-to-Rank (LTR) techniques on the TensorFlow platform.
- TensorFlow Serving - A flexible, high-performance serving system for machine learning models based on TensorFlow
- TFX - An end-to-end platform for deploying production ML pipelines.
- TFDS - A collection of datasets ready to use with TensorFlow and Jax
- TensorFlow Addons - Useful extra functionality for TensorFlow 2.x maintained by SIG-addons
- TensorFlow Transform - A library for preprocessing data with TensorFlow
- TensorFlow Model Garden - Models and examples built with TensorFlow
- TensorFlow Hub - A library for transfer learning by reusing parts of TensorFlow models
- TensorFlow.js - A WebGL accelerated JavaScript library for training and deploying ML models based on TensorFlow
- TensorFlow Probability - Probabilistic reasoning and statistical analysis in TensorFlow
- TensorFlow Model Optimization Toolkit - A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning
- TensorFlow Model Analysis - A library for evaluating TensorFlow models
- Trax - Deep Learning with Clear Code and Speed
- Lattice - Lattice methods in TensorFlow
- TensorFlowOnSpark - Brings TensorFlow programs to Apache Spark clusters
- Tensor2Tensor - Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research
- PaddlePaddle - An open source deep learning framework by Baidu, with GPU support.
- PaddleOCR - Multilingual OCR toolkits based on PaddlePaddle
- PaddleDetection - Object detection toolkit based on PaddlePaddle
- PaddleSeg - Image segmentation toolkit based on PaddlePaddle
- PaddleClas - Visual classification and recognition toolkit based on PaddlePaddle
- PaddleGAN - Generative Adversarial Networks toolkit based on PaddlePaddle
- PaddleVideo - Video understanding toolkit based on PaddlePaddle
- PaddleRec - Recommendation algorithm based on PaddlePaddle
- PaddleNLP - Natural language processing toolkit based on PaddlePaddle
- PaddleSpeech - Speech Recognition/Translation toolkit based on PaddlePaddle
- PGL - An efficient and flexible graph learning framework based on PaddlePaddle
Programming Languages