awesome-approximate-dnn
Curated content for DNN approximation, acceleration ... with a focus on hardware accelerators and deployment
https://github.com/e-dupuis/awesome-approximate-dnn
-
Best Surveys
- Efficient Processing of Deep Neural Networks: A Tutorial and Survey
- Recent Advances in Convolutional Neural Network Acceleration
- Hardware Approximate Techniques for Deep Neural Network Accelerators: A Survey
- Pruning and Quantization for Deep Neural Network Acceleration: A Survey
- Deep Neural Network Approximation for Custom Hardware: Where We’ve Been, Where We’re Going
- Approximation Computing Techniques to Accelerate CNN Based Image Processing Applications – A Survey in Hardware/Software Perspective
-
Tools
-
Approximations Frameworks
- Distiller - Open-source Python package for neural network compression research (fine-tuning capable) | PyTorch | Pruning, Quantization (QAT), Knowledge Distillation, Conditional Computation, Regularization
- Adapt - Approximation-aware retraining | PyTorch | Approximate Multipliers
- NEMO - nn | PyTorch, ONNX | PTQ, QAT
- Microsoft NNI
- PocketFlow - Open-source framework for compressing and accelerating DNNs | TensorFlow | PTQ, QAT, Pruning
- QKeras - Drop-in replacement for some of the Keras layers | TensorFlow (Keras) | Quantization (QAT)
- Brevitas
- TFApprox - fit/evoapproxlib) | TensorFlow | Approximate Multipliers
- Intel Neural Compressor - Open-source Python lib for neural network compression | TensorFlow, PyTorch, ONNX Runtime, MXNet | Pruning (Magnitude, Grad), Quantization (PTQ, dynamic, QAT, mixed precision), Knowledge Distillation
- Qualcomm AIMET - Open-source lib for trained neural network quantization and compression + Model Zoo | TensorFlow, PyTorch | Pruning (Channel), Spatial SVD, per-layer compression ratio selection, Quantization (PTQ, QAT, Simulation, Rounding, Bias correction, Cross-layer equalization, Mixed precision)
- OpenMMRazor - Open-source toolkit for model slimming and AutoML | OpenMM | Neural Architecture Search (NAS), Pruning, Knowledge Distillation (KD), Quantization (in the next release)
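Several of the frameworks above list Knowledge Distillation. As a rough illustration only, the sketch below computes a Hinton-style distillation loss for one sample in plain Python. The student is pushed toward the teacher's temperature-softened outputs (KL term scaled by T²), plus the usual cross-entropy on the hard label. The function names and the `temperature=4.0` / `alpha=0.5` values are illustrative assumptions, not the API or defaults of any framework listed.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Distillation loss for a single sample (illustrative weighting).

    alpha weights the soft (teacher) term against the hard (label) term;
    the KL term is scaled by temperature**2 so its gradient magnitude
    stays comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Standard cross-entropy of the student on the hard label (temperature 1).
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard


student = [2.0, 1.0, 0.1]
teacher = [3.0, 0.5, 0.2]
print(kd_loss(student, teacher, label=0))
```

When the student already matches the teacher, the KL term vanishes and only the weighted label loss remains.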
-
Graph Compiler
- TensorFlow Lite - TensorFlow Lite is a set of tools to help developers run TensorFlow models on mobile, embedded, and IoT devices. It enables on-device machine learning inference with low latency and a small binary size (Linux, Android, MCUs). [curated content for tflite](https://github.com/margaretmz/awesome-tensorflow-lite)
- OpenVino - OpenCL-based graph compiler for Intel environments (Intel CPUs, Intel GPUs, dedicated accelerators)
- Vitis AI - Optimal Artificial Intelligence Inference from Edge to Cloud (compiler / optimizer / quantizer / profiler / IP set)
- ONNX Runtime graph optimizations - Optimize ONNX graphs (simplification)
- DORY - automatic tool to deploy DNNs on low-cost MCUs with typically less than 1MB of on-chip SRAM memory
- Glow - Glow is a machine learning compiler and execution engine for hardware accelerators (Pytorch, ONNX)
- Mirage - (GPU) Mirage is a tensor algebra superoptimizer that automatically discovers highly-optimized tensor programs for DNNs. Mirage automatically identifies and verifies sophisticated optimizations, many of which require joint optimization at the kernel, thread block, and thread levels of the GPU compute hierarchy.
- OpenXLA - XLA (Accelerated Linear Algebra) is an open-source machine learning (ML) compiler for GPUs, CPUs, and ML accelerators. It takes models from popular ML frameworks such as PyTorch, TensorFlow, and JAX and optimizes them for high-performance execution on these hardware platforms.
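A typical rewrite that graph compilers like these apply is folding an inference-mode BatchNorm into the preceding convolution or linear layer, which removes one op per layer. The sketch below is a minimal illustration, not any compiler's code. It assumes each output channel's kernel is stored as a flattened Python list. The helper names (`fold_batchnorm`, `dense`, `batchnorm`) are hypothetical.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-mode BatchNorm into the preceding linear/conv layer.

    weights[c] is the flattened kernel of output channel c and bias[c] its bias.
    BN computes gamma * (y - mean) / sqrt(var + eps) + beta per channel, so
    scaling the kernel and adjusting the bias reproduces it exactly.
    """
    new_w, new_b = [], []
    for c, w_c in enumerate(weights):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        new_w.append([w * scale for w in w_c])
        new_b.append((bias[c] - mean[c]) * scale + beta[c])
    return new_w, new_b


def dense(weights, bias, x):
    """Reference layer: y[c] = sum_i weights[c][i] * x[i] + bias[c]."""
    return [sum(w * xi for w, xi in zip(w_c, x)) + b for w_c, b in zip(weights, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm applied per channel."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]


W, b = [[0.5, -1.0], [2.0, 0.25]], [0.1, -0.3]
gamma, beta, mean, var = [1.5, 0.8], [0.0, 0.2], [0.05, -0.1], [0.9, 2.0]
x = [1.0, 3.0]
W_f, b_f = fold_batchnorm(W, b, gamma, beta, mean, var)
print(batchnorm(dense(W, b, x), gamma, beta, mean, var))
print(dense(W_f, b_f, x))  # same values up to floating-point rounding
```

The fold is only valid once the BatchNorm statistics are frozen, that is, for inference.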
-
Commercial Dedicated HW accelerator (ASIC)
- Esperanto ET-soc-1
- Google TPU - 275 TFLOPS @ 200 W / V3 - 90 TOPS @ 250 W / Coral Edge 4 TOPS @ 2 W
- Intel Movidius Myriad - 2.67 TOPS/W
- Synaptic NPU VIP9000
- Sima ML accelerator MLSoC
- GreenWaves GAP8 - GOPS fully programmable RISC-V IoT-edge computing engine, featuring an 8-core cluster with a CNN accelerator, coupled with an ultra-low-power MCU with 30 μW state-retentive sleep power (75 mW) | Edge | 600 GMAC/s/W
- IBM NorthPole
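The accelerator figures above are easiest to compare as peak efficiency (throughput per watt). The back-of-envelope sketch below uses only the numbers quoted in this list. Note that TFLOPS and TOPS count different operation types, and peak figures say little about utilisation on a real network. The helper name is hypothetical.

```python
def tops_per_watt(peak_tops, watts):
    """Peak throughput per watt: a first-order efficiency figure, not a benchmark."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return peak_tops / watts


# (peak throughput, power in W) exactly as quoted in the list above.
quoted = {
    "Google TPU (275 TFLOPS @ 200 W)": (275, 200),
    "Google TPU V3 (90 TOPS @ 250 W)": (90, 250),
    "Coral Edge (4 TOPS @ 2 W)": (4, 2),
}
for name, (peak, watts) in quoted.items():
    print(f"{name}: {tops_per_watt(peak, watts):.2f} per W")
```

By this crude measure the quoted Coral Edge figure works out to 2 TOPS/W, against 0.36 TOPS/W for the quoted V3 figure. The Intel Movidius entry is already given in TOPS/W.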
-
Simulation Frameworks
- Eyeriss Energy Estimator - Energy Estimator for MIT's Eyeriss Hardware Accelerator
- SCALE-Sim - ARM CNN accelerator simulator that provides cycle-accurate timing, power/energy, memory bandwidth, and trace results for a specified accelerator configuration and neural network architecture.
- Torchbench - collection of deep learning benchmarks you can use to benchmark your models, optimized for the PyTorch framework.
- Renode - Functional simulation platform for MCU dev & test (single and multi-node)
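Simulators like SCALE-Sim model stalls, memory traffic, and dataflow effects. A useful sanity check against their output is the ideal compute-bound cycle count. The sketch below makes three assumptions: a dense (groups=1) convolution, one MAC per PE per cycle, and perfect utilisation. The layer shape and the 32x32 array are hypothetical.

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k_h, k_w):
    """MAC count of a dense (groups=1) 2D convolution layer."""
    return h_out * w_out * c_out * c_in * k_h * k_w


def ideal_cycles(macs, num_pes, macs_per_pe_per_cycle=1):
    """Compute-bound lower bound: perfect PE utilisation, no memory stalls."""
    denom = num_pes * macs_per_pe_per_cycle
    return (macs + denom - 1) // denom  # integer ceiling division


# Hypothetical layer: 3x3 conv, 64 -> 128 channels, 56x56 output, on a 32x32 PE array.
macs = conv2d_macs(56, 56, 64, 128, 3, 3)
print(macs, ideal_cycles(macs, 32 * 32))
```

A cycle-accurate simulator will typically report more cycles than this bound whenever the layer does not map evenly onto the array or memory bandwidth becomes the bottleneck.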
-
Dedicated Library
- code - QNN inference library for ultra-low-power PULP RISC-V cores
-
FPGA based accelerator / HLS for CNNs
- Maestro - open-source tool for modeling and evaluating the performance and energy-efficiency of different dataflows for DNNs
- HLS4ML - Package for generating HLS code from various ML frameworks (good PyTorch support); creates a streamlined architecture
- FINN - Framework for creating HW accelerators (HLS code) from Brevitas-quantized models, down to BNNs; creates a PE-based architecture
- N2D2 - Framework for generating HLS code from N2D2-trained models (supports ONNX import); creates a streamlined architecture
-
Evaluation Frameworks
- DNN-Neurosim - Framework for evaluating the performance of inference or training of on-chip DNN
-
-
Approximation Methods
-
Multi-techniques
- Cross-Layer Approximation for Printed Machine Learning Circuits - Algorithmic and logic-level approximation (coefficient replacement + netlist pruning) through a full DSE for printed ML applications.
- Deep Neural Network Compression by In-Parallel Pruning-Quantization - Use Bayesian optimization to solve both pruning and quantization problems jointly and with fine-tuning.
- OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization - Analytical single-shot compression (pruning + quantization) of a DNN using only the pretrained weight values, followed by fine-tuning to recover accuracy
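The sketch below makes the pruning + quantization combination concrete in a deliberately simplified form. It first applies global magnitude pruning to a target sparsity, then symmetric uniform quantization of the remaining weights with a max-abs scale. This is neither OPQ's analytical allocation nor the Bayesian search listed above. `sparsity` and `num_bits` are set by hand, and the function name is hypothetical.

```python
def prune_and_quantize(weights, sparsity, num_bits=8):
    """One-shot magnitude pruning followed by symmetric uniform quantization.

    sparsity: fraction of weights (smallest magnitudes first) set to zero.
    Returns (integer codes, scale) so that weights ~= [q * scale for q in codes].
    """
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(w) for w in pruned)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round onto the integer grid and clamp to the symmetric range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in pruned]
    return codes, scale


codes, scale = prune_and_quantize([0.5, -0.02, 0.1, -1.0, 0.03, 0.7], sparsity=0.5)
print(codes, [q * scale for q in codes])
```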
-
Pruning
- Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity - Large matrix multiplications are tiled; this method maintains a regular sparsity pattern at the tile level, improving efficiency.
- Utilizing Explainable AI for Quantization and Pruning of Deep Neural Networks - Uses DeepLIFT (explainable AI) as hints to improve compression by determining the importance of neurons and features
- Post-training deep neural network pruning via layer-wise calibration - Layer-wise sparse pruning calibration based on the use of fractal images to replace representative data, post quantization, achieving 2x compression.
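The sketch below is a simplified illustration of tile-granular pruning, in the spirit of the tile-wise sparsity entry above but not that paper's exact algorithm. Whole tiles are ranked by L1 norm and the weakest are zeroed, so the surviving pattern stays regular at tile level. The tile size, `keep_ratio`, and the function name are assumptions.

```python
def tile_prune(matrix, tile_rows, tile_cols, keep_ratio):
    """Zero whole tiles of a 2D matrix, keeping the tiles with the largest L1 norm.

    Keeping or dropping entire tiles yields a regular pattern that a tiled
    matrix-multiply kernel can skip without per-element indexing.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    tiles = []
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            norm = sum(abs(matrix[r][c])
                       for r in range(r0, min(r0 + tile_rows, n_rows))
                       for c in range(c0, min(c0 + tile_cols, n_cols)))
            tiles.append((norm, r0, c0))
    n_keep = max(1, round(len(tiles) * keep_ratio))
    kept = {(r0, c0) for _, r0, c0 in sorted(tiles, reverse=True)[:n_keep]}
    pruned = [row[:] for row in matrix]  # copy so the input is left untouched
    for _, r0, c0 in tiles:
        if (r0, c0) not in kept:
            for r in range(r0, min(r0 + tile_rows, n_rows)):
                for c in range(c0, min(c0 + tile_cols, n_cols)):
                    pruned[r][c] = 0.0
    return pruned


m = [[5, 5, 1, 1], [5, 5, 1, 1], [1, 1, 2, 2], [1, 1, 2, 2]]
print(tile_prune(m, 2, 2, keep_ratio=0.5))
```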
-
Quantization
- Learning Compression from Limited Unlabeled Data - Use unlabelled data to improve accuracy of quantization in a very fast fine-tuning step
- Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors - AutoQKeras: per-layer quantization optimization using a meta-heuristic DSE based on Bayesian optimization; makes use of QKeras & hls4ml.
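Heterogeneous (per-layer) quantization can be sketched with a much cruder search than AutoQKeras uses. For each layer, the sketch picks the smallest bit width whose weight reconstruction MSE stays under a budget. This greedy weight-MSE proxy is a stand-in swapped in for illustration; AutoQKeras uses Bayesian optimization against model accuracy. The function names, candidate widths, and layers are hypothetical.

```python
def quantize_dequantize(values, num_bits):
    """Symmetric uniform fake-quantization: snap onto a num_bits grid, then rescale."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def choose_bitwidths(layers, max_mse, candidate_bits=(2, 4, 6, 8)):
    """Greedy per-layer search: smallest bit width whose weight MSE fits the budget."""
    chosen = {}
    for name, weights in layers.items():
        chosen[name] = max(candidate_bits)  # fall back to the widest option
        for bits in sorted(candidate_bits):
            if mse(weights, quantize_dequantize(weights, bits)) <= max_mse:
                chosen[name] = bits
                break
    return chosen


# Hypothetical layers: "a" is exactly representable on a 2-bit grid, "b" is not.
layers = {"a": [1.0, -1.0, 0.0], "b": [1.0, 0.3, -0.55, 0.8]}
print(choose_bitwidths(layers, max_mse=0.01))
```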
-
Approximate operators
- Full Approximation of Deep Neural Networks through Efficient Optimization - Selects efficient approximate multipliers through retraining and minimization of accuracy loss (EvoApprox)
- ALWANN: Automatic Layer-Wise Approximation of Deep Neural Network Accelerators without Retraining - Uses NSGA-II to optimize both the set of implemented approximate multipliers and the mapping of DNN layers onto them (EvoApprox).
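The approximate-multiplier entries above rely on characterised circuits from EvoApprox. As a stand-in only, the sketch below emulates a simple operand-truncation multiplier that ignores the low `drop_bits` of each unsigned operand. It measures mean and worst-case error exhaustively over all 8-bit inputs, similar in spirit to the error metrics used to characterise approximate multipliers. The function names are hypothetical, and this is not an EvoApprox circuit.

```python
def truncated_multiply(a, b, drop_bits):
    """Approximate unsigned multiply: zero the drop_bits LSBs of each operand first."""
    mask = ~((1 << drop_bits) - 1)
    return (a & mask) * (b & mask)


def error_stats(drop_bits, width=8):
    """Exhaustive mean absolute error and worst-case error versus exact a * b."""
    n = 1 << width
    total, worst = 0, 0
    for a in range(n):
        for b in range(n):
            # Truncated operands never exceed the originals, so err >= 0.
            err = a * b - truncated_multiply(a, b, drop_bits)
            total += err
            worst = max(worst, err)
    return total / (n * n), worst


for k in range(3):
    mae, wce = error_stats(k)
    print(f"drop_bits={k}: mean abs error={mae:.1f}, worst-case error={wce}")
```

With 8-bit operands there are only 65,536 input pairs, so any such multiplier can also be emulated exactly with a 256x256 lookup table.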
-
-
Others
-
Model ZOO
- Keras Applications - pre-trained popular CNNs implemented in Keras - can be customized and fine tuned
- Torchvision - The PyTorch equivalent of Keras Applications
- Openvino pre-trained models - Intel pre-trained model for use in OpenVino
- ONNX Model Zoo - Collection of pre-trained onnx models
-
Generic DSE Framework
- Google OR-Tools - Constraint programming, routing and other optimization tools
-
Visualization Framework
- Tensorboard - Visualization tool for TensorFlow, PyTorch, etc.; can show the graph, metric evolution over training, and more; very adaptable
- Netron - Tool to show ONNX graph with all the attributes.
-
HLS Framework
- Intel Quartus HLS - C++ HLS for Altera/Intel FPGAs
- Mentor Catapult HLS - C++/SystemC HLS from Siemens (formerly Mentor Graphics) for ASICs and FPGAs
- Xilinx Vivado HLS - C/C++ based HLS for Xilinx FPGAs
-
Efficient DNN Architecture
- Blog post - related to recent mobile architectures
-
DNN conversion framework
-