awesome-approximate-dnn

Curated content for DNN approximation, acceleration ... with a focus on hardware accelerators and deployment
https://github.com/e-dupuis/awesome-approximate-dnn

Best Surveys
Tools
- Approximations Frameworks
  - Distiller - Open-source Python package for neural network compression research (fine-tuning capable) | PyTorch | Pruning, Quantization (QAT), Knowledge Distillation, Conditional Computation, Regularization
  - Adapt - Approximation-aware retraining | PyTorch | Approximate Multipliers
  - NEMO | PyTorch, ONNX | PTQ, QAT
  - Microsoft NNI
  - PocketFlow - Open-source framework for compressing and accelerating DNNs | TensorFlow | PTQ, QAT, Pruning
  - QKeras - Drop-in replacement for some of the Keras layers | TensorFlow (Keras) | Quantization (QAT)
  - Brevitas
  - TFApprox - fit/evoapproxlib) | TensorFlow | Approximate Multipliers
  - Intel Neural Compressor - Open-source Python library for neural network compression | TensorFlow, PyTorch, ONNX Runtime, MXNet | Pruning (magnitude, gradient), Quantization (PTQ, dynamic, QAT, mixed precision), Knowledge Distillation
  - Qualcomm AIMET - Open-source library for quantization and compression of trained neural networks, plus a Model Zoo | TensorFlow, PyTorch | Pruning (channel), Spatial SVD, per-layer compression-ratio selection, Quantization (PTQ, QAT, simulation, rounding, bias correction, cross-layer equalization, mixed precision)
  - OpenMMRazor - Open-source toolkit for model slimming and AutoML | OpenMMLab | Neural Architecture Search (NAS), Pruning, Knowledge Distillation (KD), Quantization (in the next release)
  - TensorFlow Model Optimization
- Graph Compiler
  - OpenVINO - OpenCL-based graph compiler for Intel environments (Intel CPUs, Intel GPUs, dedicated accelerators)
  - Vitis AI - Optimal Artificial Intelligence Inference from Edge to Cloud (compiler / optimizer / quantizer / profiler / IP set)
  - ONNX Runtime graph optimizations - Optimize ONNX graphs (simplification)
  - DORY - Automatic tool to deploy DNNs on low-cost MCUs with typically less than 1 MB of on-chip SRAM
  - Glow - Machine learning compiler and execution engine for hardware accelerators (PyTorch, ONNX)
  - Mirage - (GPU) Mirage is a tensor algebra superoptimizer that automatically discovers highly optimized tensor programs for DNNs. It automatically identifies and verifies sophisticated optimizations, many of which require joint optimization at the kernel, thread-block, and thread levels of the GPU compute hierarchy.
  - OpenXLA - XLA (Accelerated Linear Algebra) is an open-source machine learning (ML) compiler for GPUs, CPUs, and ML accelerators. It takes models from popular ML frameworks such as PyTorch, TensorFlow, and JAX and optimizes them for high-performance execution on these platforms.
- Commercial Dedicated HW accelerator (ASIC)
  - Esperanto ET-soc-1
  - Google TPU - 275 TFLOPS @ 200 W / V3 - 90 TOPS @ 250 W / Coral Edge - 4 TOPS @ 2 W
  - Synaptic NPU VIP9000
  - Sima ML accelerator MLSoC
  - Intel Movidius Myriad - 2.67 TOPS/W
  - GreenWaves GAP8 - GOPS fully programmable RISC-V IoT-edge computing engine, featuring an 8-core cluster with a CNN accelerator, coupled with an ultra-low-power MCU with 30 μW state-retentive sleep power (75 mW) | Edge | 600 GMAC/s/W
  - IBM NorthPole
- Simulation Frameworks
  - Eyeriss Energy Estimator - Energy Estimator for MIT's Eyeriss Hardware Accelerator
  - SCALE-Sim - ARM CNN accelerator simulator that provides cycle-accurate timing, power/energy, memory bandwidth, and trace results for a specified accelerator configuration and neural network architecture.
  - Torchbench - Collection of deep learning benchmarks for evaluating your models, optimized for the PyTorch framework.
  - Renode - Functional simulation platform for MCU dev & test (single and multi-node)
- Dedicated Library
  - code - QNN inference library for ultra-low-power PULP RISC-V cores
- FPGA based accelerator / HLS for CNNs
  - Maestro - open-source tool for modeling and evaluating the performance and energy-efficiency of different dataflows for DNNs
  - HLS4ML - Package for generating HLS code from various ML frameworks (good PyTorch support); creates a streamlined architecture
  - FINN - Framework for creating HW accelerators (HLS code) from Brevitas-quantized models, down to BNNs; creates a PE-based architecture
  - N2D2 - Framework for generating HLS code from N2D2-trained models (supports ONNX import); creates a streamlined architecture
  - ScaleHLS - HLS framework built on MLIR. It compiles HLS C/C++ or ONNX models into optimized HLS C/C++, from which downstream tools such as Vivado HLS generate high-efficiency RTL designs. Focuses on scalability, with an automated DSE engine.
- Evaluation Frameworks
  - DNN-Neurosim - Framework for evaluating the performance of on-chip DNN inference or training
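The ASIC entries above quote peak throughput and power in different units (TFLOPS, TOPS, TOPS/W, GMAC/s/W). One rough way to compare them is to normalize everything to TOPS/W. What follows is a minimal sketch under stated assumptions. It treats TFLOPS and TOPS as the same unit even though the devices use different numeric precisions, it counts one MAC as two operations, and it uses the vendor peak figures copied from the list rather than measured efficiency. The helper names are hypothetical.

```python
# Normalize the peak figures quoted in the ASIC list to TOPS/W.
# Assumptions (illustrative): TFLOPS and TOPS are treated as equivalent
# "tera-operations", and 1 MAC = 2 operations.


def tops_per_watt(tera_ops: float, watts: float) -> float:
    """Peak efficiency in tera-operations per second per watt."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return tera_ops / watts


def gmacs_per_watt_to_tops_per_watt(gmacs_per_watt: float) -> float:
    """Convert GMAC/s/W to TOPS/W, assuming one MAC counts as two operations."""
    return gmacs_per_watt * 2 / 1000


# Figures as quoted in the list (vendor peak numbers, not measurements).
devices = {
    "Google TPU (275 TFLOPS @ 200 W)": tops_per_watt(275, 200),
    "Google TPU V3 (90 TOPS @ 250 W)": tops_per_watt(90, 250),
    "Coral Edge (4 TOPS @ 2 W)": tops_per_watt(4, 2),
    "Intel Movidius Myriad (2.67 TOPS/W)": 2.67,
    "GreenWaves GAP8 (600 GMAC/s/W)": gmacs_per_watt_to_tops_per_watt(600),
}

# Print devices from most to least efficient.
for name, eff in sorted(devices.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:40s} {eff:.2f} TOPS/W")
```

Peak TOPS/W ignores utilization, memory bandwidth, and precision, so sustained efficiency on a real network is typically lower; the simulation and evaluation frameworks listed above are better suited to estimating that.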
Approximation Methods
- Multi-techniques
  - Cross-Layer Approximation for Printed Machine Learning Circuits (code: Printed-ML-Classifiers) - Algorithmic and logic-level approximation (coefficient replacement + netlist pruning) through a full DSE for printed ML applications.
  - Deep Neural Network Compression by In-Parallel Pruning-Quantization - Uses Bayesian optimization to solve the pruning and quantization problems jointly, with fine-tuning.
  - OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization - Analytical one-shot compression (pruning + quantization) of a DNN using only the pretrained weight values, followed by fine-tuning to recover accuracy.
- Pruning
  - Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity - Large matrix multiplications are tiled; this method maintains a regular sparsity pattern at the tile level, improving efficiency.
  - Utilizing Explainable AI for Quantization and Pruning of Deep Neural Networks - Uses DeepLIFT (explainable AI) as hints to improve compression by determining the importance of neurons and features.
  - Post-training deep neural network pruning via layer-wise calibration - Layer-wise sparse pruning calibration that uses fractal images in place of representative data, applied post quantization, achieving 2x compression.
- Quantization
  - Learning Compression from Limited Unlabeled Data - Uses unlabeled data to improve quantization accuracy in a very fast fine-tuning step.
  - Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors - AutoQKeras: per-layer quantization optimization using a meta-heuristic DSE based on Bayesian optimization; makes use of QKeras and hls4ml.
- Approximate operators
  - Full Approximation of Deep Neural Networks through Efficient Optimization - Selects efficient approximate multipliers through retraining and minimization of accuracy loss (EvoApprox).
  - ALWANN: Automatic Layer-Wise Approximation of Deep Neural Network Accelerators without Retraining - Uses NSGA-II to optimize both the set of implemented approximate multipliers and the mapping of DNN layers onto them (EvoApprox).
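The pruning, quantization, and approximate-operator families above can be illustrated with a toy, pure-Python sketch. It is not the method of any paper listed here. It assumes global magnitude pruning, symmetric per-tensor uniform quantization to signed n-bit integers, and a hypothetical truncated multiplier that clears the k least-significant bits of each operand, standing in for the library multipliers (e.g. EvoApprox) used by the works above. All function names are illustrative.

```python
"""Toy sketch of pruning, quantization, and an approximate multiplier.

Assumptions (illustrative only, not the methods of the cited papers):
- global magnitude pruning with a single sparsity target,
- symmetric per-tensor uniform quantization to signed n-bit integers,
- a truncated multiplier that clears the k LSBs of each operand's magnitude.
"""
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_symmetric(weights: List[float], n_bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to integers in [-(2^(b-1)-1), 2^(b-1)-1] with a single scale."""
    q_max = 2 ** (n_bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    q = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    return [v * scale for v in q]


def truncated_multiply(a: int, b: int, k: int = 2) -> int:
    """Approximate product: clear the k LSBs of each operand before multiplying."""
    mask = ~((1 << k) - 1)
    sign = -1 if (a < 0) != (b < 0) else 1
    return sign * ((abs(a) & mask) * (abs(b) & mask))


def dot_approx(xq: List[int], wq: List[int], k: int = 2) -> int:
    """Integer dot product using the approximate multiplier."""
    return sum(truncated_multiply(x, w, k) for x, w in zip(xq, wq))


if __name__ == "__main__":
    w = [0.8, -0.05, 0.3, -0.6, 0.01, 0.45]
    x = [0.5, 0.2, -0.9, 0.1, 0.7, -0.3]

    w_pruned = magnitude_prune(w, sparsity=0.5)
    wq, w_scale = quantize_symmetric(w_pruned, n_bits=8)
    xq, x_scale = quantize_symmetric(x, n_bits=8)

    exact = sum(a * b for a, b in zip(x, w))
    quant = sum(a * b for a, b in zip(xq, wq)) * x_scale * w_scale
    approx = dot_approx(xq, wq, k=2) * x_scale * w_scale
    print(f"float: {exact:.4f}  pruned+int8: {quant:.4f}  +truncated mult: {approx:.4f}")
```

The sketch applies pruning and then quantization in sequence. The multi-technique works above instead optimize both jointly and add fine-tuning to recover accuracy, which is omitted here.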
Others
- Model ZOO
  - Keras Applications - pre-trained popular CNNs implemented in Keras - can be customized and fine tuned
  - Torchvision - The PyTorch equivalent of Keras Applications
  - ONNX Model Zoo - Collection of pre-trained onnx models
  - Tensorflow Hub - pre-trained model that can be imported as keras layers for deployment / fine-tuning
  - OpenVINO pre-trained models - Intel pre-trained models for use in OpenVINO
  - TIMM - Excellent model zoo & training scripts for pytorch
- Visualization Framework
  - TensorBoard - Visualization tool for TensorFlow, PyTorch, etc.; can show the graph and metric evolution during training; very adaptable
  - Netron - Tool to show ONNX graph with all the attributes.
- HLS Framework
  - Intel Quartus HLS - C++ HLS for Altera/Intel FPGAs
  - Mentor Catapult HLS - C++/SystemC HLS from Siemens (Mentor)
  - Xilinx Vivado HLS - C/C++-based HLS for Xilinx FPGAs
- Efficient DNN Architecture
  - Blog post - related to recent mobile architectures
- DNN conversion framework
  - MMdnn - Microsoft tool for cross-framework conversion, retraining, visualization & deployment
  - ONNX - model format to exchange frozen models between ML frameworks
- Contests
  - MLPerf / MLCommons - Acceleration contest for ML
  - Papers with Code - latest papers / code in ML, SoTA representation for several applications (CV, NLP, Medical ...)
- Generic DSE Framework
  - Google OR-Tools - Constraint programming, routing and other optimization tools
