Awesome-Model-Quantization

A list of papers, docs, codes about model quantization. This repo is aimed to provide the info for model quantization research, we are continuously improving the project. Welcome to PR the works (papers, repositories) that are missed by the repo.
https://github.com/Efficient-ML/Awesome-Model-Quantization

Last synced: 17 days ago
JSON representation

Awesome_Efficient_LLM_Diffusion
- ![Awesome - ml/awesome-efficient-llm-diffusion)
Benchmark
- BiBench
  - [Paper
- MQBench
  - [Paper
  - Ruihao Gong
Benchmarks
- [Paper
- [Paper - Quantization)] [![GitHub stars](https://img.shields.io/github/stars/Macaronlin/LLaMA3-Quantization?style=social)](https://github.com/Macaronlin/LLaMA3-Quantization)
- [Paper - ML/Qwen3-Quantization)] [![GitHub stars](https://img.shields.io/github/stars/Efficient-ML/Qwen3-Quantization?style=social)](https://github.com/Efficient-ML/Qwen3-Quantization)
- [Paper
- [Paper
- [Paper - Quantization)] [![GitHub stars](https://img.shields.io/github/stars/Macaronlin/LLaMA3-Quantization?style=social)](https://github.com/Macaronlin/LLaMA3-Quantization)
- [Paper
- [Paper - Quantization)] [![GitHub stars](https://img.shields.io/github/stars/Macaronlin/LLaMA3-Quantization?style=social)](https://github.com/Macaronlin/LLaMA3-Quantization)
- [Paper
- [Paper - Quantization)] [![GitHub stars](https://img.shields.io/github/stars/Macaronlin/LLaMA3-Quantization?style=social)](https://github.com/Macaronlin/LLaMA3-Quantization)
- [Paper
Books
- 2015
  - Quantization and Fast Inference: A practitioner’s guide to efficient AI
Papers
- 2015
  - [ICML
  - [NeurIPS
  - [arXiv
- 2016
  - [CoRR - Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [**`qnn`**] [[code](https://github.com/tensorpack/tensorpack/tree/master/examples/DoReFa-Net)] [5.8k:star:]
  - [ECCV - Net: ImageNet Classification Using Binary Convolutional Neural Networks. [**`bnn`**] [[torch](https://github.com/allenai/XNOR-Net)] [787:star:]
  - [ICASSP - point Performance Analysis of Recurrent Neural Networks. [**`qnn`**]
  - [NeurIPS - chris/caffe-twns)] [61:star:]
  - [CVPR - wu/quantized-cnn)
  - [NeurIPS - 1. [**`bnn`**] [[torch](https://github.com/itayhubara/BinaryNet)] [239:star:]
- 2017
  - [CoRR - Source Binary Neural Network Implementation Based on MXNet. [**`bnn`**] [[code](https://github.com/hpi-xnor)]
  - [CVPR - wave Gaussian Quantization. [**`qnn`**] [[code](https://github.com/zhaoweicai/hwgq)] [118:star:]
  - [CVPR
  - [FPGA
  - [ICASSP - point optimization of deep neural networks with adaptive step size retraining. [**`qnn`**]
  - [ICCV - cnn-landmarks)] [[torch](https://github.com/1adrianb/binary-human-pose-estimation)] [207:star:]
  - [ICCV - Order Residual Quantization. [**`qnn`**]
  - [ICLR - Precision Weights. [**`qnn`**] [[torch](https://github.com/Mxbonn/INQ-pytorch)] [144:star:]
  - [ICLR - aware Binarization of Deep Networks. [**`bnn`**] [[code](https://github.com/houlu369/Loss-aware-Binarization)]
  - [ICLR - Sharing for Neural Network Compression. [__`other`__]
  - [ICLR - ternary-quantization)] [90:star:]
  - [InterSpeech
  - [IPDPSW - Chip Memory Based Binarized Convolutional Deep Neural Network Applying Batch Normalization Free Technique on an FPGA. [**`hardware`**]
  - [JETC - Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks. [**`hardware`**] [**`bnn`**]
  - [NeurIPS - Binary-Convolution-Network)]
  - [Neurocomputing - BNN: Binarized neural network on FPGA. [**`hardware`**]
  - [arXiv - Grained Quantization. [**`qnn`**]
  - [arXiv - Precision Architecture for Inference of Convolutional Neural Networks. [**`qnn`**] [[code](https://github.com/gudovskiy/ShiftCNN)] [53:star:]
  - [MWSCAS
  - [Neurocomputing - BNN: Binarized neural network on FPGA. [**`hardware`**]
  - [MWSCAS
- 2018
  - [AAAI
  - [AAAI
  - [CAAI
  - [CoRR
  - [CoRR
  - [CVPR - Step Quantization for Low-bit Neural Networks. [**`qnn`**]
  - [CVPR - bitwidth Weights and Activations. [**`qnn`**]
  - [CVPR - bitwidth Convolutional Neural Networks. [**`qnn`**]
  - [CVPR
  - [CVPR
  - [CVPR - Arithmetic-Only Inference. [**`qnn`**]
  - [ECCV - Binary Decomposition. [**`bnn`**]
  - [ECCV
  - [ECCV - Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks. [**`qnn`**] [[tensorflow](https://github.com/microsoft/LQ-Nets)] [188:star:]
  - [ECCV - Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm. [**`bnn`**] [[torch](https://github.com/liuzechun/Bi-Real-net)] [120:star:]
  - [ECCV
  - [FCCM
  - [FPL
  - [ICLR - aware Weight Quantization of Deep Networks. [**`qnn`**] [[code](https://github.com/houlu369/Loss-aware-weight-quantization)]
  - [ICLR
  - [ICLR
  - [ICLR - Precision Networks. [**`qnn`**]
  - [ICLR
  - [ICLR - Precision Network Accuracy. [**`qnn`**]
  - [IJCAI
  - [IJCAI - -citation 14-->
  - [IJCNN
  - [IPDPS
  - [NCA - based accelerators for convolutional neural networks. [**`hardware`**]
  - [NeurIPS - bit Floating Point Numbers. [**`qnn`**]
  - [NeurIPS - bit training of neural networks. [**`qnn`**] [[torch](https://github.com/eladhoffer/quantized.pytorch)]
  - [Res Math Sci
  - [TCAD - fJ/op Binary Neural Network Inference. [**`hardware`**]
  - [TRETS - R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks. [**`qnn`**]
  - [arXiv - xnor/BMXNet-v2)] [192:star:]
  - [arXiv - quantization)]
  - [CVPR - error-aware quantization for low-bit deep neural networks. [**`qnn`**]
  - [IEEE J. Solid-State Circuits - Chip Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator Achieving 1.4 TOPS at 0.6 W. [**`hardware`**] [**`qnn`**]
  - [TVLSI - Efficient Architecture for Binary Weight Convolutional Neural Networks. [**`bnn`**]
  - [CoRR
  - [FPL
  - [IEEE J. Solid-State Circuits - Chip Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator Achieving 1.4 TOPS at 0.6 W. [**`hardware`**] [**`qnn`**]
  - [NCA - based accelerators for convolutional neural networks. [**`hardware`**]
  - [MM - Time Low-Power Inference of Binary Neural Networks on CPUs. [**`bnn`**]
  - [TCAD - fJ/op Binary Neural Network Inference. [**`hardware`**]
  - [TRETS - R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks. [**`qnn`**]
  - [TVLSI - Efficient Architecture for Binary Weight Convolutional Neural Networks. [**`bnn`**]
  - [FCCM
- 2019
  - [AAAI
  - [AAAI - bit CNNs via Discrete Back Propagation. [**`bnn`**]
  - [APCCAS
  - [BMVC - Net++: Improved Binary Neural Networks. [**`bnn`**]
  - [BMVC
  - [CoRR - bit DCNNs. [**`bnn`**]
  - [CoRR - Ensemble Template for Accurate Binary Convolutional Neural Networks. [**`bnn`**]
  - [CoRR
  - [CoRR
  - [CoRR
  - [CoRR - xnor/BMXNet-v2)] [193:star:]
  - [CVPR
  - [CVPR - Map Sparsity Through Low-Bit Quantization. [**`qnn`**]
  - [CVPR - Aware Automated Quantization with Mixed Precision. [**`qnn`**] [**`hardware`**] [[torch](https://github.com/mit-han-lab/haq)] [233:star:]
  - [CVPR - quantization-networks)] [82:star:]
  - [CVPR
  - [CVPR - Wise Interactions for Binary Convolutional Neural Networks. [**`bnn`**]
  - [CVPR - bit DCNNs with Circulant Back Propagation. [**`bnn`**]
  - [CVPR
  - [CVPR
  - [CVPR
  - [FPGA - Efficient Binarized Neural Network Inference on FPGA. [**`bnn`**] [**`hardware`**]
  - [ICCV - Precision and Low-Bit Neural Networks. [**`qnn`**]
  - [ICCV - bit cnns. [**`bnn`**]
  - [ICCV
  - [ICCV - Precision. [**`qnn`**]
  - [ICCV - Free Quantization Through Weight Equalization and Bias Correction. [**`qnn`**] [**`hardware`**] [[torch](https://github.com/jakc4103/DFQ)]
  - [ICCV
  - [ICML - Bit Quantization of Transformer Neural Machine Language Translation Model. [**`qnn`**] [**`nlp`**]
  - [ICLR
  - [ICLR
  - [ICIP - xnor/BMXNet-v2)] [192:star:]
  - [ICUS
  - [IJCAI - Efficient Hashing with Minimizing Quantization Loss. [**`bnn`**]
  - [IJCAI
  - [ISOCC
  - [IEEE J. Emerg. Sel. Topics Circuits Syst. - Chip Systolically Scalable Binary-Weight CNN Inference Engine. [**`hardware`**]
  - [IEEE JETC
  - [IEEE J. Solid-State Circuits - Efficient Reconfigurable Processor for Binary-and Ternary-Weight Neural Networks With Flexible Data Bit Width. [**`qnn`**]
  - [MDPI Electronics
  - [NeurIPS - differentiable Quantization. [**`qnn`**] [[torch](https://github.com/csyhhu/MetaQuant)]
  - [NeurIPS - bnn-optimization)]
  - [NeurIPS
  - [NeurIPS
  - [NeurIPS
  - [NeurIPS
  - [NeurIPS
  - [RoEduNet
  - [SiPS
  - [TMM - Modal Hashing. [**`bnn`**]
  - [TMM
  - [IEEE TCS.I - RAM: Accelerating Binary Neural Networks in High-Throughput SRAM Compute Arrays. [**`hardware`**]
  - [IEEE TCS.I - Chip Memory. [**`bnn`**]
  - [VLSI-SoC - Efficient Execution of Binary Neural Networks Using Resistive Memories. [**`bnn`**] [**`hardware`**]
  - [paper
  - [arXiv - Binarizing Networks. [**`bnn`**]
  - [arXiv
  - [arXiv - CV/dabnn)]
  - [arXiv - aware Knowledge Distillation. [**`qnn`**]
  - [arXiv
  - [AAAI
  - [AAAI - bit CNNs via Discrete Back Propagation. [**`bnn`**]
  - [APCCAS
  - [CoRR
  - [GLSVLSI
  - [ICCV - bit cnns. [**`bnn`**]
  - [ICUS
  - [IEEE J. Emerg. Sel. Topics Circuits Syst. - Chip Systolically Scalable Binary-Weight CNN Inference Engine. [**`hardware`**]
  - [IEEE JETC
  - [NeurIPS
  - [RoEduNet
  - [TMM
  - [VLSI-SoC - Efficient Execution of Binary Neural Networks Using Resistive Memories. [**`bnn`**] [**`hardware`**]
  - [arXiv - CV/dabnn)]
  - [MDPI Electronics
  - [CVPR
- 2020
  - [CVPR - Net)] [105:star:] 
  - [ACL - -citation 0-->
  - [AAAI
  - [AAAI - BERT: Hessian Based Ultra Low Precision Quantization of BERT. [__`qnn`__]
  - [AAAI - Inducing Binarized Neural Networks. [**`bnn`**]
  - [AAAI - Width Quantization with Multiple Phase Adaptations.
  - [COOL CHIPS - DRAM Accelerator Architecture for Binary Neural Network. [**`hardware`**] 
  - [CoRR
  - [CVPR - noah/ghostnet)] [1.2k:star:] 
  - [CVPR - han-lab/apq)] [76:star:]
  - [CVPR - Bit Face Recognition. [__`qnn`__]
  - [CVPR - -citation 3-->
  - [CVPR - Point Back-Propagation Training. [[video](https://www.youtube.com/watch?v=nVRNygIQKI0)] [__`qnn`__]
  - [CVPR - Bit Quantization Needs Good Distribution. [**`qnn`**] 
  - [ICML - bit quantization through learnable offsets and better initialization
  - [DATE - based computing systems. [**`bnn`**] 
  - [DATE - Accelerated Binary Neural Network Inference Engine for Mobile Phones. [**`bnn`**] [**`hardware`**]
  - [DATE - -citation 2-->
  - [ECCV - -citation 5-->
  - [ECCV - 4-bit MobileNet Models. [**`qnn`**] 
  - [ECCV - -citation 2-->
  - [ECCV - -citation 7-->
  - [ECCV - -citation 6-->
  - [ECCV - bitwidth Data Free Quantization. [**`qnn`**] [[torch](https://github.com/xushoukai/GDFQ)]
  - [EMNLP - aware Ultra-low Bit BERT. [**`qnn`**]
  - [EMNLP
  - [ICET - Efficient Bagged Binary Neural Network Accelerator. [**`bnn`**] [**`hardware`**] 
  - [ICASSP - -citation 3-->
  - [ICML - -citation 5-->
  - [ICML - Scale Inference with Anisotropic Vector Quantization.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

Awesome-Model-Quantization

Awesome_Efficient_LLM_Diffusion

Benchmark

BiBench

MQBench

Benchmarks

Books

2015

Papers

2015

2016

2017

2018

2019

2020