Awesome-Model-Quantization

A list of papers, docs, and code on model quantization. This repo aims to support model quantization research, and we are continuously improving the project. PRs adding works (papers, repositories) that the repo has missed are welcome.
https://github.com/Efficient-ML/Awesome-Model-Quantization

  • Awesome_Efficient_LLM_Diffusion

    • ![Awesome - ml/awesome-efficient-llm-diffusion)
  • Benchmark

  • Benchmarks

    • [Paper
    • [Paper - Quantization)] [![GitHub stars](https://img.shields.io/github/stars/Macaronlin/LLaMA3-Quantization?style=social)](https://github.com/Macaronlin/LLaMA3-Quantization)
    • [Paper - ML/Qwen3-Quantization)] [![GitHub stars](https://img.shields.io/github/stars/Efficient-ML/Qwen3-Quantization?style=social)](https://github.com/Efficient-ML/Qwen3-Quantization)
    • [Paper
    • [Paper
    • [Paper
    • [Paper
  • Books

  • Papers

    • 2015

    • 2016

      • [CoRR - Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [**`qnn`**] [[code](https://github.com/tensorpack/tensorpack/tree/master/examples/DoReFa-Net)] [5.8k:star:]
      • [ECCV - Net: ImageNet Classification Using Binary Convolutional Neural Networks. [**`bnn`**] [[torch](https://github.com/allenai/XNOR-Net)] [787:star:]
      • [ICASSP - point Performance Analysis of Recurrent Neural Networks. [**`qnn`**]
      • [NeurIPS - chris/caffe-twns)] [61:star:]
      • [CVPR - wu/quantized-cnn)
      • [NeurIPS - 1. [**`bnn`**] [[torch](https://github.com/itayhubara/BinaryNet)] [239:star:]
    • 2017

      • [CoRR - Source Binary Neural Network Implementation Based on MXNet. [**`bnn`**] [[code](https://github.com/hpi-xnor)]
      • [CVPR - wave Gaussian Quantization. [**`qnn`**] [[code](https://github.com/zhaoweicai/hwgq)] [118:star:]
      • [CVPR
      • [FPGA
      • [ICASSP - point optimization of deep neural networks with adaptive step size retraining. [**`qnn`**]
      • [ICCV - cnn-landmarks)] [[torch](https://github.com/1adrianb/binary-human-pose-estimation)] [207:star:]
      • [ICCV - Order Residual Quantization. [**`qnn`**]
      • [ICLR - Precision Weights. [**`qnn`**] [[torch](https://github.com/Mxbonn/INQ-pytorch)] [144:star:]
      • [ICLR - aware Binarization of Deep Networks. [**`bnn`**] [[code](https://github.com/houlu369/Loss-aware-Binarization)]
      • [ICLR - Sharing for Neural Network Compression. [**`other`**]
      • [ICLR - ternary-quantization)] [90:star:]
      • [InterSpeech
      • [IPDPSW - Chip Memory Based Binarized Convolutional Deep Neural Network Applying Batch Normalization Free Technique on an FPGA. [**`hardware`**]
      • [JETC - Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks. [**`hardware`**] [**`bnn`**]
      • [NeurIPS - Binary-Convolution-Network)]
      • [Neurocomputing - BNN: Binarized neural network on FPGA. [**`hardware`**]
      • [arXiv - Grained Quantization. [**`qnn`**]
      • [arXiv - Precision Architecture for Inference of Convolutional Neural Networks. [**`qnn`**] [[code](https://github.com/gudovskiy/ShiftCNN)] [53:star:]
      • [MWSCAS
      • [MWSCAS
    • 2018

      • [AAAI
      • [AAAI
      • [CAAI
      • [CoRR
      • [CoRR
      • [CVPR - Step Quantization for Low-bit Neural Networks. [**`qnn`**]
      • [CVPR - bitwidth Weights and Activations. [**`qnn`**]
      • [CVPR - bitwidth Convolutional Neural Networks. [**`qnn`**]
      • [CVPR
      • [CVPR
      • [CVPR - Arithmetic-Only Inference. [**`qnn`**]
      • [ECCV - Binary Decomposition. [**`bnn`**]
      • [ECCV
      • [ECCV - Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks. [**`qnn`**] [[tensorflow](https://github.com/microsoft/LQ-Nets)] [188:star:]
      • [ECCV - Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm. [**`bnn`**] [[torch](https://github.com/liuzechun/Bi-Real-net)] [120:star:]
      • [ECCV
      • [FCCM
      • [FPL
      • [ICLR - aware Weight Quantization of Deep Networks. [**`qnn`**] [[code](https://github.com/houlu369/Loss-aware-weight-quantization)]
      • [ICLR
      • [ICLR
      • [ICLR - Precision Networks. [**`qnn`**]
      • [ICLR
      • [ICLR - Precision Network Accuracy. [**`qnn`**]
      • [IJCAI
      • [IJCAI
      • [IJCNN
      • [IPDPS
      • [NCA - based accelerators for convolutional neural networks. [**`hardware`**]
      • [NeurIPS - bit Floating Point Numbers. [**`qnn`**]
      • [NeurIPS - bit training of neural networks. [**`qnn`**] [[torch](https://github.com/eladhoffer/quantized.pytorch)]
      • [Res Math Sci
      • [TCAD - fJ/op Binary Neural Network Inference. [**`hardware`**]
      • [TRETS - R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks. [**`qnn`**]
      • [arXiv - xnor/BMXNet-v2)] [192:star:]
      • [arXiv - quantization)]
      • [CVPR - error-aware quantization for low-bit deep neural networks. [**`qnn`**]
      • [IEEE J. Solid-State Circuits - Chip Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator Achieving 1.4 TOPS at 0.6 W. [**`hardware`**] [**`qnn`**]
      • [TVLSI - Efficient Architecture for Binary Weight Convolutional Neural Networks. [**`bnn`**]
      • [CoRR
      • [FPL
      • [MM - Time Low-Power Inference of Binary Neural Networks on CPUs. [**`bnn`**]
      • [FCCM
    • 2019

    • 2020

      • [CVPR - Net)] [105:star:]
      • [ACL
      • [AAAI
      • [AAAI - BERT: Hessian Based Ultra Low Precision Quantization of BERT. [**`qnn`**]
      • [AAAI - Inducing Binarized Neural Networks. [**`bnn`**]
      • [AAAI - Width Quantization with Multiple Phase Adaptations.
      • [COOL CHIPS - DRAM Accelerator Architecture for Binary Neural Network. [**`hardware`**]
      • [CoRR
      • [CVPR - noah/ghostnet)] [1.2k:star:]
      • [CVPR - han-lab/apq)] [76:star:]
      • [CVPR - Bit Face Recognition. [**`qnn`**]
      • [CVPR
      • [CVPR - Point Back-Propagation Training. [[video](https://www.youtube.com/watch?v=nVRNygIQKI0)] [**`qnn`**]
      • [CVPR - Bit Quantization Needs Good Distribution. [**`qnn`**]
      • [ICML - bit quantization through learnable offsets and better initialization
      • [DATE - based computing systems. [**`bnn`**]
      • [DATE - Accelerated Binary Neural Network Inference Engine for Mobile Phones. [**`bnn`**] [**`hardware`**]
      • [DATE
      • [ECCV
      • [ECCV - 4-bit MobileNet Models. [**`qnn`**]
      • [ECCV
      • [ECCV
      • [ECCV
      • [ECCV - bitwidth Data Free Quantization. [**`qnn`**] [[torch](https://github.com/xushoukai/GDFQ)]
      • [EMNLP - aware Ultra-low Bit BERT. [**`qnn`**]
      • [EMNLP
      • [ICET - Efficient Bagged Binary Neural Network Accelerator. [**`bnn`**] [**`hardware`**]
      • [ICASSP
      • [ICML
      • [ICML - Scale Inference with Anisotropic Vector Quantization.
      • [ICLR
      • [ICLR - to-Binary Convolutions. [**`bnn`**] [[code is coming](https://github.com/brais-martinez/real2binary)] [[re-implement](https://github.com/larq/zoo/blob/master/larq_zoo/literature/real_to_bin_nets.py)]