
Awesome-Model-Quantization

A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research, and we are continuously improving the project. Pull requests for works (papers, repositories) that the repo has missed are welcome.
https://github.com/Efficient-ML/Awesome-Model-Quantization


  • Papers

    • 2020

      • [ICML - Scale Inference with Anisotropic Vector Quantization.
      • [CVPR - han-lab/apq)] [76:star:]
      • [ECCV - -citation 6-->
      • [CVPR - Net)] [105:star:] <!--citation 15-->
      • [ACL - -citation 0-->
      • [AAAI
      • [AAAI - BERT: Hessian Based Ultra Low Precision Quantization of BERT. [__`qnn`__]
      • [AAAI - Inducing Binarized Neural Networks. [**`bnn`**]
      • [AAAI - Width Quantization with Multiple Phase Adaptations.
      • [COOL CHIPS - DRAM Accelerator Architecture for Binary Neural Network. [**`hardware`**] <!--citation 0-->
      • [CoRR
      • [CVPR - noah/ghostnet)] [1.2k:star:] <!--citation 47-->
      • [CVPR - Bit Face Recognition. [__`qnn`__]
      • [CVPR - -citation 3-->
      • [CVPR - Point Back-Propagation Training. [[video](https://www.youtube.com/watch?v=nVRNygIQKI0)] [__`qnn`__]
      • [CVPR - Bit Quantization Needs Good Distribution. [**`qnn`**] <!--citation 1-->
      • [ICML - bit quantization through learnable offsets and better initialization
      • [DATE - based computing systems. [**`bnn`**] <!--citation 1-->
      • [DATE - Accelerated Binary Neural Network Inference Engine for Mobile Phones. [**`bnn`**] [**`hardware`**]
      • [DATE - -citation 2-->
      • [ECCV - -citation 5-->
      • [ECCV - 4-bit MobileNet Models. [**`qnn`**] <!--citation 2-->
      • [ECCV - -citation 2-->
      • [ECCV - -citation 7-->
      • [ECCV - bitwidth Data Free Quantization. [**`qnn`**] [[torch](https://github.com/xushoukai/GDFQ)]
      • [EMNLP - aware Ultra-low Bit BERT. [**`qnn`**]
      • [EMNLP
      • [ICET - Efficient Bagged Binary Neural Network Accelerator. [**`bnn`**] [**`hardware`**] <!--citation 0-->
      • [ICASSP - -citation 3-->
      • [ICML - -citation 5-->
      • [ICLR
      • [ICLR - to-Binary Convolutions. [**`bnn`**] [[code is coming](https://github.com/brais-martinez/real2binary)] [[re-implement](https://github.com/larq/zoo/blob/master/larq_zoo/literature/real_to_bin_nets.py)] <!--citation 19-->
      • [ICLR - K1m/BinaryDuo)] <!--citation 6-->
      • [ICLR - research-code/tree/master/mixed-precision-dnns)] [73:star:]
      • [ICLR
      • [IJCV - -citation 0-->
      • [IJCAI - NAS: Child-Parent Neural Architecture Search for Binary Neural Networks. [**`bnn`**]
      • [IJCAI - bit Integer Inference for the Transformer Model. [**`qnn`**] [**`nlp`**]
      • [IJCAI
      • [IJCAI - bit Multiply-Accumulate Operations. [**`qnn`**]
      • [IJCAI - width Deep Neural Networks. [**`qnn`**]
      • [IJCAI
      • [ISCAS - Level Binarized Recurrent Neural Network for EEG Signal Classification. [**`bnn`**] <!--citation 0-->
      • [ISQED - ASU/BNNPruning)] <!--citation 0-->
      • [MICRO - Based NLP Models for Low Latency and Energy Efficient Inference. [**`qnn`**] [**`nlp`**]
      • [MLST - -citation 11-->
      • [NeurIPS
      • [NeurIPS - Bit Weights in Quantized Neural Networks. [**`qnn`**] [[torch](https://github.com/zhaohui-yang/Binary-Neural-Networks/tree/main/SLB)] <!--citation 4-->
      • [NeurIPS
      • [NeurIPS - kai/eevbnn)]
      • [NeurIPS - Analytic Gradient Estimators for Stochastic Binary Networks. [**`bnn`**] [[code](https://github.com/shekhovt/PSA-Neurips2020)]
      • [NeurIPS - V2: Hessian Aware trace-Weighted Quantization of Neural Networks. [**`qnn`**]
      • [NeurIPS
      • [NeurIPS
      • [NeurIPS - Layer Flow. [**`qnn`**] [[torch](https://github.com/didriknielsen/pixelcnn_flow)]
      • [NeurIPS - Parallel SGD. [**`qnn`**] [[torch](https://github.com/tabrizian/learning-to-quantize)]
      • [NeurIPS
      • [NeurIPS - based Scaled Gradient for Model Quantization and Pruning. [**`qnn`**] [[torch](https://github.com/Jangho-Kim/PSG-pytorch)]
      • [NN - performance and large-scale deep neural networks with full 8-bit integers. [**`qnn`**] <!--citation 13-->
      • [Neurocomputing
      • [PR Letters - -citation 0-->
      • [SysML - to-End Binarized Neural Networks. [**`qnn`**] [[tensorflow](https://github.com/jwfromm/Riptide)] [129:star:] <!--citation 5-->
      • [TPAMI - cnn-landmarks)] [[code](https://github.com/1adrianb/binary-human-pose-estimation)]
      • [TPAMI - Nets: A Coupled and Quantized Approach.
      • [TVLSI - Precision Floating-Point Quantization Oriented Architecture for Convolutional Neural Networks. [**`qnn`**] <!--citation 0-->
      • [WACV - -citation 11-->
      • [IEEE Access - Efficient and High Throughput in-Memory Computing Bit-Cell With Excellent Robustness Under Process Variations for Binary Neural Network. [**`bnn`**] [**`hardware`**] <!--citation 0-->
      • [IEEE Trans. Magn - Memory Binary Neural Network Accelerator. [**`bnn`**] <!--citation 0-->
      • [IEEE TCS.II - Efficient Inference Accelerator for Binary Convolutional Neural Networks. [**`hardware`**] <!--citation 1-->
      • [IEEE TCS.I - Memory Multi-Bit Multiplication and ACcumulation in 6T SRAM Array. [**`qnn`**] <!--citation 3-->
      • [IEEE Trans. Electron Devices - -citation 0-->
      • [arXiv
      • [arXiv - -citation 0-->
      • [arXiv - -citation 5-->
      • [arXiv - -citation 1-->
      • [arXiv - Tensor-Cores in Turing GPUs. [**`bnn`**] [[code](https://github.com/pnnl/TCBNN)] <!--citation 1-->
      • [arXiv - level Accuracy? [**`bnn`**] [[code](https://github.com/hpi-xnor/BMXNet-v2)] [192:star:] <!--citation 13-->
      • [arXiv - -citation 3-->
      • [paper - -citation 2-->
      • [arXiv - -citation 0-->
      • [arXiv
      • [ECCV
      • [TPAMI - Parallel Pruning-Quantization.
    • 2016

      • [ICASSP - point Performance Analysis of Recurrent Neural Networks. [**`qnn`**]
      • [ECCV - Net: ImageNet Classification Using Binary Convolutional Neural Networks. [**`bnn`**] [[torch](https://github.com/allenai/XNOR-Net)] [787:star:]
      • [CoRR - Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [**`qnn`**] [[code](https://github.com/tensorpack/tensorpack/tree/master/examples/DoReFa-Net)] [5.8k:star:]
      • [NeurIPS - chris/caffe-twns)] [61:star:]
      • [CVPR - wu/quantized-cnn)
      • [NeurIPS - 1. [**`bnn`**] [[torch](https://github.com/itayhubara/BinaryNet)] [239:star:]
    • 2017

      • [arXiv - Precision Architecture for Inference of Convolutional Neural Networks. [**`qnn`**] [[code](https://github.com/gudovskiy/ShiftCNN)] [53:star:]
      • [CoRR - Source Binary Neural Network Implementation Based on MXNet. [**`bnn`**] [[code](https://github.com/hpi-xnor)]
      • [CVPR - wave Gaussian Quantization. [**`qnn`**] [[code](https://github.com/zhaoweicai/hwgq)] [118:star:]
      • [CVPR
      • [FPGA
      • [ICASSP - point optimization of deep neural networks with adaptive step size retraining. [**`qnn`**]
      • [ICCV - cnn-landmarks)] [[torch](https://github.com/1adrianb/binary-human-pose-estimation)] [207:star:]
      • [ICCV - Order Residual Quantization. [**`qnn`**]
      • [ICLR - Precision Weights. [**`qnn`**] [[torch](https://github.com/Mxbonn/INQ-pytorch)] [144:star:]
      • [ICLR - aware Binarization of Deep Networks. [**`bnn`**] [[code](https://github.com/houlu369/Loss-aware-Binarization)]
      • [ICLR - Sharing for Neural Network Compression. [__`other`__]
      • [ICLR - ternary-quantization)] [90:star:]
      • [InterSpeech
      • [IPDPSW - Chip Memory Based Binarized Convolutional Deep Neural Network Applying Batch Normalization Free Technique on an FPGA. [**`hardware`**]
      • [JETC - Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks. [**`hardware`**] [**`bnn`**]
      • [NeurIPS - Binary-Convolution-Network)]
      • [Neurocomputing - BNN: Binarized neural network on FPGA. [**`hardware`**]
      • [arXiv - Grained Quantization. [**`qnn`**]
      • [MWSCAS
    • 2018

      • [ICLR - Precision Network Accuracy. [**`qnn`**]
      • [AAAI
      • [AAAI
      • [CAAI
      • [CoRR
      • [CoRR
      • [CVPR - Step Quantization for Low-bit Neural Networks. [**`qnn`**]
      • [CVPR - bitwidth Weights and Activations. [**`qnn`**]
      • [CVPR - bitwidth Convolutional Neural Networks. [**`qnn`**]
      • [CVPR
      • [CVPR
      • [CVPR - Arithmetic-Only Inference. [**`qnn`**]
      • [ECCV - Binary Decomposition. [**`bnn`**]
      • [ECCV
      • [ECCV - Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks. [**`qnn`**] [[tensorflow](https://github.com/microsoft/LQ-Nets)] [188:star:]
      • [ECCV - Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm. [**`bnn`**] [[torch](https://github.com/liuzechun/Bi-Real-net)] [120:star:]
      • [ECCV
      • [FCCM
      • [FPL
      • [ICLR - aware Weight Quantization of Deep Networks. [**`qnn`**] [[code](https://github.com/houlu369/Loss-aware-weight-quantization)]
      • [ICLR
      • [ICLR
      • [ICLR - Precision Networks. [**`qnn`**]
      • [ICLR
      • [IJCAI
      • [IJCAI - -citation 14-->
      • [IJCNN
      • [IPDPS
      • [NCA - based accelerators for convolutional neural networks. [**`hardware`**]
      • [NeurIPS - bit Floating Point Numbers. [**`qnn`**]
      • [NeurIPS - bit training of neural networks. [**`qnn`**] [[torch](https://github.com/eladhoffer/quantized.pytorch)]
      • [Res Math Sci
      • [TCAD - fJ/op Binary Neural Network Inference. [**`hardware`**]
      • [TRETS - R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks. [**`qnn`**]
      • [arXiv - xnor/BMXNet-v2)] [192:star:]
      • [arXiv - quantization)]
      • [CVPR - error-aware quantization for low-bit deep neural networks. [**`qnn`**]
      • [IEEE J. Solid-State Circuits - Chip Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator Achieving 1.4 TOPS at 0.6 W. [**`hardware`**] [**`qnn`**]
      • [TVLSI - Efficient Architecture for Binary Weight Convolutional Neural Networks. [**`bnn`**]
      • [MM - Time Low-Power Inference of Binary Neural Networks on CPUs. [**`bnn`**]
    • 2021

      • [arXiv - Training Quantization for Vision Transformer. [**`qnn`**]
      • [ICLR - shot learning via vector quantization in deep embedded space. [__`qnn`__]
      • [CVPR - tune: Efficient Compression of Neural Networks. [__`qnn`__] [[torch](https://github.com/uber-research/permute-quantize-finetune)] [137⭐]
      • [ICLR - Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network. [**`bnn`**]
      • [ICLR
      • [ICML
      • [ICML - Bit Activation Compressed Training [**`qnn`**]
      • [ICML - V3: Dyadic Neural Network Quantization. [**`qnn`**]
      • [ICML - BERT: Integer-only BERT Quantization. [**`qnn`**]
      • [ICML
      • [ICML - NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators. [**`qnn`**]
      • [ICCV - Free Compression Are Feature and Data Mixing.
      • [CVPR - bnn: Bridging the gap between self-supervised real and 1-bit neural networks via guided distribution calibration [**`bnn`**] [[code](https://github.com/szq0214/S2-BNN)] [52⭐]
      • [CVPR - Free Quantization. [__`qnn`__]
      • [ACM MM - Resolution Networks. [**`qnn`**]
      • [NeurIPS - free Quantization with Synthetic Boundary Supporting Samples. [__`qnn`__]
      • [NeurIPS - Training Quantization for Vision Transformer. [__`mixed`__]
      • [NeurIPS - Training Sparsity-Aware Quantization. [__`qnn`__]
      • [NeurIPS
      • [NeurIPS - GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization. [__`other`__]
      • [NeurIPS - ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes.
      • [NeurIPS - of-Distribution Robustness. [**`bnn`**] [[torch](https://github.com/chrundle/biprop)]
      • [CVPR - bit Neural Networks. [__`qnn`__]
      • [CVPR - shot Adversarial Quantization. [__`qnn`__] [[torch](https://github.com/FLHonker/ZAQ-code)]
      • [CVPR
      • [CVPR - wise Gradient Scaling. [__`qnn`__] [[torch](https://github.com/cvlab-yonsei/EWGS)]
      • [CVPR
      • [ICLR
      • [ICLR - Capacity Expert Binary Networks. [**`bnn`**]
      • [ICLR - Training Quantization by Block Reconstruction. [__`qnn`__] [[torch](https://github.com/yhhhli/BRECQ)]
      • [ICLR - lognormal: improved quantized and sparse training. [__`qnn`__]
      • [ICLR
      • [ICLR - Quant: Quantization-Aware Training for Graph Neural Networks. [__`qnn`__]
      • [ICLR - Level Sparsity for Mixed-Precision Neural Network Quantization. [__`qnn`__]
      • [ICLR
      • [ICLR
      • [ICLR - Low-Resolution Arithmetic. [__`qnn`__]
      • [ECCV - Resolution via Parameterized Max Scale. [__`qnn`__]
      • [AAAI
      • [AAAI
      • [AAAI - Precision Activation Quantization. [__`qnn`__]
      • [AAAI - shot Pruning-Quantization. [__`qnn`__]
      • [AAAI
      • [AAAI
      • [AAAI - Widths. [__`qnn`__]
      • [AAAI - training Quantization with Multiple Points: Mixed Precision without Mixed Precision. [__`qnn`__]
      • [AAAI
      • [AAAI
      • [AAAI - Dimensional Binary Convolution Filters. [**`bnn`**]
      • [ACL - time Quantization of Attention Values in Transformers. [__`qnn`__]
      • [arXiv - Precision Deep Neural Networks. [__`mixed`__] [[torch](https://github.com/SHI-Labs/Any-Precision-DNNs)]
      • [arXiv - hXu/ReCU)]
      • [arXiv
      • [arXiv
      • [ACM MM - hops Graph Reasoning for Explicit Representation Learning. [__`other`__]
      • [AAAI - Efficient Kernel SVM via Binary Embedding and Ternary Coefficients. [**`bnn`**]
    • 2022

      • [IJCAI - ViT: Post-Training Quantization for Fully Quantized Vision Transformer. [__`qnn`__] [[code](https://github.com/megvii-research/FQ-ViT)] [71:star:]
      • [NeurIPS - bit Matrix Multiplication for Transformers at Scale
      • [NeurIPS - bit Transformer Language Models. [[code](https://github.com/wimh966/outlier_suppression)]
      • [arXiv - Training Quantization for Large Language Models [__`qnn`__] [[code](https://github.com/mit-han-lab/smoothquant)] [150:star:]
      • [ECCV
      • [ECCV - Computing-Lab-Yale/NDA_SNN)
      • [IJCV - sensitive Information Retention for Accurate Binary Neural Network. [__`bnn`__]
      • [ICML
      • [ICML - Optimal Low-Bit Sub-Distribution in Deep Neural Networks [**`qnn`**] [**`hardware`**]
      • [ICML
      • [ICLR
      • [CVPR - SNN: Rectifying Membrane Potential Distribution for Directly Training Spiking Neural Networks. [**`snn`**]
      • [CVPR - Shot Quantization Brought Closer to the Teacher. [**`qnn`**] [[code](https://github.com/iamkanghyunchoi/ait)]
      • [CVPR - to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation. [__`qnn`__]
      • [CVPR
      • [CVPR - Training Non-Uniform Quantization based on Minimizing the Reconstruction Error. [__`qnn`__]
      • [CVPR - Free Network Compression via Parametric Non-uniform Mixed Precision Quantization. [__`qnn`__]
      • [CVPR - Aware Dynamic Neural Network Quantization. [__`qnn`__]
      • [NeurIPS - distilled Transformer. [**`bnn`**] [[code](https://github.com/facebookresearch/bit)] [42⭐]
      • [NeurIPS - Layer Dependency for Post-Training Quantization. [__`qnn`__]
      • [NeurIPS - Aware Quantization Techniques. [__`qnn`__]
      • [NeurIPS - Driven Mixed-Precision Quantization for Deep Network Design. [__`qnn`__]
      • [NeurIPS
      • [NeurIPS
      • [NeurIPS - training Quantization of Pre-trained Language Models. [__`qnn`__]
      • [NeurIPS - Training Quantization and Pruning. [__`qnn`__] [**`hardware`**]
      • [NeurIPS - Training Quantization for Large-Scale Transformers. [__`qnn`__]
      • [NeurIPS
      • [NeurIPS - ViT: Accurate and Fully Quantized Low-bit Vision Transformer. [__`qnn`__]
      • [NeurIPS - Layer Perceptrons. [**`bnn`**] [[code](https://gitee.com/mindspore/models/tree/master/research/cv/BiMLP)]
      • [ECCV - Uniform Step Size Quantization for Accurate Post-Training Quantization. [__`qnn`__]
      • [ECCV - Training Quantization for Vision Transformers with Twin Uniform Quantization. [__`qnn`__]
      • [ECCV
      • [ECCV - wise Activation-clipping Search Quantization for Sub-4-bit Neural Networks. [__`qnn`__]
      • [ECCV - Q: Extremely Fine-Grained Channel-Wise Quantization via Rate-Distortion Optimization. [__`qnn`__]
      • [ECCV - Precision Neural Network Quantization via Learned Layer-Wise Importance. [__`qnn`__] [[Code](https://github.com/1hunters/LIMPQ)]
      • [ECCV
      • [ECCV - Free Quantization for Vision Transformers. [__`qnn`__]
      • [IJCAI
      • [IJCAI - of-Two Low-bit Post-training Quantization. [__`qnn`__]
      • [IJCAI - bit Quantization of Neural Networks. [__`qnn`__]
      • [ICLR - Point 8-bit Only Multiplication for Network Quantization. [**`qnn`**]
      • [ICLR - bit Optimizers via Block-wise Quantization. [**`qnn`**]
      • [ICLR - Precision Training: Data Format Optimization and Hysteresis Quantization. [**`qnn`**]
      • [ICLR
      • [ICLR - SNN Conversion for High-accuracy and Ultra-low-latency Spiking Neural Networks. [**`snn`**]
      • [ICLR - the-Fly Data-Free Quantization via Diagonal Hessian Approximation. [**`qnn`**] [[code](https://github.com/clevercool/SQuant)]
      • [ICLR
      • [arXiv - ViT: Fully Differentiable Quantization for Vision Transformer [__`qnn`__]
      • [arXiv - training Quantization of Convolutional Neural Networks using Extreme Gradient Boosting for Fast Deployment [__`qnn`__]
      • [TGARS - Based Hyperspectral Image Classification by Step Activation Quantization [__`qnn`__]
      • [arXiv
      • [IJNS
      • [ACM Trans. Des. Autom. Electron. Syst.
      • [MICRO - bit Deep Neural Network Quantization.
      • [ESE - Based Software Testing approach for Deep Neural Network Quantization assessment.
      • [CVPR - System-Software-and-Security/BppAttack)]
      • [IEEE Internet of Things Journal - Efficient Federated Learning Framework for IoT With Low-Bitwidth Neural Network Quantization.
      • [Neural Networks - aware training for low precision photonic neural networks.
      • [ICCRD
      • [Electronics
      • [Applied Soft Computing - distillation and parameter quantization for the bearing fault diagnosis.
      • [CVPR - Class Heterogeneity for Zero-Shot Network Quantization. [[torch](https://github.com/zysxmu/IntraQ)]
      • [Neurocomputing
      • [tinyML Research Symposium - of-Two Quantization for Low Bitwidth and Hardware Compliant Neural Networks.
      • [arXiv - 8-Bit Quantization Aware Training for 8-Bit Neural Network Accelerator with On-Device Speech Recognition.
      • [Ocean Engineering
      • [CVPR - of-Flight Depth Maps.
      • [TCSVT - Q Quantization on FPGA.
      • [EANN - Aware Training Method for Photonic Neural Networks.
      • [arXiv - Precision Quantized Neural Networks.
      • [arXiv
      • [ITSM - Powered Parking Surveillance With Quantized Neural Networks.
      • [Intelligent Automation & Soft Computing - Efficient Convolutional Neural Network Accelerator Using Fine-Grained Logarithmic Quantization.
      • [ICML - Aware Training. [[torch](https://github.com/qualcomm-ai-research/oscillations-qat)]
      • [CCF Transactions on High Performance Computing
      • [CVPR
      • [LNAI - Driven Quantization for Low-Bit and Sparse DNNs.
      • [TCCN - Bitwidth Convolutional Neural Networks for Wireless Interference Identification.
      • [ICPR - Wise Data-Free CNN Compression.
      • [IJCNN - Based Quantized Neural Networks.
      • [ACL - trained Language Models via Quantization
      • [TODAES - in-Memory Neural Networks Acceleration.
      • [FPGA - QNN: Efficient FPGA Acceleration of Deep Neural Networks with Intra-Layer, Mixed-Precision Quantization.
      • [PPoPP
      • [ASE - based Formal Verification Approach for Quantized Neural Networks.
    • 2023

    • 2019

    • 2015

    • 2024

      • [arXiv
      • [arXiv - Efficient Tuning of Quantized Large Language Models
      • [arXiv - LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
      • [arXiv
      • [ICLR
      • [arXiv - Aware Training on Large Language Models via LoRA-wise LSQ
      • [arXiv - Bit Quantized Large Language Model
      • [arXiv - Aware Training for the Acceleration of Lightweight LLMs on the Edge [[code](https://github.com/shawnricecake/EdgeQAT)] ![GitHub Repo stars](https://img.shields.io/github/stars/shawnricecake/EdgeQAT)
      • [arXiv - Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
      • [arXiv - Rank Quantization Error Reconstruction for LLMs
      • [arXiv - Free Asymmetric 2bit Quantization for KV Cache [[code](https://github.com/jy-yuan/KIVI)] ![GitHub Repo stars](https://img.shields.io/github/stars/jy-yuan/KIVI)
      • [arXiv - Training Quantization for LLMs [[code](https://github.com/Aaronhuang-778/BiLLM)]![GitHub Repo stars](https://img.shields.io/github/stars/Aaronhuang-778/BiLLM)
      • [arXiv - RelaxML/quip-sharp)] ![GitHub Repo stars](https://img.shields.io/github/stars/Cornell-RelaxML/quip-sharp)
      • [arXiv - Aware Dequantization
      • [arXiv - Finetuning Quantization of LLMs via Information Retention [[code](https://github.com/htqin/IR-QLoRA)]![GitHub Repo stars](https://img.shields.io/github/stars/htqin/IR-QLoRA)
      • [arXiv - 4-Bit LLMs via Self-Distillation [[code](https://github.com/DD-DuDa/BitDistiller)] ![GitHub Repo stars](https://img.shields.io/github/stars/DD-DuDa/BitDistiller)
      • [arXiv - bit Large Language Models
      • [arXiv - LLM: Accurate Dual-Binarization for Efficient LLMs
      • [arXiv
      • [DAC
      • [arXiv - Aware Mixed Precision Quantization
      • [arXiv - bound for Large Language Models with Per-tensor Quantization
      • [arXiv - ai-research/gptvq)] ![GitHub Repo stars](https://img.shields.io/github/stars/qualcomm-ai-research/gptvq)
      • [DAC - aware Post-Training Mixed-Precision Quantization for Large Language Models
      • [arXiv
      • [arXiv - PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization
      • [arXiv
      • [arXiv
      • [arXiv - free Quantization Algorithm for LLMs
      • [arXiv - KVCacheQuantization)] ![GitHub Repo stars](https://img.shields.io/github/stars/ClubieDong/QAQ-KVCacheQuantization)
      • [arXiv
      • [arXiv - Lossless Generative Inference of LLM
      • [arXiv - LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression [[code](https://github.com/AIoT-MLSys-Lab/SVD-LLM)] ![GitHub Repo stars](https://img.shields.io/github/stars/AIoT-MLSys-Lab/SVD-LLM)
      • [ICLR Practical ML for Low Resource Settings Workshop
      • [arXiv
      • [arXiv
      • [arXiv - Free 4-Bit Inference in Rotated LLMs [[code](https://github.com/spcl/QuaRot)] ![GitHub Repo stars](https://img.shields.io/github/stars/spcl/QuaRot)
      • [arXiv - compensation)] ![GitHub Repo stars](https://img.shields.io/github/stars/GongCheng1919/bias-compensation)
      • [arXiv - Zheng/BinaryDM)]![GitHub Repo stars](https://img.shields.io/github/stars/Xingyu-Zheng/BinaryDM)
      • [arXiv - chip Hardware-aware Quantization
      • [arXiv - bit Quantized LLaMA3 Models? An Empirical Study [[code](https://github.com/Macaronlin/LLaMA3-Quantization)]![GitHub Repo stars](https://img.shields.io/github/stars/Macaronlin/LLaMA3-Quantization) [[HuggingFace](https://huggingface.co/LLMQ)]
      • [arXiv - Training Quantization with Low-precision Minifloats and Integers on FPGAs [[code](https://github.com/Xilinx/brevitas/tree/dev/src/brevitas_examples/imagenet_classification/ptq)][__`hardware`__]
      • [TMLR - Sparsity Trade-Off [[code](https://github.com/sachitkuhar/PLUM)][[webpage](https://github.com/sachitkuhar/PLUM)][[video](https://www.youtube.com/watch?v=nE_CYDWqQ_I)][**`bnn`**] [**`inference`**]
  • Awesome_Efficient_LLM_Diffusion

    • ![Awesome - ml/awesome-efficient-llm-diffusion)
  • Benchmark

  • Survey_Papers

  • Star History