Projects in Awesome Lists tagged with post-training-quantization
A curated list of projects from awesome lists tagged with post-training-quantization.
https://intel.github.io/neural-compressor/
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
auto-tuning awq fp4 gptq int4 int8 knowledge-distillation large-language-models low-precision mxformat post-training-quantization pruning quantization quantization-aware-training smoothquant sparsegpt sparsity
Last synced: 09 Dec 2025
https://github.com/intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
auto-tuning awq fp4 gptq int4 int8 knowledge-distillation large-language-models low-precision mxformat post-training-quantization pruning quantization quantization-aware-training smoothquant sparsegpt sparsity
Last synced: 12 May 2025
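Both Neural Compressor entries above point to the same toolkit. As a rough, hedged sketch of what a static post-training quantization run looks like with its 2.x-style Python API (class and function names have shifted between releases, so treat them as assumptions and check the current documentation):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit  # 2.x-style entry point; newer releases differ

# Any FP32 PyTorch model plus a small loader of representative inputs for calibration.
fp32_model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
calib_loader = DataLoader(TensorDataset(torch.randn(256, 128), torch.randint(0, 10, (256,))), batch_size=32)

conf = PostTrainingQuantConfig(approach="static")   # static PTQ: observe ranges on calibration data
q_model = fit(model=fp32_model, conf=conf, calib_dataloader=calib_loader)
q_model.save("./quantized_model")
```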
https://github.com/666DZY666/micronet
micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa; "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b)/ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, regular, and group-convolution channel pruning; (3) group-convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT, FP32/FP16/INT8 (PTQ calibration), op adaptation (upsample), dynamic shape.
batch-normalization-fuse bnn convolutional-networks dorefa group-convolution integer-arithmetic-only model-compression network-in-network network-slimming neuromorphic-computing onnx post-training-quantization pruning pytorch quantization quantization-aware-training tensorrt tensorrt-int8-python twn xnor-net
Last synced: 20 Mar 2025
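The micronet entry above lists batch-normalization fusion as a preparation step for quantization: folding the BN statistics into the preceding convolution leaves a single op to quantize. A minimal PyTorch sketch of that fold (an illustration, not micronet's actual code):

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BatchNorm running statistics and affine parameters into a Conv2d."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    w = conv.weight.clone()
    b = conv.bias.clone() if conv.bias is not None else torch.zeros(conv.out_channels)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)   # per-output-channel factor
    with torch.no_grad():
        fused.weight.copy_(w * scale.reshape(-1, 1, 1, 1))
        fused.bias.copy_((b - bn.running_mean) * scale + bn.bias)
    return fused

conv, bn = nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16).eval()
x = torch.randn(1, 3, 32, 32)
assert torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5)
```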
https://github.com/alibaba/tinyneuralnetwork
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
deep-learning deep-neural-networks model-compression model-converter post-training-quantization pruning pytorch quantization-aware-training
Last synced: 14 Oct 2025
https://github.com/squeezeailab/squeezellm
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
efficient-inference large-language-models llama llm localllm model-compression natural-language-processing post-training-quantization quantization small-models text-generation transformer
Last synced: 13 Apr 2025
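SqueezeLLM's "dense-and-sparse" decomposition keeps a tiny fraction of outlier weights in full precision as a sparse matrix and quantizes the remaining dense part to low bit-width. A heavily simplified sketch of that split, using plain uniform quantization where the paper uses sensitivity-weighted non-uniform codebooks:

```python
import torch

def dense_and_sparse(weight: torch.Tensor, outlier_frac: float = 0.005, n_bits: int = 4):
    """Split off the largest-magnitude weights, then uniformly quantize the dense remainder."""
    k = max(1, int(weight.numel() * outlier_frac))
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    outlier_mask = weight.abs() >= threshold
    sparse_fp16 = (weight * outlier_mask).to(torch.float16).to_sparse()
    dense = weight * ~outlier_mask
    scale = dense.abs().max().clamp(min=1e-8) / (2 ** (n_bits - 1) - 1)
    q = torch.clamp(torch.round(dense / scale), -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    return q.to(torch.int8), scale, sparse_fp16   # reconstruct as q.float() * scale + sparse.to_dense()

q, scale, sparse = dense_and_sparse(torch.randn(4096, 4096) * 0.02)
```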
https://github.com/ModelTC/llmc
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
awq benchmark deployment evaluation internlm2 large-language-models lightllm llama3 llm lvlm mixtral omniquant post-training-quantization pruning quantization quarot smoothquant spinquant tool vllm
Last synced: 23 Apr 2025
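Several of the methods llmc benchmarks (SmoothQuant, AWQ, OmniQuant) share one idea: rescale per input channel so activation outliers migrate into the weights, which quantize more gracefully. A simplified SmoothQuant-style sketch for a linear layer Y = X @ W (an illustration, not llmc's implementation):

```python
import torch

def smoothquant_scales(act_absmax: torch.Tensor, weight: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Per-input-channel scales s_j = max|X_j|^alpha / max|W_j|^(1-alpha)."""
    w_absmax = weight.abs().amax(dim=1)   # weight laid out as (in_features, out_features)
    return act_absmax.clamp(min=1e-5).pow(alpha) / w_absmax.clamp(min=1e-5).pow(1.0 - alpha)

in_f, out_f = 4096, 4096
act_absmax = torch.rand(in_f) * 50                 # stand-in calibration statistics
weight = torch.randn(in_f, out_f) * 0.02
s = smoothquant_scales(act_absmax, weight)
weight_smoothed = weight * s.unsqueeze(1)          # W' = diag(s) @ W
# Activations are divided by s (usually folded into the preceding LayerNorm),
# so Y = (X / s) @ W' is mathematically unchanged but far friendlier to INT8 PTQ.
```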
https://github.com/xiuyu-li/q-diffusion
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
ddim diffusion-models model-compression post-training-quantization pytorch quantization stable-diffusion
Last synced: 06 Apr 2025
https://github.com/megvii-research/Sparsebit
A model compression and acceleration toolbox based on PyTorch.
deep-learning post-training-quantization pruning quantization quantization-aware-training sparse tensorrt
Last synced: 12 May 2025
https://github.com/megvii-research/FQ-ViT
[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
imagenet post-training-quantization pytorch quantization vision-transformer
Last synced: 20 Mar 2025
https://github.com/sayakpaul/adventures-in-tensorflow-lite
This repository contains notebooks that demonstrate how to use TensorFlow Lite to quantize deep neural networks.
inference model-optimization model-quantization on-device-ml post-training-quantization pruning quantization-aware-training tensorflow-2 tensorflow-lite tf-hub tf-lite-model
Last synced: 20 Sep 2025
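For orientation, the standard TensorFlow Lite full-integer post-training quantization flow that these notebooks walk through looks roughly like this (the SavedModel path and calibration data are placeholders):

```python
import tensorflow as tf

# Placeholder calibration data; in practice, yield ~100 representative training samples.
calibration_dataset = tf.data.Dataset.from_tensor_slices(
    tf.random.uniform([100, 224, 224, 3])
).batch(1)

def representative_data_gen():
    for batch in calibration_dataset.take(100):
        yield [tf.cast(batch, tf.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")   # path is an assumption
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Optional: int8-only ops plus integer I/O for accelerators such as the Edge TPU.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```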
https://github.com/hkproj/quantization-notes
Notes on quantization in neural networks
deep-learning neural-networks post-training-quantization pytorch quantization quantization-aware-training
Last synced: 06 May 2025
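The core operation those notes build on is the affine (asymmetric) quantize/dequantize pair, with scale and zero-point chosen from the observed tensor range. A minimal self-contained example:

```python
import torch

def quantize_affine(x: torch.Tensor, n_bits: int = 8):
    """q = clamp(round(x / scale) + zero_point, qmin, qmax) with range-based scale/zero_point."""
    qmin, qmax = 0, 2 ** n_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = ((x_max - x_min) / (qmax - qmin)).clamp(min=1e-8)
    zero_point = (qmin - torch.round(x_min / scale)).clamp(qmin, qmax)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax).to(torch.uint8)
    return q, scale, zero_point

def dequantize_affine(q: torch.Tensor, scale, zero_point) -> torch.Tensor:
    return (q.float() - zero_point) * scale

x = torch.randn(1000)
q, scale, zp = quantize_affine(x)
print("max abs error:", (x - dequantize_affine(q, scale, zp)).abs().max().item())
```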
https://github.com/modeltc/tfmq-dm
[CVPR 2024 Highlight] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".
cvpr cvpr2024 ddim diffusion-models highlight ldm post-training-quantization quantization stable-diffusion
Last synced: 04 Apr 2025
https://github.com/modeltc/qllm
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"
llama llama2 llm post-training-quantization pytorch quantization transformers
Last synced: 03 Aug 2025
https://github.com/gaurav-van/fine-tuning-llms
An introductory guide covering different techniques for fine-tuning LLMs.
1-bit-quantization bitnet fine-tuning finetuning-llms gemma llama2 llms lora post-training-quantization qlora quantization quantization-algorithms quantization-aware-training quantization-from-scratch
Last synced: 09 Apr 2025
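The guide's NF4/QLoRA material maps onto the bitsandbytes integration in Hugging Face Transformers, where a 4-bit post-training load typically looks like this (the model id is only an example):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4, the data type used by QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16, # matmuls run in bf16, weights stay 4-bit
    bnb_4bit_use_double_quant=True,        # quantize the quantization constants too
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",            # example model id; requires access to the weights
    quantization_config=bnb_config,
    device_map="auto",
)
```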
https://github.com/tanyachutani/quantization_tensorflow
Quantization for object detection in TensorFlow 2.x
model-optimization object-detection post-training-quantization quantization quantization-aware-training tensorflow2
Last synced: 15 Oct 2025
https://github.com/omidghadami95/efficientnetv2_quantization_ck
EfficientNetV2 (EfficientNetV2-B2) with INT8 and FP32 quantization (QAT and PTQ) on the CK+ dataset, covering fine-tuning, augmentation, handling an imbalanced dataset, etc.
ckplus efficientnet efficientnetv2 efficientnetv2-b2 emotion-recognition facial-emotion-recognition googlecolab imbalanced-dataset keras post-training-quantization ptq python qat quantization quantization-aware-training real-time-emotion-classification real-time-emotion-detection scale-down tensorflow
Last synced: 22 Feb 2025