Ecosyste.ms: Awesome

https://github.com/megvii-research/FQ-ViT

[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

imagenet post-training-quantization pytorch quantization vision-transformer

Last synced: 01 Jul 2024

https://intel.github.io/neural-compressor/

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

auto-tuning awq fp4 gptq int4 int8 knowledge-distillation large-language-models low-precision mxformat post-training-quantization pruning quantization quantization-aware-training smoothquant sparsegpt sparsity

Last synced: 10 Jun 2024

https://github.com/sayakpaul/Adventures-in-TensorFlow-Lite

This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.

inference model-optimization model-quantization on-device-ml post-training-quantization pruning quantization-aware-training tensorflow-2 tensorflow-lite tf-hub tf-lite-model

Last synced: 11 Apr 2024

https://github.com/megvii-research/Sparsebit

A model compression and acceleration toolbox based on pytorch.

deep-learning post-training-quantization pruning quantization quantization-aware-training sparse tensorrt

Last synced: 01 Apr 2024

https://github.com/alibaba/TinyNeuralNetwork

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

deep-learning deep-neural-networks model-compression model-converter post-training-quantization pruning pytorch quantization-aware-training

Last synced: 29 Mar 2024

https://github.com/666DZY666/micronet

micronet, a model compression and deploy lib. compression: 1、quantization: quantization-aware-training(QAT), High-Bit(>2b)(DoReFa/Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference)、Low-Bit(≤2b)/Ternary and Binary(TWN/BNN/XNOR-Net); post-training-quantization(PTQ), 8-bit(tensorrt); 2、 pruning: normal、regular and group convolutional channel pruning; 3、 group convolution structure; 4、batch-normalization fuse for quantization. deploy: tensorrt, fp32/fp16/int8(ptq-calibration)、op-adapt(upsample)、dynamic_shape

batch-normalization-fuse bnn convolutional-networks dorefa group-convolution integer-arithmetic-only model-compression network-in-network network-slimming neuromorphic-computing onnx post-training-quantization pruning pytorch quantization quantization-aware-training tensorrt tensorrt-int8-python twn xnor-net

Last synced: 26 Mar 2024

https://github.com/intel/neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

auto-tuning awq fp4 gptq int4 int8 knowledge-distillation large-language-models low-precision mxformat post-training-quantization pruning quantization quantization-aware-training smoothquant sparsegpt sparsity

Last synced: 23 Mar 2024

Accepted Projects