https://intel.github.io/neural-compressor/
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
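As a usage illustration (not part of this listing): a minimal post-training INT8 quantization sketch using the library's 2.x Python `fit` API. Here `float_model` and `calib_loader` are hypothetical placeholders for an FP32 PyTorch model and a calibration dataloader you already have.

```python
# Minimal post-training static INT8 quantization sketch (Neural Compressor 2.x API).
# `float_model` and `calib_loader` are placeholders for an existing
# torch.nn.Module and a calibration torch DataLoader.
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

conf = PostTrainingQuantConfig()  # defaults to static INT8 post-training quantization
q_model = fit(
    model=float_model,            # FP32 model to compress
    conf=conf,
    calib_dataloader=calib_loader,  # used to collect activation statistics
)
q_model.save("./quantized_model")   # serialize the quantized model
```

Recipes from the topic list below (e.g. SmoothQuant) can be enabled through the same config object in the 2.x API, e.g. `PostTrainingQuantConfig(recipes={"smooth_quant": True})`.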
- Host: GitHub
- URL: https://intel.github.io/neural-compressor/
- Owner: intel
- License: apache-2.0
- Created: 2020-07-21T23:49:56.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2025-12-05T12:39:39.000Z (2 months ago)
- Last Synced: 2025-12-07T04:13:05.646Z (2 months ago)
- Topics: auto-tuning, awq, fp4, gptq, int4, int8, knowledge-distillation, large-language-models, low-precision, mxformat, post-training-quantization, pruning, quantization, quantization-aware-training, smoothquant, sparsegpt, sparsity
- Language: Python
- Homepage: https://intel.github.io/neural-compressor/
- Size: 436 MB
- Stars: 2,542
- Watchers: 30
- Forks: 283
- Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE
- Security: SECURITY.md
Awesome Lists containing this project:
- awesome-python - intel.github.io/neural-compressor