https://intel.github.io/neural-compressor/
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
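As a usage illustration (not part of this listing): a minimal post-training INT8 quantization sketch using the library's 2.x Python `fit` API. Here `float_model` and `calib_loader` are hypothetical placeholders for an FP32 PyTorch model and a calibration dataloader you already have.

```python
# Minimal post-training static INT8 quantization sketch (Neural Compressor 2.x API).
# `float_model` and `calib_loader` are placeholders for an existing
# torch.nn.Module and a calibration torch DataLoader.
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

conf = PostTrainingQuantConfig()  # defaults to static INT8 post-training quantization
q_model = fit(
    model=float_model,            # FP32 model to compress
    conf=conf,
    calib_dataloader=calib_loader,  # used to collect activation statistics
)
q_model.save("./quantized_model")   # serialize the quantized model
```

Recipes from the topic list below (e.g. SmoothQuant) can be enabled through the same config object in the 2.x API, e.g. `PostTrainingQuantConfig(recipes={"smooth_quant": True})`.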
- Host: GitHub
- URL: https://intel.github.io/neural-compressor/
- Owner: intel
- License: apache-2.0
- Created: 2020-07-21T23:49:56.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2025-12-05T12:39:39.000Z (2 months ago)
- Last Synced: 2025-12-07T04:13:05.646Z (2 months ago)
- Topics: auto-tuning, awq, fp4, gptq, int4, int8, knowledge-distillation, large-language-models, low-precision, mxformat, post-training-quantization, pruning, quantization, quantization-aware-training, smoothquant, sparsegpt, sparsity
- Language: Python
- Homepage: https://intel.github.io/neural-compressor/
- Size: 436 MB
- Stars: 2,542
- Watchers: 30
- Forks: 283
- Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE
- Security: SECURITY.md
Awesome Lists containing this project:
- awesome-python - intel.github.io/neural-compressor