https://github.com/blue-no1/quantization-experiments
Experiments on quantization for open-weight LLMs — balancing memory footprint, speed, and accuracy.
inference llm model-compression quantization
- Host: GitHub
- URL: https://github.com/blue-no1/quantization-experiments
- Owner: Blue-No1
- License: MIT
- Created: 2025-08-23T02:35:55.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-08-23T03:15:20.000Z (8 months ago)
- Last Synced: 2025-08-23T04:28:28.322Z (8 months ago)
- Topics: inference, llm, model-compression, quantization
- Homepage:
- Size: 9.77 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Quantization Experiments
Notes on the memory and performance trade-offs of quantization.
> ⚠️ Work in progress
## Focus
- Precision ladder: FP32 → FP16 → INT8 → 4-bit (GGUF).
- Memory footprint reduction.
- Inference speed vs accuracy.
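As a rough illustration of the memory side of the ladder above, here is a minimal back-of-envelope sketch. The 7B parameter count is a hypothetical example (not tied to any model in this repo), and the estimate covers raw weight storage only, ignoring activations, KV cache, and per-block quantization overhead such as GGUF scale factors.

```python
# Estimated weight memory at each precision on the FP32 -> 4-bit ladder.

BITS_PER_PARAM = {"FP32": 32, "FP16": 16, "INT8": 8, "4-bit (GGUF)": 4}

def weight_memory_gib(n_params: float, bits: int) -> float:
    """Raw weight storage in GiB: params * bits, converted to bytes, then GiB."""
    return n_params * bits / 8 / 2**30

n_params = 7e9  # hypothetical 7B-parameter model
for fmt, bits in BITS_PER_PARAM.items():
    print(f"{fmt:>12}: {weight_memory_gib(n_params, bits):6.2f} GiB")
```

Each halving of bit width halves the weight footprint, which is why the FP32 → 4-bit path shrinks weights by roughly 8×; real quantized files are slightly larger due to stored scales and mixed-precision layers.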
## Progress Log
- [YYYY-MM-DD] Init repo.