https://github.com/blue-no1/quantization-experiments
Experiments on quantization for open-weight LLMs — balancing memory footprint, speed, and accuracy.
inference llm model-compression quantization
- Host: GitHub
- URL: https://github.com/blue-no1/quantization-experiments
- Owner: Blue-No1
- License: MIT
- Created: 2025-08-23T02:35:55.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-08-23T03:15:20.000Z (8 months ago)
- Last Synced: 2025-08-23T04:28:28.322Z (8 months ago)
- Topics: inference, llm, model-compression, quantization
- Homepage:
- Size: 9.77 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Quantization Experiments
Notes on the memory and performance trade-offs of quantization.
> ⚠️ Work in progress
## Focus
- Precision ladder: FP32 → FP16 → INT8 → 4-bit (GGUF).
- Memory footprint reduction.
- Inference speed vs accuracy.
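As a rough illustration of the memory side of the ladder above, here is a minimal back-of-envelope sketch. The 7B parameter count is a hypothetical example (not tied to any model in this repo), and the estimate covers raw weight storage only, ignoring activations, KV cache, and per-block quantization overhead such as GGUF scale factors.

```python
# Estimated weight memory at each precision on the FP32 -> 4-bit ladder.

BITS_PER_PARAM = {"FP32": 32, "FP16": 16, "INT8": 8, "4-bit (GGUF)": 4}

def weight_memory_gib(n_params: float, bits: int) -> float:
    """Raw weight storage in GiB: params * bits, converted to bytes, then GiB."""
    return n_params * bits / 8 / 2**30

n_params = 7e9  # hypothetical 7B-parameter model
for fmt, bits in BITS_PER_PARAM.items():
    print(f"{fmt:>12}: {weight_memory_gib(n_params, bits):6.2f} GiB")
```

Each halving of bit width halves the weight footprint, which is why the FP32 → 4-bit path shrinks weights by roughly 8×; real quantized files are slightly larger due to stored scales and mixed-precision layers.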
## Progress Log
- [YYYY-MM-DD] Init repo.