https://github.com/blue-no1/quantization-experiments

Experiments on quantization for open-weight LLMs — balancing memory footprint, speed, and accuracy.

inference llm model-compression quantization

# Quantization Experiments

Notes on the memory and performance trade-offs of quantizing open-weight LLMs.

> ⚠️ Work in progress: research is ongoing.

## Focus
- Precision ladder: FP32 → FP16 → INT8 → 4-bit (GGUF).
- Memory footprint reduction at each precision step.
- Inference speed versus accuracy at each precision step.
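
To make the focus points above concrete, here is a minimal Python sketch, not code from this repo. It estimates weight-only memory at each precision and shows the rounding error introduced by a simple symmetric per-tensor INT8 scheme. The helper names (`weight_memory_gib`, `quantize_int8`, `dequantize`), the 7B parameter count, and the sample weights are all hypothetical. The estimate ignores activations, the KV cache, and the per-block scales that real 4-bit GGUF formats store, so actual file sizes will be somewhat larger.

```python
# Nominal bits per weight for each format.
# Assumption: weights only, no per-block scales or metadata.
BITS_PER_WEIGHT = {"FP32": 32, "FP16": 16, "INT8": 8, "4-bit": 4}


def weight_memory_gib(n_params: int, bits: int) -> float:
    """Return the memory needed to store n_params weights, in GiB."""
    return n_params * bits / 8 / 2**30


def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor INT8 quantization: q = round(x / scale)."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map quantized integers back to approximate float values."""
    return [x * scale for x in q]


if __name__ == "__main__":
    n_params = 7_000_000_000  # hypothetical 7B-parameter model
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name:>5}: {weight_memory_gib(n_params, bits):6.2f} GiB")

    weights = [0.42, -1.3, 0.07, 0.9]  # made-up sample values
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"scale={scale:.5f}, max abs error={max_err:.5f}")
```

Each step down the ladder halves the weight storage, while the round-trip error per value is bounded by about half the quantization scale. That bound is the basic source of the accuracy loss these experiments set out to measure.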

## Progress Log
- [YYYY-MM-DD] Init repo.