https://github.com/spcl/QuaRot
Code for QuaRot, an end-to-end 4-bit inference of large language models.
- Host: GitHub
- URL: https://github.com/spcl/QuaRot
- Owner: spcl
- License: apache-2.0
- Created: 2024-03-31T19:21:45.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-07-22T11:43:58.000Z (6 months ago)
- Last Synced: 2024-10-31T10:37:38.147Z (3 months ago)
- Language: Python
- Homepage: https://arxiv.org/abs/2404.00456
- Size: 409 KB
- Stars: 270
- Watchers: 10
- Forks: 20
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
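- StarryDivineSky - spcl/QuaRot - The 70B model loses at most 0.29 WikiText perplexity and retains 99% of its zero-shot performance. (A01_Text Generation_Text Dialogue / Large Language Dialogue Models and Data)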
README
# QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
This repository contains the code for [**QuaRot**: Outlier-Free 4-Bit Inference in Rotated LLMs](https://arxiv.org/abs/2404.00456).

## Abstract
We introduce QuaRot, a new **Qua**ntization scheme based on **Rot**ations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache, in 4 bits. QuaRot rotates LLMs in a way that removes outliers from the hidden state without changing the output, making quantization easier. This *computational invariance* is applied to the hidden state (residual) of the LLM, as well as to the activations of the feed-forward components, aspects of the attention mechanism, and the KV cache. The result is a quantized model where all matrix multiplications are performed in 4 bits, without any channels identified for retention in higher precision. Our quantized **LLaMa2-70B** model has losses of at most **0.29 WikiText perplexity** and retains **99% of the zero-shot** performance.

![Your Image](img/fig1.png)
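
The rotation idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the normalized Hadamard matrix is simply one convenient orthogonal matrix chosen for this example, and the helper names (`hadamard`, `matvec`, `matmul`, `transpose`) and the toy sizes are hypothetical. The sketch rotates a hidden-state vector containing one large outlier, folds the inverse rotation into the weights, and checks that the layer output is unchanged while the largest activation shrinks.

```python
import math
import random


def hadamard(n):
    """Normalized n x n Hadamard matrix (Sylvester construction, n a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = [[1.0]]
    while len(h) < n:
        # Block construction [[H, H], [H, -H]] doubles the size each step.
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)  # makes the matrix orthogonal: Q @ Q^T = I
    return [[v * scale for v in row] for row in h]


def matvec(x, m):
    """Row vector x (length k) times matrix m (k x p)."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def matmul(a, b):
    """Matrix product of a (r x k) and b (k x p)."""
    return [matvec(row, b) for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


random.seed(0)
d = 16                                   # toy hidden size (assumption)
Q = hadamard(d)                          # orthogonal rotation used in this sketch

x = [random.gauss(0.0, 1.0) for _ in range(d)]
x[3] = 50.0                              # inject a single outlier channel
W = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(d)]

y_ref = matvec(x, W)                     # original layer output: x W

x_rot = matvec(x, Q)                     # rotated hidden state: x Q
W_rot = matmul(transpose(Q), W)          # inverse rotation folded into weights: Q^T W
y_rot = matvec(x_rot, W_rot)             # (x Q)(Q^T W) equals x W

invariance_error = max(abs(a - b) for a, b in zip(y_ref, y_rot))
max_before = max(abs(v) for v in x)
max_after = max(abs(v) for v in x_rot)

print(f"max |y_ref - y_rot| = {invariance_error:.2e}")
print(f"max |x| before rotation = {max_before:.2f}")
print(f"max |x| after rotation  = {max_after:.2f}")
```

Because the rotation spreads the outlier's magnitude across all channels, the rotated vector has a much smaller dynamic range, which is what makes a low-bit quantization grid a better fit.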
## Usage
Compile the QuaRot kernels using the following commands:
```bash
git clone https://github.com/spcl/QuaRot.git
cd QuaRot
pip install -e . # or pip install .
```

For simulation results, check the [fake_quant](https://github.com/spcl/QuaRot/tree/main/fake_quant) directory.
### Star History
[![Star History Chart](https://api.star-history.com/svg?repos=spcl/QuaRot&type=Date)](https://star-history.com/#spcl/QuaRot&Date)
## Citation
The full citation is:
```
@article{ashkboos2024quarot,
title={QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs},
author={Ashkboos, Saleh and Mohtashami, Amirkeivan and Croci, Maximilian L and Li, Bo and Jaggi, Martin and Alistarh, Dan and Hoefler, Torsten and Hensman, James},
journal={arXiv preprint arXiv:2404.00456},
year={2024}
}
```