Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Rotation and Permutation for Advanced Outlier Management and Efficient Quantization of LLMs
https://github.com/Hsu1023/DuQuant
- Host: GitHub
- URL: https://github.com/Hsu1023/DuQuant
- Owner: Hsu1023
- License: mit
- Created: 2024-05-25T18:45:13.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-09-09T17:48:55.000Z (4 months ago)
- Last Synced: 2024-09-15T20:49:56.267Z (4 months ago)
- Language: Python
- Homepage:
- Size: 2.09 MB
- Stars: 23
- Watchers: 1
- Forks: 3
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - Hsu1023/DuQuant - activation quantization results. DuQuant was selected for an Oral presentation at NeurIPS 2024 and has been open-sourced; users can install and use it via the provided repository and adjust parameters as needed for quantization experiments. (A01_Text Generation_Text Dialogue / Large language dialogue models and data)
README
# DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs
[![arXiv](https://img.shields.io/badge/DuQuant-2406.01721-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2406.01721)
[![Website](https://img.shields.io/badge/🎤%20Project-Website-blue)](https://duquant.github.io)
[![License](https://img.shields.io/badge/⚖️%20Code%20License-MIT-yellow)](https://github.com/Hsu1023/DuQuant/blob/main/LICENSE)
Welcome to the official code repository for "[DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs **(NeurIPS 2024, Oral)**](https://arxiv.org/abs/2406.01721)".
🔍 For more details, please refer to the project page: [https://duquant.github.io/](https://duquant.github.io/).
## 📰 News
* [2024/09/26] 🌟 Our DuQuant paper has been accepted for an Oral presentation at NeurIPS 2024 (top 1% of 15,671 submissions)! 🎉 Cheers!
* [2024/09/06] 🔥 We release the code!
* [2024/06/03] 🚀 Our paper is available on arXiv!

## 👀 Introduction
![duquant](imgs/duquant.png)

- We are the first to identify the existence of **Massive Outliers** at the **down_proj** layer of the FFN module in recent LLMs.
- DuQuant applies a **Rotation transformation** and a **Permutation transformation** to effectively eliminate both massive and normal outliers (see the illustrative sketch after this list).
- DuQuant establishes new **state-of-the-art** baselines for 4-bit weight-activation quantization across various model types and downstream tasks.
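To make the dual transformation concrete, here is a minimal, self-contained sketch of the idea. It is an illustration under our own assumptions (random orthogonal blocks and a simple zigzag permutation, with hypothetical helper names), not the repository's actual implementation, which greedily searches rotations (see `--max_rotation_step` below):

```python
# Illustrative sketch only: random block rotations instead of DuQuant's
# greedy search, and a simple zigzag permutation. Helper names are made up.
import numpy as np

def random_block_rotation(dim, block_size, seed=0):
    """Block-diagonal orthogonal matrix: each block mixes its channels,
    spreading a large outlier over the whole block."""
    rng = np.random.default_rng(seed)
    R = np.zeros((dim, dim))
    for s in range(0, dim, block_size):
        q, _ = np.linalg.qr(rng.standard_normal((block_size, block_size)))
        R[s:s + block_size, s:s + block_size] = q
    return R

def zigzag_permutation(channel_peaks, block_size):
    """Reorder channels so each block receives a similar mix of large- and
    small-magnitude channels (zigzag assignment by peak magnitude)."""
    n_blocks = len(channel_peaks) // block_size
    order = np.argsort(-channel_peaks)             # largest peak first
    buckets = [[] for _ in range(n_blocks)]
    for i, ch in enumerate(order):
        k = i % (2 * n_blocks)                     # zigzag: 0..n-1, n-1..0
        buckets[k if k < n_blocks else 2 * n_blocks - 1 - k].append(ch)
    return np.concatenate(buckets)

# Toy demo on a fake activation matrix with one massive-outlier channel.
dim, block_size = 8, 4
rng = np.random.default_rng(1)
X = rng.standard_normal((16, dim))
X[:, 3] *= 50                                      # massive outlier channel
perm = zigzag_permutation(np.abs(X).max(axis=0), block_size)
P = np.eye(dim)[:, perm]                           # permutation matrix
T = P @ random_block_rotation(dim, block_size)     # permute, then rotate
W = rng.standard_normal((dim, dim))
assert np.allclose((X @ T) @ (T.T @ W), X @ W)     # layer output preserved
print(np.abs(X).max(), np.abs(X @ T).max())        # outlier magnitude drops
```

Because `T` is orthogonal, it can be folded into the adjacent weight matrix, so the smoothing adds no inference cost; the actual method searches the rotations greedily rather than sampling them at random.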
## 🔧 Installation

```bash
conda create -n duquant python=3.10 -y
conda activate duquant
git clone https://github.com/Hsu1023/DuQuant.git
pip install --upgrade pip
pip install -r requirements.txt
```

## ⚙️ Usage
### 1. Preprocessing
```bash
python get_rot.py # need to be run only once for all models
python generate_act_scale_shift.py --model PATH_OF_MODEL # need to be run only once for each model (path can be hugging-face hub path or relative path)
```

### 2. Quantization
The bash script for `DuQuant` can be found in `run.sh`. Choose the model to quantize by passing its path to the `--model` argument. To evaluate the `DuQuant + lwc` method, run the `run_lwc.sh` script. In addition, you can add `--save_dir` to save the quantized models and use `--resume` to reload the saved models. A hypothetical example command follows the argument list below.

#### Explanation of arguments:
- `--model`: the local model path or Hugging Face hub path.
- `--wbits`: weight quantization bit-width.
- `--abits`: activation quantization bit-width.
- `--block_size`: the block size of the rotation matrices.
- `--max_rotation_step`: the maximum number of greedy search steps for the rotation transformation.
- `--permutation_times`: the number of times the permutation transformation is applied.
- `--swc`: the weight clipping ratio (used when LWC is disabled).
- `--lac`: the activation clipping ratio.
- `--lwc`: enable Learnable Weight Clipping (LWC).
- `--epochs`: the number of LWC training epochs.
- `--resume`: load pre-trained DuQuant parameters.
- `--multigpu`: run inference for larger models on multiple GPUs.
- `--save_dir`: save the quantized model for further exploration.
- `--eval_ppl`: evaluate the perplexity of quantized models.
- `--tasks`: evaluate on zero-shot tasks.
- `--eval_mmlu`: evaluate on the MMLU benchmark.
- `--mmlu_data_dir`: data path of the MMLU benchmark.
- `--eval_mtbench`: evaluate on MT-Bench.
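For illustration, a W4A4 run with LWC might be composed from these flags as below. The entry-point script name and flag values here are assumptions; consult `run.sh` and `run_lwc.sh` for the authors' actual commands and settings:

```bash
# Hypothetical example assembled from the documented flags above;
# see run.sh / run_lwc.sh for the official invocation.
python main.py \
    --model meta-llama/Llama-2-7b-hf \
    --wbits 4 --abits 4 \
    --block_size 128 --max_rotation_step 256 --permutation_times 8 \
    --lwc --epochs 20 \
    --eval_ppl \
    --save_dir ./quantized/llama2-7b-w4a4
```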
### 3. Model Zoo

Currently, we support the LLaMA series (LLaMA 1, 2, and 3), the Vicuna series, and Mistral models.
| Models | 7B/8B | 13B | 30B | 65B/70B |
| ----------- | ----- | ---- | ---- | ------- |
| LLaMA1 | ✅ | ✅ | ✅ | ✅ |
| LLaMA2 | ✅ | ✅ | --- | ✅ |
| LLaMA3 | ✅ | --- | --- | ✅ |
| Vicuna-v1.5 | ✅ | ✅ | --- | --- |
| Mistral     | ✅    | ---  | ---  | ---     |

## 📜 Results
- DuQuant achieves SoTA performance in PPL evaluation under W4A4 quantization.
![ppl](imgs/ppl.png)

- DuQuant shows robustness on LLaMA3-8B quantization.
![llama3](imgs/llama3.png)

## 📂 Contact
For immediate queries or further information, please open an issue or contact the authors.

## 🙏 Acknowledgement
This repo is built upon the following projects:

* [OmniQuant](https://github.com/OpenGVLab/OmniQuant)
* [IntactKV](https://github.com/ruikangliu/IntactKV)
* [EAGLE](https://github.com/SafeAILab/EAGLE)
* [FastChat](https://github.com/lm-sys/FastChat)

We thank the authors for their code.
## 📝 Citation
We kindly request that you cite our work if you utilize the code or reference our findings in your research:

```bibtex
@article{lin2024duquant,
title={DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs},
author={Lin, Haokun and Xu, Haobo and Wu, Yichen and Cui, Jingzhi and Zhang, Yingtao and Mou, Linzhan and Song, Linqi and Sun, Zhenan and Wei, Ying},
journal={arXiv preprint arXiv:2406.01721},
year={2024}
}
```