https://github.com/modeltc/qllm
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"
- Host: GitHub
- URL: https://github.com/modeltc/qllm
- Owner: ModelTC
- License: apache-2.0
- Created: 2024-02-21T04:13:17.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-03-11T02:56:00.000Z (8 months ago)
- Last Synced: 2024-10-18T21:59:01.270Z (30 days ago)
- Topics: llama, llama2, llm, post-training-quantization, pytorch, quantization, transformers
- Language: Python
- Homepage: https://arxiv.org/abs/2310.08041
- Size: 1.68 MB
- Stars: 34
- Watchers: 8
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models (ICLR 2024)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![arXiv](https://img.shields.io/badge/QLLM-2310.08041-b31b1b.svg)](https://arxiv.org/abs/2310.08041)

This is the official PyTorch implementation of [QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models](https://arxiv.org/abs/2310.08041).
By [Jing Liu](https://jing-liu.com/), [Ruihao Gong](https://xhplus.github.io/), [Xiuying Wei](https://wimh966.github.io/), [Zhiwei Dong](https://zwdong.com.cn/), [Jianfei Cai](https://jianfei-cai.github.io/), and [Bohan Zhuang](https://bohanzhuang.github.io/).
![qllm](imgs/qllm.png)
We propose QLLM, an accurate and efficient low-bitwidth post-training quantization method designed for LLMs.
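To make "low-bitwidth quantization" concrete, here is a minimal sketch of plain round-to-nearest uniform quantization with a single per-tensor scale. It is a generic illustration only and is not the QLLM algorithm; the names `fake_quantize` and `mean_abs_error` and the sample values are hypothetical and do not come from this repository.

```python
def fake_quantize(values, bits):
    """Round floats onto a signed `bits`-bit integer grid, then map back to floats.

    Generic round-to-nearest baseline for illustration; NOT the QLLM method.
    """
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for 4 bits, 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)                 # nothing to scale
    scale = max_abs / qmax                  # one scale shared by all values
    out = []
    for v in values:
        # Round to the nearest integer level and clamp to the representable range.
        q = max(-qmax - 1, min(qmax, round(v / scale)))
        out.append(q * scale)               # dequantize back to a float
    return out


def mean_abs_error(a, b):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical weight values, chosen only for the demonstration.
weights = [0.02, -0.15, 0.40, -0.88, 0.61, 0.05, -0.33, 0.97]
err_4bit = mean_abs_error(weights, fake_quantize(weights, 4))
err_8bit = mean_abs_error(weights, fake_quantize(weights, 8))
print(f"4-bit error: {err_4bit:.4f}, 8-bit error: {err_8bit:.4f}")
```

With fewer bits the grid is coarser, so the rounding error grows. That accuracy gap at low bitwidths is the problem that post-training quantization methods aim to reduce.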
## News
- [10-03-2024] Release the code!
- [17-01-2024] QLLM is accepted by ICLR 2024!

## Contents
- [Install](#install)
- [Usage](#usage)
- [Results](#results)
- [Citation](#citation)
- [License](#license)
- [Acknowledgement](#acknowledgement)

## Install
```
conda create -n qllm python=3.10 -y
conda activate qllm
git clone https://github.com/ModelTC/QLLM
cd QLLM
pip install --upgrade pip
pip install -e .
```

## Usage
We provide the training scripts in the `scripts` folder. For example, to perform W4A4 quantization (4-bit weights, 4-bit activations) for LLaMA-7B, run
```
sh scripts/llama-7b/w4a4.sh
```
Remember to change the model path `model` and the output path `output_dir`.

## Results
* QLLM achieves SoTA performance in weight-activation quantization.

![weight_activation_llama_1](imgs/llama_1_results.png)
![weight_activation_llama_2](imgs/llama_2_results.png)

## Citation
If you find `QLLM` useful in your research, please consider citing the following paper:
```
@inproceedings{liu2024qllm,
title = {{QLLM}: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models},
author = {Liu, Jing and Gong, Ruihao and Wei, Xiuying and Dong, Zhiwei and Cai, Jianfei and Zhuang, Bohan},
booktitle = {International Conference on Learning Representations (ICLR)},
year = {2024},
}
```

## License
This repository is released under the Apache 2.0 license as found in the [LICENSE](./LICENSE) file.

## Acknowledgement
This repository is built upon [OmniQuant](https://github.com/OpenGVLab/OmniQuant). We thank the authors for their open-sourced code.