{"id":21773516,"url":"https://github.com/Hsu1023/DuQuant","last_synced_at":"2025-07-19T10:31:00.019Z","repository":{"id":257300242,"uuid":"805886873","full_name":"Hsu1023/DuQuant","owner":"Hsu1023","description":"Rotation and Permutation for Advanced Outlier Management and Efficient Quantization of LLMs","archived":false,"fork":false,"pushed_at":"2024-09-09T17:48:55.000Z","size":2191,"stargazers_count":23,"open_issues_count":3,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-09-15T20:49:56.267Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Hsu1023.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-25T18:45:13.000Z","updated_at":"2024-09-12T07:11:58.000Z","dependencies_parsed_at":"2024-09-15T20:50:10.240Z","dependency_job_id":"ea37445e-7c6d-4fb3-8ec8-3508e9227f50","html_url":"https://github.com/Hsu1023/DuQuant","commit_stats":null,"previous_names":["hsu1023/duquant"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hsu1023%2FDuQuant","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hsu1023%2FDuQuant/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hsu1023%2FDuQuant/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hsu1023%2FDuQuant/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Hsu1023","download_url":"https://codeload.github.com/Hsu1023/DuQuant/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226584451,"owners_count":17655036,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-26T17:01:31.843Z","updated_at":"2024-11-26T17:01:51.449Z","avatar_url":"https://github.com/Hsu1023.png","language":"Python","readme":"# DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs\n\n\u003ch5 align=\"center\"\u003e\n\n[![arXiv](https://img.shields.io/badge/DuQuant-2406.01721-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2406.01721)\n[![Website](https://img.shields.io/badge/🎤%20Project-Website-blue)](https://duquant.github.io)\n[![License](https://img.shields.io/badge/⚖️%20Code%20License-MIT-yellow)](https://github.com/Hsu1023/DuQuant/blob/main/LICENSE)\n \u003cbr\u003e\n\n\u003c/h5\u003e\n\nWelcome to the official code repository for \"[DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs **(NeurIPS 2024, Oral)**](https://arxiv.org/abs/2406.01721)\".\n\n🔍 For more details, please refer to the project page: [https://duquant.github.io/](https://duquant.github.io/).\n\n\n## 📰 News\n* [2024/09/26] 🌟 Our DuQuant paper has been accepted for an Oral presentation at NeurIPS 2024 (only top 1% out of 15,671 submissions)! 
🎉 Cheers!\n* [2024/09/06] 🔥 We have released the code!\n* [2024/06/03] 🚀 Our paper is available on arXiv!\n\n\n## 👀 Introduction\n![duquant](imgs/duquant.png)\n\n- We first identify the existence of **Massive Outliers** at the **down_proj** layer of the FFN module in recent LLMs.\n- DuQuant uses **Rotation transformation** and **Permutation transformation** to effectively eliminate both massive and normal outliers.\n- DuQuant establishes new **state-of-the-art** baselines for 4-bit weight-activation quantization across various model types and downstream tasks.\n\n\n\n## 🔧 Installation\n```bash\nconda create -n duquant python=3.10 -y\nconda activate duquant\ngit clone https://github.com/Hsu1023/DuQuant.git\ncd DuQuant\npip install --upgrade pip\npip install -r requirements.txt\n```\n\n## ⚙️ Usage\n### 1. Preprocessing\n```bash\npython get_rot.py # only needs to be run once for all models\npython generate_act_scale_shift.py --model PATH_OF_MODEL # only needs to be run once per model (the path can be a Hugging Face Hub path or a relative path)\n```\n\n### 2. Quantization\nThe bash script for `DuQuant` can be found in `run.sh`. You can choose the model to be quantized by passing the model path after the `--model` flag. To evaluate the `DuQuant + lwc` method, run the `run_lwc.sh` script. In addition, you can add `--save_dir` to save the quantized models, and use `--resume` to reload the saved models. 
\n\n\n#### Explanation of arguments:\n- `--model`: a local model path or a Hugging Face Hub path.\n- `--wbits`: the number of bits for weight quantization.\n- `--abits`: the number of bits for activation quantization.\n- `--block_size`: the block size of the rotation matrices.\n- `--max_rotation_step`: the maximum number of greedy search steps for the rotation transformation.\n- `--permutation_times`: the number of times the permutation transformation is applied.\n- `--swc`: the weight clipping ratio (used when LWC is not enabled).\n- `--lac`: the activation clipping ratio.\n- `--lwc`: enable Learnable Weight Clipping (LWC).\n- `--epochs`: the number of training epochs for LWC.\n- `--resume`: load pre-trained DuQuant parameters.\n- `--multigpu`: run inference for larger networks across multiple GPUs.\n- `--save_dir`: save the quantized model for further exploration.\n- `--eval_ppl`: evaluate the perplexity of the quantized model.\n- `--tasks`: evaluate on the zero-shot tasks.\n- `--eval_mmlu`: evaluate on the MMLU benchmark.\n- `--mmlu_data_dir`: the data path of the MMLU benchmark.\n- `--eval_mtbench`: evaluate on MT-Bench.\n\n\n### 3. Model Zoo\n\nCurrently, we support the LLaMA series (LLaMA 1, 2, and 3), the Vicuna series, and Mistral models. 
\n\n| Models      | 7B/8B | 13B  | 30B  | 65B/70B |\n| ----------- | ----- | ---- | ---- | ------- |\n| LLaMA1      | ✅     | ✅    | ✅    | ✅       |\n| LLaMA2      | ✅     | ✅    | ---  | ✅       |\n| LLaMA3      | ✅     | ---  | ---  | ✅       |\n| Vicuna-v1.5 | ✅     | ✅    | ---  | ---     |\n| Mistral     | ✅     | ---  | ---  | ---     |\n\n## 📜 Results\n\n- DuQuant achieves SoTA performance in PPL evaluation under W4A4 quantization.\n![ppl](imgs/ppl.png)\n\n- DuQuant remains robust when quantizing LLaMA3-8B.\n![llama3](imgs/llama3.png)\n\n\n## 📂 Contact\nFor immediate queries or further information, please open an issue or contact \u003cxuhb20@mails.tsinghua.edu.cn\u003e or \u003chaokun.lin@cripac.ia.ac.cn\u003e.\n\n## 🙏 Acknowledgement\nThis repo is built upon the following projects:\n\n* [OmniQuant](https://github.com/OpenGVLab/OmniQuant)\n* [IntactKV](https://github.com/ruikangliu/IntactKV)\n* [EAGLE](https://github.com/SafeAILab/EAGLE)\n* [FastChat](https://github.com/lm-sys/FastChat)\n\nWe thank the authors for their code.\n\n## 📝 Citation\nWe kindly request that you cite our work if you utilize the code or reference our findings in your research:\n\u003c!-- Please cite our work if you use our code or discuss our findings in your own research: --\u003e\n```\n@article{lin2024duquant,\n  title={DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs},\n  author={Lin, Haokun and Xu, Haobo and Wu, Yichen and Cui, Jingzhi and Zhang, Yingtao and Mou, Linzhan and Song, Linqi and Sun, Zhenan and Wei, Ying},\n  journal={arXiv preprint arXiv:2406.01721},\n  year={2024}\n}\n```\n","funding_links":[],"categories":["A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHsu1023%2FDuQuant","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FHsu1023%2FDuQuant","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHsu1023%2FDuQuant/lists"}