{"id":18264539,"url":"https://github.com/modeltc/qllm","last_synced_at":"2026-03-07T00:33:19.063Z","repository":{"id":223612712,"uuid":"761009800","full_name":"ModelTC/QLLM","owner":"ModelTC","description":"[ICLR 2024] This is the official PyTorch implementation of \"QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models\"","archived":false,"fork":false,"pushed_at":"2024-03-11T02:56:00.000Z","size":1764,"stargazers_count":34,"open_issues_count":1,"forks_count":3,"subscribers_count":8,"default_branch":"main","last_synced_at":"2024-10-18T21:59:01.270Z","etag":null,"topics":["llama","llama2","llm","post-training-quantization","pytorch","quantization","transformers"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2310.08041","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ModelTC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-21T04:13:17.000Z","updated_at":"2024-10-16T10:47:25.000Z","dependencies_parsed_at":"2024-08-18T18:15:31.057Z","dependency_job_id":null,"html_url":"https://github.com/ModelTC/QLLM","commit_stats":{"total_commits":9,"total_committers":2,"mean_commits":4.5,"dds":"0.11111111111111116","last_synced_commit":"653a329e4a5bf17b9296854617e093b7d45643b9"},"previous_names":["modeltc/qllm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelTC%2FQLLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelTC%2FQLLM/tags","releases_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelTC%2FQLLM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelTC%2FQLLM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ModelTC","download_url":"https://codeload.github.com/ModelTC/QLLM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247251971,"owners_count":20908600,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llama","llama2","llm","post-training-quantization","pytorch","quantization","transformers"],"created_at":"2024-11-05T11:15:03.629Z","updated_at":"2026-03-07T00:33:14.014Z","avatar_url":"https://github.com/ModelTC.png","language":"Python","readme":"# QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models (ICLR 2024)\n\n[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) \n[![arXiv](https://img.shields.io/badge/QLLM-2310.08041-b31b1b.svg)](https://arxiv.org/abs/2310.08041)\n\nThis is the official PyTorch implementation of [QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models](https://arxiv.org/abs/2310.08041).\n\nBy [Jing Liu](https://jing-liu.com/), [Ruihao Gong](https://xhplus.github.io/), [Xiuying Wei](https://wimh966.github.io/), [Zhiwei Dong](https://zwdong.com.cn/), [Jianfei Cai](https://jianfei-cai.github.io/), and [Bohan Zhuang](https://bohanzhuang.github.io/).\n\n![qllm](imgs/qllm.png)\n\nWe propose QLLM, an accurate and efficient low-bitwidth post-training 
quantization method designed for LLMs.\n\n## 📰 News\n- [10-03-2024]  Release the code!🌟\n- [17-01-2024] QLLM is accepted by ICLR 2024! 👏\n\n## 📖 Contents\n- [Install](#🛠-install)\n- [Usage](#⚙️-usage)\n- [Results](#📋-results)\n- [Citation](#📝-citation)\n- [License](#🧾-license)\n- [Acknowledgement](#🙏-acknowledgement)\n\n## 🛠 Install\n```\nconda create -n qllm python=3.10 -y\nconda activate qllm\ngit clone https://github.com/ModelTC/QLLM\ncd QLLM\npip install --upgrade pip\npip install -e .\n```\n\n## ⚙️ Usage\nWe provide the training scripts in the `scripts` folder. For example, to perform W4A4 quantization for LLaMA-7B, run\n```\nsh scripts/llama-7b/w4a4.sh\n```\nRemember to change the model path `model` and the output path `output_dir`.\n\n## 📋 Results\n* QLLM achieves SoTA performance in weight-activation quantization\n\n![weight_activation_llama_1](imgs/llama_1_results.png)\n![weight_activation_llama_2](imgs/llama_2_results.png)\n\n## 📝 Citation\nIf you find our `QLLM` useful in your research, please consider citing the following paper:\n```\n@inproceedings{liu2024qllm,\n  title = {{QLLM}: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models},\n  author = {Liu, Jing and Gong, Ruihao and Wei, Xiuying and Dong, Zhiwei and Cai, Jianfei and Zhuang, Bohan},\n  booktitle = {International Conference on Learning Representations (ICLR)},\n  year = {2024},\n}\n```\n\n## 🧾 License\nThis repository is released under the Apache 2.0 license as found in the [LICENSE](./LICENSE) file.\n\n## 🙏 Acknowledgement\nThis repository is built upon [OmniQuant](https://github.com/OpenGVLab/OmniQuant). We thank the authors for their open-source code.","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodeltc%2Fqllm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmodeltc%2Fqllm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodeltc%2Fqllm/lists"}