{"id":13754088,"url":"https://github.com/locuslab/wanda","last_synced_at":"2025-04-12T21:26:17.678Z","repository":{"id":176373469,"uuid":"650246838","full_name":"locuslab/wanda","owner":"locuslab","description":"A simple and effective LLM pruning approach.","archived":false,"fork":false,"pushed_at":"2024-08-09T03:50:00.000Z","size":112,"stargazers_count":731,"open_issues_count":42,"forks_count":101,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-04T01:06:05.700Z","etag":null,"topics":["large-language-models","network-pruning"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2306.11695","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/locuslab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-06T16:54:44.000Z","updated_at":"2025-04-03T04:26:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"7fb14020-9845-4b4d-94c8-c7510c0e8710","html_url":"https://github.com/locuslab/wanda","commit_stats":null,"previous_names":["locuslab/wanda"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/locuslab%2Fwanda","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/locuslab%2Fwanda/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/locuslab%2Fwanda/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/locuslab%2Fwanda/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/locuslab","download_url":"h
ttps://codeload.github.com/locuslab/wanda/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248633456,"owners_count":21136872,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["large-language-models","network-pruning"],"created_at":"2024-08-03T09:01:39.358Z","updated_at":"2025-04-12T21:26:17.648Z","avatar_url":"https://github.com/locuslab.png","language":"Python","readme":"# Pruning LLMs by Weights and Activations\nOfficial PyTorch implementation of **Wanda** (Pruning by **W**eights **and a**ctivations), as presented in our paper:\n\n**A Simple and Effective Pruning Approach for Large Language Models** \u003cbr\u003e\n*Mingjie Sun\\*, Zhuang Liu\\*, Anna Bair, J. Zico Kolter* (* indicates equal contribution) \u003cbr\u003e\nCarnegie Mellon University, Meta AI Research and Bosch Center for AI  \u003cbr\u003e\n[Paper](https://arxiv.org/abs/2306.11695) - [Project page](https://eric-mingjie.github.io/wanda/home.html)\n\n```bibtex\n@article{sun2023wanda,\n  title={A Simple and Effective Pruning Approach for Large Language Models}, \n  author={Sun, Mingjie and Liu, Zhuang and Bair, Anna and Kolter, J. 
Zico},\n  year={2023},\n  journal={arXiv preprint arXiv:2306.11695}\n}\n```\n\n--- \n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://user-images.githubusercontent.com/20168304/273351964-53c3807e-3453-49c5-b855-b620b1026466.png\" width=100% height=100% \nclass=\"center\"\u003e\n\u003c/p\u003e\n\nCompared to magnitude pruning, which removes weights based solely on their magnitudes, our pruning approach **Wanda** removes weights on a *per-output* basis, according to the product of weight magnitudes and input activation norms.\n\n## Update\n- [x] (9.22.2023) Add [support](https://github.com/locuslab/wanda#pruning-llama-2) for LLaMA-2.\n- [x] (9.22.2023) Add [code](https://github.com/locuslab/wanda#ablation-on-obs-weight-update) to reproduce the ablation study on OBS weight update in the paper.\n- [x] (10.6.2023) Add new [support](https://github.com/locuslab/wanda#ablation-on-obs-weight-update) for the weight update analysis in the ablation study. Feel free to try it out!\n- [x] (10.6.2023) Add [support](https://github.com/locuslab/wanda#zero-shot-evaluation) for zero-shot evaluation.\n- [x] (10.20.2023) Add code for pruning OPT models.\n- [x] (10.23.2023) Add code for [LoRA fine-tuning](lora_ft).\n\n## Setup\nInstallation instructions can be found in [INSTALL.md](INSTALL.md).\n\n## Usage\nThe [scripts](scripts) directory contains all the bash commands to replicate the main results (Table 2) in our paper.\n\nBelow is an example command for pruning LLaMA-7B with Wanda to achieve 50% unstructured sparsity.\n```sh\npython main.py \\\n    --model decapoda-research/llama-7b-hf \\\n    --prune_method wanda \\\n    --sparsity_ratio 0.5 \\\n    --sparsity_type unstructured \\\n    --save out/llama_7b/unstructured/wanda/\n```\nWe provide a quick overview of the arguments:\n- `--model`: The Hugging Face Hub identifier of the model to prune (e.g. a LLaMA model).\n- `--cache_dir`: Directory for loading or storing LLM weights.
 The default is `llm_weights`.\n- `--prune_method`: The pruning method. We have implemented three: [`magnitude`, `wanda`, `sparsegpt`].\n- `--sparsity_ratio`: The fraction of weights to prune (e.g. `0.5` for 50% sparsity).\n- `--sparsity_type`: The type of sparsity, one of [`unstructured`, `2:4`, `4:8`].\n- `--use_variant`: Whether to use the Wanda variant (default: `False`).\n- `--save`: The directory where the results are stored.\n\nFor structured N:M sparsity, set `--sparsity_type` to \"2:4\" or \"4:8\". For example:\n```sh\npython main.py \\\n    --model decapoda-research/llama-7b-hf \\\n    --prune_method wanda \\\n    --sparsity_ratio 0.5 \\\n    --sparsity_type 2:4 \\\n    --save out/llama_7b/2-4/wanda/\n```\n\n### Pruning LLaMA-2\nFor [LLaMA-2](https://ai.meta.com/llama/) models, set `--model` to `meta-llama/Llama-2-7b-hf` (using `7b` as an example):\n```sh\npython main.py \\\n    --model meta-llama/Llama-2-7b-hf \\\n    --prune_method wanda \\\n    --sparsity_ratio 0.5 \\\n    --sparsity_type unstructured \\\n    --save out/llama2_7b/unstructured/wanda/\n```\nLLaMA-2 results (WikiText perplexity, lower is better; LLaMA-2-34b had not been released as of 9.22.2023):\n\n| sparsity | method | llama2-7b | llama2-13b | llama2-70b |\n|----------|--------|-----------|------------|------------|\n| - | dense | 5.12 | 4.57 | 3.12 |\n| unstructured 50% | magnitude | 14.89 | 6.37 | 4.98 |\n| unstructured 50% | sparsegpt | 6.51 | 5.63 | **3.98** |\n| unstructured 50% | wanda | **6.42** | **5.56** | **3.98** |\n| 4:8 | magnitude | 16.48 | 6.76 | 5.58 |\n| 4:8 | sparsegpt | 8.12 | 6.60 | 4.59 |\n| 4:8 | wanda | **7.97** | **6.55** | **4.47** |\n| 2:4 | magnitude | 54.59 | 8.33 | 6.33 |\n| 2:4 | sparsegpt | **10.17** | 8.32 | 5.40 |\n| 2:4 | wanda | 11.02 | **8.27** | **5.16** 
|\n\n### Ablation on OBS weight update\nTo reproduce the weight update analysis, we provide our implementation of this ablation. All commands can be found in [this script](scripts/ablate_weight_update.sh).\n```sh\nfor method in ablate_mag_seq ablate_wanda_seq ablate_mag_iter ablate_wanda_iter\ndo\n  CUDA_VISIBLE_DEVICES=0 python main.py \\\n    --model decapoda-research/llama-7b-hf \\\n    --sparsity_ratio 0.5 \\\n    --sparsity_type unstructured \\\n    --prune_method ${method} \\\n    --save out/llama_7b_ablation/unstructured/\ndone\n```\nHere, `ablate_{mag/wanda}_{seq/iter}` means that we use magnitude pruning or Wanda to obtain the pruning mask at each layer, then apply the weight update procedure in either a sequential or an iterative style every 128 input channels. For details, see Section 5 of our [paper](https://arxiv.org/abs/2306.11695).\n\n### Zero-Shot Evaluation\nTo evaluate zero-shot tasks, we modified the [EleutherAI LM Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/master) framework so that it can evaluate pruned LLMs. The modified repo is available at [this link](https://drive.google.com/file/d/1zugbLyGZKsH1L19L9biHLfaGGFnEc7XL/view?usp=sharing). Download, extract, and install this custom `lm_eval` package from source.\n\nFor reproducibility, we used [commit `df3da98`](https://github.com/EleutherAI/lm-evaluation-harness/tree/df3da98c5405deafd519c2ddca52bb7c3fe36bef) on the main branch. All tasks were evaluated with task version 0, except BoolQ, which used task version 1.\n\nAt a high level, our modification adds two arguments, `pretrained_model` and `tokenizer`, to this [function](https://github.com/EleutherAI/lm-evaluation-harness/blob/master/lm_eval/evaluator.py#L17). We can then call this `simple_evaluate` function from our [codebase](https://github.com/locuslab/wanda/blob/main/lib/eval.py#L148) to evaluate sparse pruned LLMs.
 To evaluate zero-shot tasks in addition to the WikiText perplexity, pass the `--eval_zero_shot` argument.\n\n### Speedup Evaluation\nThe pruning speed of each method is measured as the cumulative time spent on pruning across layers, excluding the forward passes.\n\nFor inference speedup with structured sparsity, we refer the reader to this [tutorial](https://pytorch.org/tutorials/prototype/semi_structured_sparse.html): semi-structured (2:4) sparsity is supported in `PyTorch \u003e= 2.1`. You can switch between the CUTLASS and cuSPARSELt kernels [here](https://github.com/pytorch/pytorch/blob/v2.1.0/torch/sparse/semi_structured.py#L55).\n\nFinally, for pruning image classifiers, see the [image_classifiers](image_classifiers) directory.\n\n## Acknowledgement\nThis repository is built upon the [SparseGPT](https://github.com/IST-DASLab/sparsegpt) repository.\n\n## License\nThis project is released under the MIT license. Please see the [LICENSE](LICENSE) file for more information.\n\n## Questions\nFeel free to discuss the paper or code with us through GitHub issues or email!\n\nmingjies at cs.cmu.edu  \nliuzhuangthu at gmail.com ","funding_links":[],"categories":["A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flocuslab%2Fwanda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flocuslab%2Fwanda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flocuslab%2Fwanda/lists"}