<p align="center">
<img src="https://raw.githubusercontent.com/pprp/Pruner-Zero/main/.github/images/logo-of-pruner-zero.png" width="20%"> <br>
</p>

<div align="center">
<h1>Pruner-Zero</h1>
  <div align="center">
  <a href="https://icml.cc/Conferences/2024">
    <img src="https://img.shields.io/badge/Conference-ICML-FFB000.svg?style=flat-square" alt="ICML">
  </a>
  <a>
    <img src="https://img.shields.io/badge/License-MIT-FFB000.svg?style=flat-square" alt="MIT License">
  </a>
  <a href="https://github.com/facebookresearch/llama">
    <img src="https://img.shields.io/badge/LLMs-LLaMA-FFB000.svg?style=flat-square" alt="LLaMA">
  </a>
  <a href="https://github.com/facebookresearch/llama">
    <img src="https://img.shields.io/badge/LLMs-Llama2-FAB093.svg?style=flat-square" alt="Llama-2">
  </a>
  </div>
</div>

Official PyTorch implementation of Pruner-Zero, accepted at ICML 2024.

[**Pruner-Zero: Evolving Symbolic Pruning Metric from Scratch for Large Language Models**](https://arxiv.org/abs/2406.02924v1) <br>
*Peijie Dong\*, Lujun Li\* (\* indicates equal contribution), Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu* <br>
HKUST(GZ), HKUST, HKBU, HIT(SZ)

## Contents
- [Introduction](#introduction)
- [Setup](#setup)
- [Usage](#usage)
- [Zero-Shot Harness Evaluation](#zero-shot-evaluation)
- [Acknowledgement](#acknowledgement)
- [License](#license)
- [Citation](#citation)

---

<p align="center">
<img src="https://raw.githubusercontent.com/pprp/Pruner-Zero/main/.github/images/pruner-zero-main-figure.png" width="100%" class="center">
</p>

## Introduction

Despite their remarkable capabilities, Large Language Models (LLMs) are difficult to deploy because of their extensive size. Pruning accelerates inference by dropping a subset of weights, but many pruning methods require retraining, which is prohibitively expensive and computationally demanding. Recent post-training pruning approaches introduce novel metrics that enable pruning LLMs without retraining; however, designing these metrics still requires human experts and tedious trial and error. To identify superior pruning metrics efficiently, we develop an automatic framework that searches for symbolic pruning metrics using genetic programming.
In particular, we devise an elaborate search space encompassing existing pruning metrics in order to discover potential symbolic pruning metrics, and we propose an opposing-operation simplification strategy to increase the diversity of the population. In this way, Pruner-Zero auto-generates symbolic pruning metrics. Based on the searched results, we explore the correlation between pruning metrics and post-pruning performance and summarize several principles. Extensive experiments with LLaMA and LLaMA-2 on language modeling and zero-shot tasks demonstrate that Pruner-Zero outperforms state-of-the-art post-training pruning methods.

## Setup

Installation instructions can be found in [INSTALL.md](INSTALL.md).

## Usage

Our method requires computing gradient magnitudes to calculate the pruning metric, following [GBLM-Pruner](https://github.com/VILA-Lab/GBLM-Pruner/blob/main/gradient_computation.py). For more scripts, see [grad_computation.sh](scripts/grad_computation.sh).

```bash
# Demo for OPT
CUDA_VISIBLE_DEVICES=0 python lib/gradient_computation.py --nsamples 128 \
    --model /path/to/facebook/opt-125m --llama_version 2 --task gradient

# Demo for LLaMA-1
CUDA_VISIBLE_DEVICES=0,1 python lib/gradient_computation.py --nsamples 1 \
    --model $PATH_TO_LLAMA1 --llama_version 1 --task gradient

# Demo for LLaMA-2
CUDA_VISIBLE_DEVICES=0,1 python lib/gradient_computation.py --nsamples 128 \
    --model $PATH_TO_LLAMA2 --llama_version 2 --task gradient
```

Below is an example command for pruning LLaMA-7B with Pruner-Zero to achieve unstructured 50% sparsity.

```sh
python main.py \
    --model decapoda-research/llama-7b-hf \
    --prune_method pruner-zero \
    --sparsity_ratio 0.5 \
    --sparsity_type unstructured \
    --save out/llama_7b/unstructured/pruner-zero/
```

We provide a quick overview of the arguments:
- `--model`: The identifier for the LLaMA model on the Hugging Face model hub.
- `--cache_dir`: Directory for loading or storing LLM weights. The default is `llm_weights`.
- `--prune_method`: We have implemented four pruning methods, namely [`magnitude`, `wanda`, `sparsegpt`, `pruner-zero`].
- `--sparsity_ratio`: The percentage of weights to be pruned.
- `--sparsity_type`: The type of sparsity [`unstructured`, `2:4`, `4:8`].
- `--save`: The directory where the results will be stored.

For structured N:M sparsity, set `--sparsity_type` to "2:4" or "4:8". An illustrative command is provided below:

```sh
python main.py \
    --model decapoda-research/llama-7b-hf \
    --prune_method pruner-zero \
    --sparsity_ratio 0.5 \
    --sparsity_type 2:4 \
    --save out/llama_7b/2-4/pruner-zero/
```

### Pruning LLaMA-2

For [LLaMA-2](https://ai.meta.com/llama/) models, replace `--model` with `meta-llama/Llama-2-7b-hf` (taking `7b` as an example):

```sh
python main.py \
    --model meta-llama/Llama-2-7b-hf \
    --prune_method pruner-zero \
    --sparsity_ratio 0.5 \
    --sparsity_type unstructured \
    --save out/llama2_7b/unstructured/pruner-zero/
```

### Searched Symbolic Pruning Metric

```json
{
    "data": "mul",
    "left": {
        "data": "abs",
        "left": {
            "data": "mul",
            "left": {
                "data": "W"
            },
            "right": {
                "data": "W"
            }
        }
    },
    "right": {
        "data": "mms",
        "left": {
            "data": "G"
        }
    }
}
```

### Zero-Shot Evaluation

To evaluate zero-shot tasks, we modify the [EleutherAI LM Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/master) framework so that it can evaluate pruned LLMs. We provide the modified repo at [this link](https://drive.google.com/file/d/1zugbLyGZKsH1L19L9biHLfaGGFnEc7XL/view?usp=sharing).
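As a side note, the searched metric tree in the previous subsection reads as `|W * W|` scaled elementwise by `mms(G)`. Below is a minimal NumPy sketch of that reading; the helper names are hypothetical, and we assume `mms` denotes min-max scaling of the gradient matrix `G`, which may differ from the repository's exact implementation:

```python
import numpy as np

def mms(x, eps=1e-8):
    # Assumed interpretation of the "mms" node: min-max scaling to [0, 1].
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo + eps)

def pruner_zero_metric(W, G):
    # Evaluate the searched tree mul(abs(mul(W, W)), mms(G)) elementwise.
    return np.abs(W * W) * mms(G)

# Toy weight and gradient matrices; weights with the smallest scores
# are the candidates for pruning.
W = np.array([[0.5, -0.1], [2.0, 0.01]])
G = np.array([[0.3, 0.7], [0.2, 0.9]])
scores = pruner_zero_metric(W, G)
```

Note that the metric couples weight magnitude with gradient information: a large weight whose gradient magnitude is the smallest in the matrix still receives a near-zero score under this sketch.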
Make sure to download, extract, and install this custom `lm_eval` package from source.

For reproducibility, we used [commit `df3da98`](https://github.com/EleutherAI/lm-evaluation-harness/tree/df3da98c5405deafd519c2ddca52bb7c3fe36bef) on the main branch. All tasks were evaluated with task version 0, except for BoolQ, whose task version is 1.

At a high level, our modification adds two arguments, `pretrained_model` and `tokenizer`, to this [function](https://github.com/EleutherAI/lm-evaluation-harness/blob/master/lm_eval/evaluator.py#L17). We can then call the `simple_evaluate` function from our [codebase](https://github.com/locuslab/wanda/blob/main/lib/eval.py#L148) to evaluate sparse pruned LLMs. To evaluate zero-shot tasks in addition to WikiText perplexity, pass the `--eval_zero_shot` argument.

## Acknowledgement
This repository is built upon the [SparseGPT](https://github.com/IST-DASLab/sparsegpt), [Wanda](https://github.com/locuslab/wanda), and [GBLM-Pruner](https://github.com/VILA-Lab/GBLM-Pruner) repositories.

## License
This project is released under the MIT license.
Please see the [LICENSE](LICENSE) file for more information.

## Citation

```bibtex
@inproceedings{dong2024pruner,
  title={Pruner-Zero: Evolving Symbolic Pruning Metric from Scratch for Large Language Models},
  author={Dong, Peijie and Li, Lujun and Tang, Zhenheng and Liu, Xiang and Pan, Xinglin and Wang, Qiang and Chu, Xiaowen},
  booktitle={Proceedings of the 41st International Conference on Machine Learning},
  year={2024},
  organization={PMLR},
  url={https://arxiv.org/abs/2406.02924},
  note={arXiv:2406.02924}
}
```