{"id":37731257,"url":"https://github.com/amazon-science/llm-rank-pruning","last_synced_at":"2026-01-16T13:50:44.546Z","repository":{"id":265514477,"uuid":"895966140","full_name":"amazon-science/llm-rank-pruning","owner":"amazon-science","description":"LLM-Rank: A graph theoretical approach to structured pruning of large language models based on weighted Page Rank centrality as introduced by the related paper.","archived":false,"fork":false,"pushed_at":"2024-11-29T17:28:18.000Z","size":31,"stargazers_count":7,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-09-09T05:14:14.958Z","etag":null,"topics":["graph-theory","inference-optimization","large-language-models","llm","llms","pagerank","pruning","weighted-pagerank"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amazon-science.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-29T09:25:32.000Z","updated_at":"2025-08-23T14:37:26.000Z","dependencies_parsed_at":"2024-11-29T18:38:44.854Z","dependency_job_id":null,"html_url":"https://github.com/amazon-science/llm-rank-pruning","commit_stats":null,"previous_names":["amazon-science/llm-rank-pruning"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/amazon-science/llm-rank-pruning","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fllm-rank-pruning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/amazon-science%2Fllm-rank-pruning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fllm-rank-pruning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fllm-rank-pruning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amazon-science","download_url":"https://codeload.github.com/amazon-science/llm-rank-pruning/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fllm-rank-pruning/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28479034,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T11:59:17.896Z","status":"ssl_error","status_checked_at":"2026-01-16T11:55:55.838Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graph-theory","inference-optimization","large-language-models","llm","llms","pagerank","pruning","weighted-pagerank"],"created_at":"2026-01-16T13:50:42.842Z","updated_at":"2026-01-16T13:50:44.534Z","avatar_url":"https://github.com/amazon-science.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# LLM-Rank\n\nThis is the official PyTorch implementation for pruning Large Language models using the LLM-Rank pruning method. 
It is based on the weighted PageRank centrality measure as introduced in our paper:\n\n\u003e **LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models** \\\n\u003e Amazon Web Services - AI Research \\\n\u003e Author: David B. Hoffmann \\\n\u003e Advisors: Dr. Kailash Budhathoki, Dr. Matthaeus Kleindessner \\\n\u003e Paper: https://arxiv.org/abs/2410.13299\n\n```bibtex\n@article{hoffmann2024llmrank,\n    title={LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models},\n    author={Hoffmann, David B. and Budhathoki, Kailash and Kleindessner, Matthaeus},\n    year={2024},\n    journal={arXiv preprint arXiv:2410.13299}\n}\n```\n\n## Setup\n\n1. Clone the repository with `git clone git@github.com:amazon-science/llm-rank-pruning.git`\n2. Navigate into the repository with `cd llm-rank-pruning`\n3. Install the `llmrank` module with `pip install -e .`\n\n## How To Use\n\nThe package provides everything needed to perform post-training pruning of large language models. The **llmrank** package contains the actual pruning code of the LLM-Rank method as well as the other baselines we benchmark in our paper. It is designed to be easily extensible and compatible with further methods and scoring functions for structured pruning to allow for other comparisons. The extension to new models and methods is described in the [custom extensions](llmrank/README.md#include-other-pruning-methods) section.\n\nThe entry point to the llmrank package is the pipeline module, which provides different pipelines for pruning and is documented in the [llmrank README](llmrank/README.md#how-to-use). 
To prune the feed-forward layers of an open_llama_3b_v2 model using weighted PageRank centrality with C4 as calibration data, the following code can be used:\n\n```python\nfrom llmrank import ChainedFFNPipeline\nfrom llmrank.score import WeightedPageRankScorer\nfrom llmrank.prune import LocalPruner\nfrom llmrank.utils import load_hf_model\n\n# Load model\nmodel, tokenizer = load_hf_model(\"./artifacts/model/open_llama_3b_v2/\")\n\n# Define model structure\nstructure_dict = {\n    \"path_to_layers\": [\"model\", \"layers\"],\n    \"path_to_modules\": (\n        (\"mlp\", \"up_proj\"),\n        (\"mlp\", \"down_proj\")\n    )\n}\n\n# Choose the scoring function and the pruner\nscorer = WeightedPageRankScorer()\npruner = LocalPruner(amount=0.3)\n\n# Build and run the pruning pipeline\npipeline = ChainedFFNPipeline(model, tokenizer, structure_dict, scorer, pruner, \"cuda\")\npruned_model = pipeline.run()\n```\n\nThe **experiment** module acts as a CLI tool for running batched experiments. It takes a list of fixed and iterable arguments for different networks, scoring functions, pruning methods and pruning amounts, then creates and runs the relevant pipeline for each experiment. For more details, refer to the [experiment README](experiments/README.md).\n\n## Security\n\nSee [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.\n\n## License\n\nThis project is licensed under the Apache-2.0 License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Fllm-rank-pruning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famazon-science%2Fllm-rank-pruning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Fllm-rank-pruning/lists"}