{"id":35360144,"url":"https://github.com/IntelLabs/SCAP","last_synced_at":"2026-01-07T13:00:57.294Z","repository":{"id":292990160,"uuid":"874854286","full_name":"IntelLabs/SCAP","owner":"IntelLabs","description":"Statistical Calibrated Activation Pruning","archived":false,"fork":false,"pushed_at":"2025-05-13T05:46:27.000Z","size":30,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-13T06:31:46.284Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IntelLabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-10-18T15:25:38.000Z","updated_at":"2025-05-13T05:46:30.000Z","dependencies_parsed_at":"2025-05-13T06:42:01.439Z","dependency_job_id":null,"html_url":"https://github.com/IntelLabs/SCAP","commit_stats":null,"previous_names":["intellabs/scap"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/IntelLabs/SCAP","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2FSCAP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2FSCAP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2FSCAP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2FSCAP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IntelLabs","download_url":"https://codeload.github.com/IntelLabs/SCAP/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2FSCAP/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28235497,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2026-01-07T02:00:05.975Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-02T00:01:15.704Z","updated_at":"2026-01-07T13:00:57.285Z","avatar_url":"https://github.com/IntelLabs.png","language":"Python","funding_links":[],"categories":["Papers"],"sub_categories":["2024"],"readme":"# Statistical Calibrated Activation Pruning (SCAP)\n\nThis repo contains the reference code for \"[Post-Training Statistical Calibration for Higher Activation Sparsity](https://arxiv.org/abs/2412.07174)\".\n\nIf you find our work useful in your research, please consider citing our paper:\n\n```bibtex\n@InProceedings{chua2024scap,\n  title     = {Post-Training Statistical Calibration for Higher Activation Sparsity},\n  author    = {Chua, Vui Seng and Pan, Yujie and Jain, Nilesh},\n  booktitle = {Proceedings of The 4th NeurIPS Efficient Natural Language and Speech Processing Workshop},\n  year      = {2024},\n  volume    = {262},\n  series    = {Proceedings of Machine Learning Research}\n}\n```\n\n## Abstract\n\nWe present Statistical Calibrated Activation Pruning (SCAP), a post-training activation pruning framework that (1) generalizes sparsification by input activations of Fully-Connected layers for generic and flexible application across Transformers, and (2) features a simple Mode-Centering technique to pre-calibrate activation distributions for maximizing post-training sparsity. Our results demonstrate robust Pareto efficiency compared to prior methods, translating to a 1.5× additional LLM decoding speedup against CATS at iso model quality. SCAP effectiveness is empirically verified across a wide range of models, including recent Transformer Decoders, MoE, Mamba2, Encoding Transformer, and pre-quantized models, highlighting its practicality and scalability.\n\n## Setup\n\nPlease follow the steps below.\n\n```bash\n# recommended python version: 3.10.13\npython -m venv ./scap_env\nsource ./scap_env/bin/activate\n\n# install torch\npip install torch==2.3.1 --index-url https://download.pytorch.org/whl/cu121\n\n# install dependencies\npip install transformers==4.44.0 datasets==2.21.0 accelerate tqdm rich seaborn matplotlib wheel \\\n    git+https://github.com/EleutherAI/lm-evaluation-harness.git@906ef948dc8dbb4c84e1bb0f2861b1aba30ab533\n\n# install gemv kernel\npip install triton \"git+https://github.com/ScalingIntelligence/CATS.git@0bda7708b835f20c59f4dd59d3d32b0c5f2f6376#egg=flash_gemv\u0026subdirectory=flash_gemv\"\n```\n\n## Reproducing the results\n\n### 1. Run calibration\n\nCompute the calibrated SCAP thresholds for each model and sparsity config.\n\n```bash\nbash scripts/01.calibration.bash\n```\n\n_You can skip this calibration step, as the configs for the following models are already uploaded to the repo._\n\n| Model ID                  | Config in the bash script                  | Up/gate sparsity           | Down sparsity               |\n| ------------------------- | ------------------------------------------ | -------------------------- | --------------------------- |\n| meta-llama/Llama-2-7b-hf  | up,zero,0.35,gate,zero,0.35,down,zero,0.55 | 35% without mode centering | 55% without mode centering  |\n| mistralai/Mistral-7B-v0.1 | up,zero,0.3,gate,zero,0.3,down,zero,0.7    | 30% without mode centering | 70% without mode centering  |\n| mosaicml/mpt-7b           | down,kde,0.5                               | N/A                        | 50% with _kde peak_ as mode |\n| tiiuae/falcon-7b          | down,median,0.5                            | N/A                        | 50% with _median_ as mode   |\n\nThe resulting `calibrated_thresholds.json` file in the `results/scap/` folder lists the mode and threshold for each FFN layer specified in the config.\n\n### 2. Evaluation on zero-shot tasks\n\nEvaluate on the zero-shot tasks listed in the paper, i.e., _winogrande, piqa, sciq, hellaswag, boolq, arc_easy, arc_challenge_. Results are saved in the `results/scap/` folder.\n\n```bash\nbash scripts/02.evaluate_zero_shot_tasks.bash\n```\n\nThe resulting `evaluation_results.json` file contains (1) the evaluation metrics for each task and (2) the average actual input sparsity of each layer.\n\n### 3. Inference with sparse kernel\n\nWe demonstrate actual inference of SCAP-optimized models with the sparse GEMV kernel.\n\n```bash\nbash scripts/03.inference_demo.bash\n```\n\n## Acknowledgement\n\nThis work is built on [CATS](https://github.com/ScalingIntelligence/CATS), which we believe in turn builds on [DejaVu](https://github.com/FMInference/DejaVu). Credit goes to the original authors of these projects.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FIntelLabs%2FSCAP","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FIntelLabs%2FSCAP","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FIntelLabs%2FSCAP/lists"}