{"id":19382861,"url":"https://github.com/locuslab/massive-activations","last_synced_at":"2025-04-23T20:32:32.982Z","repository":{"id":224861574,"uuid":"753428525","full_name":"locuslab/massive-activations","owner":"locuslab","description":"Code accompanying the paper \"Massive Activations in Large Language Models\"","archived":false,"fork":false,"pushed_at":"2024-03-04T15:39:01.000Z","size":2224,"stargazers_count":151,"open_issues_count":5,"forks_count":9,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-02T20:11:27.110Z","etag":null,"topics":["large-language-models"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2402.17762","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/locuslab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-06T05:11:58.000Z","updated_at":"2025-03-28T12:41:49.000Z","dependencies_parsed_at":null,"dependency_job_id":"7b0ee672-2f6a-4d27-b2f8-b4d2a2436b83","html_url":"https://github.com/locuslab/massive-activations","commit_stats":null,"previous_names":["locuslab/massive-activations"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/locuslab%2Fmassive-activations","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/locuslab%2Fmassive-activations/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/locuslab%2Fmassive-activations/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/locuslab%2F
massive-activations/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/locuslab","download_url":"https://codeload.github.com/locuslab/massive-activations/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250509869,"owners_count":21442514,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["large-language-models"],"created_at":"2024-11-10T09:23:37.728Z","updated_at":"2025-04-23T20:32:32.976Z","avatar_url":"https://github.com/locuslab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Massive Activations in Large Language Models\n\nOfficial PyTorch implementation of our paper:\n\n**Massive Activations in Large Language Models** \u003c/br\u003e\n[Mingjie Sun](https://eric-mingjie.github.io/), [Xinlei Chen](https://xinleic.xyz/), [J. Zico Kolter](https://zicokolter.com/), [Zhuang Liu](https://liuzhuang13.github.io/) \u003cbr\u003e\nCarnegie Mellon University, Meta AI Research and Bosch Center for AI  \u003cbr\u003e\n[Paper](https://arxiv.org/abs/2402.17762) - [Project page](https://eric-mingjie.github.io/massive-activations/index.html)\n\nMost of the experiments in this paper were done on one A6000 GPU.\n\n---\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/main_teaser.png\" width=100% height=100% \nclass=\"center\"\u003e\n\u003c/p\u003e\n\nThis paper studies the existence of *massive activations* in Large Language Models (LLMs). 
These activations have significantly larger magnitudes than other activations, yet they are extremely few in number.\n\n## This repository\n\n### Setup \nInstallation instructions can be found in [INSTALL.md](INSTALL.md).\n\n### Outline\nThe contents of this repository are as follows:\n\n* [lib](lib) contains utility functions for loading models, plotting figures, and evaluation.\n* [monkey_patch](monkey_patch) contains the code for monkey patching LLMs with custom forward functions, in order to collect internal activation and attention statistics.\n* [gpt-2](gpt-2) contains the code for training GPT-2 with explicit attention biases.\n* [main_llm.py](main_llm.py) contains the code for reproducing our experiments on LLMs.\n* [main_vit.py](main_vit.py) contains the code for reproducing our experiments on ViTs.\n\n### Large Language Models (LLMs)\n\n* We provide an example command to visualize a hidden state feature on the residual stream:\n```sh\nCUDA_VISIBLE_DEVICES=0 python main_llm.py --model llama2_7b \\\n    --exp1 --layer_id 2 \\\n    --savedir results/llm/3d_feat_vis/\n```\nRunning this command will visualize the output feature of layer 2 in LLaMA-2-7B, on the input prompt \"*Summer is warm. Winter is cold.\\n*\". The resulting visualizations are saved in `results/llm/3d_feat_vis/`.\n\nFor some LLMs, e.g., LLaMA-2-7B, you need to set the argument `--access-token` in order to access the weights.\n\n* We provide an example command to visualize the layer-wise top 3 largest activation magnitudes:\n```sh\nCUDA_VISIBLE_DEVICES=0 python main_llm.py --model llama2_7b \\\n    --exp2 \\\n    --savedir results/llm/layerwise/\n```\nRunning this command will visualize the per-layer top activation magnitudes. 
The resulting visualizations are saved in `results/llm/layerwise`.\n\n* We provide an example command to run the intervention analysis:\n```sh\nCUDA_VISIBLE_DEVICES=0 python main_llm.py --model llama2_7b \\\n    --exp3 \\\n    --reset_type set_zero \\\n    --layer_id 2 \\\n    --savedir results/llm/intervention_analysis/\n```\nHere the argument `--reset_type` can be either `set_zero` or `set_mean`. This command will set the massive activations in the output feature of layer 2 in LLaMA-2-7B to zero. The evaluation results are saved in `results/llm/intervention_analysis`.\n\n* We provide an example command for attention visualization:\n```sh\nCUDA_VISIBLE_DEVICES=0 python main_llm.py --model llama2_7b \\\n    --exp4 \\\n    --layer_id 3 \\\n    --savedir results/llm/attn_vis/\n```\nRunning this command will visualize the attention logits (averaged over attention heads) in layer 3 of LLaMA-2-7B. The visualizations are saved in `results/llm/attn_vis/`.\n\n### Vision Transformers (ViTs)\n\n* We provide an example command for visualizing the activation magnitudes of the output feature of an intermediate layer:\n```sh\nCUDA_VISIBLE_DEVICES=0 python main_vit.py --model_family dinov2_reg --model_size giant \\\n    --exp1 \\\n    --layer_id 40 \\\n    --savedir results/vit/3d_feat_vis/\n```\n\n* We provide an example command for visualizing the layer-wise largest activation magnitudes:\n```sh\nCUDA_VISIBLE_DEVICES=0 python main_vit.py --model_family dinov2_reg --model_size giant \\\n    --exp2 \\\n    --savedir results/vit/layerwise/\n```\n\n* To reproduce the results of `Fix-Reg-Mean` on [DINOv2-reg](https://arxiv.org/abs/2309.16588), run the following commands:\n```sh\nfor model_size in small base large giant \ndo \nCUDA_VISIBLE_DEVICES=0 python main_vit.py \\\n    --model_family dinov2_reg --model_size ${model_size} --exp3 \\\n    --reg_feat_mean assets/reg_feat_mean \\\n    --imagenet_dir [Path to ImageNet validation set] \\\n    --savedir 
results/vit/exp4/dinov2_reg_${model_size}\ndone\n```\nThe argument `--reg_feat_mean` corresponds to the directory containing the mean of the register features at all layers collected over 10k ImageNet training images with data augmentations.\n\nResults\n| DINOv2-reg       |   ViT-S  |   ViT-B  |  ViT-L | ViT-G |\n|------------------|----------|----------|--------|-------|\n| Original         |   81.9   |   84.8   |  86.3  |  87.0 |\n| `Fix-Reg-Mean`   |   81.7   |   85.0   |  86.2  |  87.0 |\n\n## License\nThis project is released under the MIT license. Please see the [LICENSE](LICENSE) file for more information.\n\n## Reference \n```bibtex\n@article{sun2024massive,\n  title={Massive Activations in Large Language Models}, \n  author={Sun, Mingjie and Chen, Xinlei and Kolter, J. Zico and Liu, Zhuang},\n  year={2024},\n  journal={arXiv preprint arXiv:2402.17762}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flocuslab%2Fmassive-activations","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flocuslab%2Fmassive-activations","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flocuslab%2Fmassive-activations/lists"}