{"id":19428882,"url":"https://github.com/EleutherAI/sparsify","last_synced_at":"2025-10-18T05:30:45.098Z","repository":{"id":241641464,"uuid":"805226033","full_name":"EleutherAI/sae","owner":"EleutherAI","description":"Sparse autoencoders","archived":false,"fork":false,"pushed_at":"2025-01-21T06:35:35.000Z","size":112,"stargazers_count":414,"open_issues_count":7,"forks_count":54,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-01-27T04:24:24.888Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EleutherAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-24T06:22:26.000Z","updated_at":"2025-01-26T22:55:15.000Z","dependencies_parsed_at":"2024-07-15T17:19:38.450Z","dependency_job_id":"cfb3e7f6-8ff8-4abf-8fc7-30afcea9299c","html_url":"https://github.com/EleutherAI/sae","commit_stats":null,"previous_names":["eleutherai/sae"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EleutherAI%2Fsae","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EleutherAI%2Fsae/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EleutherAI%2Fsae/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EleutherAI%2Fsae/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EleutherAI","download_url":"https://codeload.github.com/EleutherAI/sae/tar.gz/refs/heads/main"
,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":236904725,"owners_count":19223157,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T14:17:04.728Z","updated_at":"2025-10-18T05:30:45.092Z","avatar_url":"https://github.com/EleutherAI.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"## Introduction\nThis library trains _k_-sparse autoencoders (SAEs) and transcoders on the activations of HuggingFace language models, roughly following the recipe detailed in [Scaling and evaluating sparse autoencoders](https://arxiv.org/abs/2406.04093v1) (Gao et al. 2024).\n\nThis is a lean, simple library with few configuration options. Unlike most other SAE libraries (e.g. [SAELens](https://github.com/jbloomAus/SAELens)), it does not cache activations on disk, but rather computes them on-the-fly. This allows us to scale to very large models and datasets with zero storage overhead, but has the downside that trying different hyperparameters for the same model and dataset will be slower than if we cached activations (since activations will be re-computed). We may add caching as an option in the future.\n\nFollowing Gao et al., we use a TopK activation function which directly enforces a desired level of sparsity in the activations. This is in contrast to other libraries which use an L1 penalty in the loss function. 
We believe TopK is a Pareto improvement over the L1 approach, and hence do not plan on supporting L1 penalties.\n\n## Loading pretrained SAEs\n\nTo load a pretrained SAE from the HuggingFace Hub, you can use the `Sae.load_from_hub` method as follows:\n\n```python\nfrom sparsify import Sae\n\nsae = Sae.load_from_hub(\"EleutherAI/sae-llama-3-8b-32x\", hookpoint=\"layers.10\")\n```\n\nThis will load the SAE for residual stream layer 10 of Llama 3 8B, which was trained with an expansion factor of 32. You can also load the SAEs for all layers at once using `Sae.load_many`:\n\n```python\nsaes = Sae.load_many(\"EleutherAI/sae-llama-3-8b-32x\")\nsaes[\"layers.10\"]\n```\n\nThe dictionary returned by `load_many` is guaranteed to be [naturally sorted](https://en.wikipedia.org/wiki/Natural_sort_order) by the name of the hook point. For the common case where the hook points are named `embed_tokens`, `layers.0`, ..., `layers.n`, this means that the SAEs will be sorted by layer number. We can then gather the SAE activations for a model forward pass as follows:\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nimport torch\n\ntokenizer = AutoTokenizer.from_pretrained(\"meta-llama/Meta-Llama-3-8B\")\ninputs = tokenizer(\"Hello, world!\", return_tensors=\"pt\")\n\nwith torch.inference_mode():\n    model = AutoModelForCausalLM.from_pretrained(\"meta-llama/Meta-Llama-3-8B\")\n    outputs = model(**inputs, output_hidden_states=True)\n\n    latent_acts = []\n    for sae, hidden_state in zip(saes.values(), outputs.hidden_states):\n        # (N, D) input shape expected\n        hidden_state = hidden_state.flatten(0, 1)\n        latent_acts.append(sae.encode(hidden_state))\n\n# Do stuff with the latent activations\n```\n\nFor use cases beyond collecting residual stream SAE activations, we recommend PyTorch hooks ([see examples](https://gist.github.com/luciaquirke/7105708dac0cfc632d68f33c79b59e5c)).\n\n## Training SAEs and transcoders\n\nTo train SAEs from the command line, 
you can use the following command:\n\n```bash\npython -m sparsify EleutherAI/pythia-160m [optional dataset] [--transcode]\n```\nBy default, we use the `EleutherAI/SmolLM2-135M-10B` dataset for training, but you can use any dataset from the HuggingFace Hub, or any local dataset in HuggingFace format (the string is passed to `load_dataset` from the `datasets` library).\n\nThe CLI supports all of the config options provided by the `TrainConfig` class. You can see them by running `python -m sparsify --help`.\n\nProgrammatic usage is simple. Here is an example:\n\n```python\nimport torch\nfrom datasets import load_dataset\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nfrom sparsify import SaeConfig, Trainer, TrainConfig\nfrom sparsify.data import chunk_and_tokenize\n\nMODEL = \"HuggingFaceTB/SmolLM2-135M\"\ndataset = load_dataset(\n    \"EleutherAI/SmolLM2-135M-10B\", split=\"train\",\n)\ntokenizer = AutoTokenizer.from_pretrained(MODEL)\ntokenized = chunk_and_tokenize(dataset, tokenizer)\n\n\ngpt = AutoModelForCausalLM.from_pretrained(\n    MODEL,\n    device_map={\"\": \"cuda\"},\n    torch_dtype=torch.bfloat16,\n)\n\ncfg = TrainConfig(SaeConfig(), batch_size=16)\ntrainer = Trainer(cfg, tokenized, gpt)\n\ntrainer.fit()\n```\n\n## Finetuning SAEs\n\nTo finetune a pretrained SAE, pass its path to the `finetune` argument.\n\n```bash\npython -m sparsify EleutherAI/pythia-160m togethercomputer/RedPajama-Data-1T-Sample --finetune EleutherAI/sae-pythia-160m-32x\n```\n\n## Custom hookpoints\n\nBy default, the SAEs are trained on the residual stream activations of the model. However, you can also train SAEs on the activations of any other submodule(s) by specifying custom hookpoint patterns. These patterns are like standard PyTorch module names (e.g. `h.0.ln_1`) but also allow [Unix pattern matching syntax](https://docs.python.org/3/library/fnmatch.html), including wildcards and character sets. 
For example, to train SAEs on the output of every attention module and the inner activations of every MLP in GPT-2, you can use the following code:\n\n```bash\npython -m sparsify gpt2 --hookpoints \"h.*.attn\" \"h.*.mlp.act\"\n```\n\nTo restrict to the first three layers:\n\n```bash\npython -m sparsify gpt2 --hookpoints \"h.[012].attn\" \"h.[012].mlp.act\"\n```\n\nWe currently don't support fine-grained manual control over the learning rate, number of latents, or other hyperparameters on a hookpoint-by-hookpoint basis. By default, the `expansion_factor` option is used to select the appropriate number of latents for each hookpoint based on the width of that hookpoint's output. The default learning rate for each hookpoint is then set using an inverse square root scaling law based on the number of latents. If you manually set the number of latents or the learning rate, it will be applied to all hookpoints.\n\n## Distributed training\n\nWe support distributed training via PyTorch's `torchrun` command. By default we use the Distributed Data Parallel method, which means that the weights of each SAE are replicated on every GPU.\n\n```bash\ntorchrun --nproc_per_node gpu -m sparsify meta-llama/Meta-Llama-3-8B --batch_size 1 --layers 16 24 --k 192 --grad_acc_steps 8 --ctx_len 2048\n```\n\nThis is simple, but very memory inefficient. If you want to train SAEs for many layers of a model, we recommend using the `--distribute_modules` flag, which allocates the SAEs for different layers to different GPUs. Currently, we require that the number of GPUs evenly divides the number of layers you're training SAEs for.\n\n```bash\ntorchrun --nproc_per_node gpu -m sparsify meta-llama/Meta-Llama-3-8B --distribute_modules --batch_size 1 --layer_stride 2 --grad_acc_steps 8 --ctx_len 2048 --k 192 --load_in_8bit --micro_acc_steps 2\n```\n\nThe above command trains an SAE for every _even_ layer of Llama 3 8B, using all available GPUs. 
It accumulates gradients over 8 minibatches, and splits each minibatch into 2 microbatches before feeding them into the SAE encoder, thus saving a lot of memory. It also loads the model in 8-bit precision using `bitsandbytes`. This command requires no more than 48GB of memory per GPU on an 8 GPU node.\n\n## TODO\n\nThere are several features that we'd like to add in the near future:\n- [ ] Support for caching activations\n- [ ] Evaluate SAEs with KL divergence when grafted into the model\n\nIf you'd like to help out with any of these, please feel free to open a PR! You can collaborate with us in the sparse-autoencoders channel of the EleutherAI Discord.\n\n## Installation\n\n`pip install eai-sparsify`\n\n## Development\n\nRun `pip install -e .[dev]` from the sparsify directory.\n\nWe use [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/) for releases.\n\n## Experimental features\n\nLinear k decay schedule:\n\n```bash\npython -m sparsify gpt2 --hookpoints \"h.*.attn\" \"h.*.mlp.act\" --k_decay_steps 10_000\n```\n\nGroupMax activation function:\n\n```bash\npython -m sparsify gpt2 --hookpoints \"h.*.attn\" \"h.*.mlp.act\" --activation groupmax\n```\n\nEnd-to-end training:\n\n```bash\npython -m sparsify gpt2 --hookpoints \"h.*.attn\" \"h.*.mlp.act\" --loss_fn ce\n```\n\nor\n\n```bash\npython -m sparsify gpt2 --hookpoints \"h.*.attn\" \"h.*.mlp.act\" --loss_fn kl\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEleutherAI%2Fsparsify","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FEleutherAI%2Fsparsify","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEleutherAI%2Fsparsify/lists"}