{"id":13472352,"url":"https://github.com/ai-safety-foundation/sparse_autoencoder","last_synced_at":"2025-03-26T15:31:58.521Z","repository":{"id":203909091,"uuid":"710672651","full_name":"ai-safety-foundation/sparse_autoencoder","owner":"ai-safety-foundation","description":"Sparse Autoencoder for Mechanistic Interpretability","archived":false,"fork":false,"pushed_at":"2024-07-20T15:42:39.000Z","size":12595,"stargazers_count":184,"open_issues_count":14,"forks_count":39,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-10-30T04:13:44.226Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://ai-safety-foundation.github.io/sparse_autoencoder/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ai-safety-foundation.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-27T07:37:15.000Z","updated_at":"2024-10-28T12:50:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"6d3db1d2-d175-4cbe-b54f-7641a8565ff7","html_url":"https://github.com/ai-safety-foundation/sparse_autoencoder","commit_stats":null,"previous_names":["alan-cooney/sparse_autoencoder","ai-safety-foundation/sparse_autoencoder"],"tags_count":34,"template":false,"template_full_name":"alan-cooney/transformer-lens-starter-template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ai-safety-foundation%2Fsparse_autoencoder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ai-safety-foundation%2Fsparse_autoencoder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/ai-safety-foundation%2Fsparse_autoencoder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ai-safety-foundation%2Fsparse_autoencoder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ai-safety-foundation","download_url":"https://codeload.github.com/ai-safety-foundation/sparse_autoencoder/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245681423,"owners_count":20655189,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T16:00:53.997Z","updated_at":"2025-03-26T15:31:58.018Z","avatar_url":"https://github.com/ai-safety-foundation.png","language":"Python","funding_links":[],"categories":["Table of Contents"],"sub_categories":["LLM Interpretability Tools"],"readme":"# Sparse Autoencoder\n\n[![PyPI](https://img.shields.io/pypi/v/sparse_autoencoder?color=blue)](https://pypi.org/project/transformer-lens/)\n![PyPI -\nLicense](https://img.shields.io/pypi/l/sparse_autoencoder?color=blue) [![Checks](https://github.com/alan-cooney/sparse_autoencoder/actions/workflows/checks.yml/badge.svg)](https://github.com/alan-cooney/sparse_autoencoder/actions/workflows/checks.yml)\n[![Release](https://github.com/alan-cooney/sparse_autoencoder/actions/workflows/release.yml/badge.svg)](https://github.com/alan-cooney/sparse_autoencoder/actions/workflows/release.yml)\n\nA sparse autoencoder for mechanistic interpretability research.\n\n[![Read the 
Docs\nHere](https://img.shields.io/badge/-Read%20the%20Docs%20Here-blue?style=for-the-badge\u0026logo=Read-the-Docs\u0026logoColor=white\u0026link=https://ai-safety-foundation.github.io/sparse_autoencoder/)](https://ai-safety-foundation.github.io/sparse_autoencoder/)\n\nTrain a Sparse Autoencoder [in colab](https://colab.research.google.com/github/ai-safety-foundation/sparse_autoencoder/blob/main/docs/content/demo.ipynb), or install for your project:\n\n```shell\npip install sparse_autoencoder\n```\n\n## Features\n\nThis library contains:\n\n   1. **A sparse autoencoder model**, along with all the underlying PyTorch components you need to\n      customise and/or build your own:\n      - Encoder, constrained unit norm decoder and tied bias PyTorch modules in `autoencoder`.\n      - L1 and L2 loss modules in `loss`.\n      - Adam module with helper method to reset state in `optimizer`.\n   2. **Activations data generator** using TransformerLens, with the underlying steps in case you\n      want to customise the approach:\n      - Activation store options (in-memory or on disk) in `activation_store`.\n      - Hook to get the activations from TransformerLens in an efficient way in `source_model`.\n      - Source dataset (i.e. prompts to generate these activations) utils in `source_data`, that\n        stream data from HuggingFace and pre-process (tokenize \u0026 shuffle).\n   3. **Activation resampler** to help reduce the number of dead neurons.\n   4. **Metrics** that log at various stages of training (e.g. during training, resampling and\n      validation), and integrate with wandb.\n   5. **Training pipeline** that combines everything together, allowing you to run hyperparameter\n      sweeps and view progress on wandb.\n\n## Designed for Research\n\nThe library is designed to be modular. 
By default it takes the approach from [Towards\nMonosemanticity: Decomposing Language Models With Dictionary Learning\n](https://transformer-circuits.pub/2023/monosemantic-features/index.html), so you can pip install\nthe library and get started quickly. Then when you need to customise something, you can just extend\nthe class for that component (e.g. you can extend `SparseAutoencoder` if you want to customise the\nmodel), and then drop it back into the training pipeline. Every component is fully documented, so\nit's nice and easy to do this.\n\n## Demo\n\nCheck out the demo notebook [docs/content/demo.ipynb](https://github.com/ai-safety-foundation/sparse_autoencoder/blob/main/docs/content/demo.ipynb) for a guide to using this library.\n\n## Contributing\n\nThis project uses [Poetry](https://python-poetry.org) for dependency management, and\n[PoeThePoet](https://poethepoet.natn.io/installation.html) for scripts. After checking out the repo,\nwe recommend setting poetry's config to create the `.venv` in the root directory (note this is a\nglobal setting) and then installing with the dev and demos dependencies.\n\n```shell\npoetry config virtualenvs.in-project true\npoetry install --with dev,demos\n```\n\n### Checks\n\nFor a full list of available commands (e.g. `test` or `typecheck`), run this in your terminal\n(assumes the venv is active already).\n\n```shell\npoe\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fai-safety-foundation%2Fsparse_autoencoder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fai-safety-foundation%2Fsparse_autoencoder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fai-safety-foundation%2Fsparse_autoencoder/lists"}