https://github.com/ai-safety-foundation/sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
- Host: GitHub
- URL: https://github.com/ai-safety-foundation/sparse_autoencoder
- Owner: ai-safety-foundation
- License: mit
- Created: 2023-10-27T07:37:15.000Z
- Default Branch: main
- Last Pushed: 2024-07-20T15:42:39.000Z
- Last Synced: 2024-10-04T21:17:12.381Z
- Language: Python
- Homepage: https://ai-safety-foundation.github.io/sparse_autoencoder/
- Size: 12 MB
- Stars: 175
- Watchers: 4
- Forks: 38
- Open Issues: 14
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-llm-interpretability - Sparse Autoencoder - Sparse Autoencoder for Mechanistic Interpretability. (Table of Contents / LLM Interpretability Tools)
README
# Sparse Autoencoder
[![PyPI](https://img.shields.io/pypi/v/sparse_autoencoder?color=blue)](https://pypi.org/project/sparse_autoencoder/)
![PyPI - License](https://img.shields.io/pypi/l/sparse_autoencoder?color=blue) [![Checks](https://github.com/alan-cooney/sparse_autoencoder/actions/workflows/checks.yml/badge.svg)](https://github.com/alan-cooney/sparse_autoencoder/actions/workflows/checks.yml)
[![Release](https://github.com/alan-cooney/sparse_autoencoder/actions/workflows/release.yml/badge.svg)](https://github.com/alan-cooney/sparse_autoencoder/actions/workflows/release.yml)

A sparse autoencoder for mechanistic interpretability research.

[![Read the Docs Here](https://img.shields.io/badge/-Read%20the%20Docs%20Here-blue?style=for-the-badge&logo=Read-the-Docs&logoColor=white&link=https://ai-safety-foundation.github.io/sparse_autoencoder/)](https://ai-safety-foundation.github.io/sparse_autoencoder/)

Train a Sparse Autoencoder [in colab](https://colab.research.google.com/github/ai-safety-foundation/sparse_autoencoder/blob/main/docs/content/demo.ipynb), or install for your project:
```shell
pip install sparse_autoencoder
```

## Features
This library contains:
1. **A sparse autoencoder model**, along with all the underlying PyTorch components you need to
   customise and/or build your own:
   - Encoder, constrained unit norm decoder and tied bias PyTorch modules in `autoencoder`.
   - L1 and L2 loss modules in `loss`.
   - Adam module with a helper method to reset its state in `optimizer`.
2. **Activations data generator** using TransformerLens, with the underlying steps exposed in case
   you want to customise the approach:
   - Activation store options (in-memory or on disk) in `activation_store`.
   - A hook to efficiently get the activations from TransformerLens in `source_model`.
   - Source dataset utilities (i.e. the prompts used to generate these activations) in
     `source_data`, which stream data from HuggingFace and pre-process it (tokenize and shuffle).
3. **Activation resampler** to help reduce the number of dead neurons.
4. **Metrics** that log at various stages (e.g. during training, resampling and validation), and
   integrate with wandb.
5. **Training pipeline** that combines everything together, allowing you to run hyperparameter
   sweeps and view progress on wandb.

## Designed for Research
The library is designed to be modular. By default it follows the approach from [Towards
Monosemanticity: Decomposing Language Models With Dictionary Learning](https://transformer-circuits.pub/2023/monosemantic-features/index.html),
so you can pip install the library and get started quickly. When you need to customise something,
you can extend the class for that component (e.g. extend `SparseAutoencoder` to customise the
model) and then drop it back into the training pipeline. Every component is fully documented, so
this is straightforward to do.

## Demo
Check out the demo notebook [docs/content/demo.ipynb](https://github.com/ai-safety-foundation/sparse_autoencoder/blob/main/docs/content/demo.ipynb) for a guide to using this library.
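For a quick look at what the model listed under Features computes, without opening the notebook, here is a minimal standalone PyTorch sketch. It is not this library's API: the class `MinimalSparseAutoencoder`, the helpers `sae_loss` and `count_dead_features`, and all sizes and hyperparameters (`l1_coefficient`, learning rate, step count) are hypothetical, and random tensors stand in for real TransformerLens activations. It assumes the general recipe from Towards Monosemanticity: a tied bias subtracted before a ReLU encoder and added back after decoding, decoder columns renormalized to unit norm after every update, and a squared L2 reconstruction loss plus an L1 sparsity penalty.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MinimalSparseAutoencoder(nn.Module):
    """Hypothetical standalone SAE sketch; not this library's `SparseAutoencoder` class."""

    def __init__(self, n_input: int, n_learned: int) -> None:
        super().__init__()
        # Tied bias: subtracted before encoding and added back after decoding.
        self.tied_bias = nn.Parameter(torch.zeros(n_input))
        self.encoder = nn.Linear(n_input, n_learned)
        # Shape (n_input, n_learned): each column is one learned feature direction.
        self.decoder_weight = nn.Parameter(torch.randn(n_input, n_learned))
        self.normalize_decoder()

    @torch.no_grad()
    def normalize_decoder(self) -> None:
        # Constrain every decoder column to unit L2 norm.
        self.decoder_weight.copy_(F.normalize(self.decoder_weight, dim=0))

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        features = F.relu(self.encoder(x - self.tied_bias))
        reconstruction = features @ self.decoder_weight.T + self.tied_bias
        return features, reconstruction


def sae_loss(
    x: torch.Tensor,
    features: torch.Tensor,
    reconstruction: torch.Tensor,
    l1_coefficient: float = 1e-3,  # hypothetical sparsity weight
) -> torch.Tensor:
    # Squared L2 reconstruction error plus an L1 sparsity penalty, averaged over the batch.
    l2 = (reconstruction - x).pow(2).sum(dim=-1).mean()
    l1 = features.abs().sum(dim=-1).mean()
    return l2 + l1_coefficient * l1


@torch.no_grad()
def count_dead_features(model: MinimalSparseAutoencoder, x: torch.Tensor) -> int:
    features, _ = model(x)
    # ReLU outputs are >= 0, so a column summing to zero never fired on this batch.
    return int((features.sum(dim=0) == 0).sum())


torch.manual_seed(0)
model = MinimalSparseAutoencoder(n_input=16, n_learned=64)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
activations = torch.randn(256, 16)  # random stand-in for cached source-model activations

losses = []
for _ in range(100):
    features, reconstruction = model(activations)
    loss = sae_loss(activations, features, reconstruction)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    model.normalize_decoder()  # re-apply the unit-norm constraint after each update
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(f"dead features: {count_dead_features(model, activations)}")
```

Renormalizing the decoder after each step stops the optimizer from shrinking the L1 penalty by scaling encoder outputs down and decoder columns up. `count_dead_features` only flags features that never fire on the given batch; deciding when and how to revive them is the job of the library's activation resampler, which this sketch does not attempt.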
## Contributing
This project uses [Poetry](https://python-poetry.org) for dependency management, and
[PoeThePoet](https://poethepoet.natn.io/installation.html) for scripts. After checking out the repo,
we recommend setting Poetry's config to create the `.venv` in the root directory (note that this
is a global setting) and then installing with the dev and demos dependencies.

```shell
poetry config virtualenvs.in-project true
poetry install --with dev,demos
```

### Checks
For a full list of available commands (e.g. `test` or `typecheck`), run the following in your
terminal (this assumes the venv is already active):

```shell
poe
```