https://github.com/joelburget/mamba-sae
Training and evaluating Sparse Autoencoders for Mamba
- Host: GitHub
- URL: https://github.com/joelburget/mamba-sae
- Owner: joelburget
- Created: 2023-12-26T16:10:00.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-12-08T05:06:41.000Z (10 months ago)
- Last Synced: 2025-07-20T07:24:28.003Z (3 months ago)
- Topics: interpretability, mamba
- Language: Python
- Homepage:
- Size: 63.5 KB
- Stars: 9
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Mamba Interpretability
This repo is for doing interpretability work on [Mamba (Linear-Time Sequence Modeling with Selective State Spaces)](https://arxiv.org/abs/2312.00752). We follow the approach from Anthropic's [Towards Monosemanticity: Decomposing Language Models With Dictionary Learning](https://transformer-circuits.pub/2023/monosemantic-features/index.html), though the scope might broaden in the future. We use SAELens for training and evaluating SAEs; a minimal sketch of the underlying training objective follows.
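To make the approach concrete, here is a minimal PyTorch sketch of the dictionary-learning objective from Towards Monosemanticity: a single-hidden-layer autoencoder trained to reconstruct captured activations under an L1 sparsity penalty. This is illustrative only, not SAELens's API (which is what this repo actually uses); the dimensions, learning rate, and L1 coefficient are placeholder assumptions.

```
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseAutoencoder(nn.Module):
    """One-layer SAE: activations -> sparse features -> reconstruction."""

    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_in, d_hidden) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_hidden))
        self.W_dec = nn.Parameter(torch.randn(d_hidden, d_in) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_in))

    def forward(self, x):
        # Encode: subtract the decoder bias (as in the Anthropic setup),
        # project up, ReLU -> sparse feature activations.
        f = F.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        # Decode: project back down into the activation space.
        x_hat = f @ self.W_dec + self.b_dec
        return x_hat, f


# Placeholder sizes: d_in would match the Mamba residual-stream width.
sae = SparseAutoencoder(d_in=768, d_hidden=768 * 16)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coefficient = 1e-3  # placeholder; exactly the kind of value the sweep below tunes

acts = torch.randn(4096, 768)  # stand-in for activations captured from Mamba
x_hat, f = sae(acts)
loss = F.mse_loss(x_hat, acts) + l1_coefficient * f.abs().sum(dim=-1).mean()
loss.backward()
opt.step()
```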
## `hyperparam_sweep.py`
Run a wandb sweep to determine hyperparameters.
```
> wandb sweep --project mamba-sae-sweep sweep_config.yaml
> wandb agent <SWEEP_ID>  # paste the sweep ID printed by `wandb sweep`
```
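The actual `sweep_config.yaml` ships with the repo. For illustration only, a wandb sweep config for this setup could take roughly the following shape; the swept parameters, ranges, and metric name here are assumptions, not the repo's real settings.

```
program: hyperparam_sweep.py
method: bayes
metric:
  name: loss        # assumed name of the metric hyperparam_sweep.py logs
  goal: minimize
parameters:
  lr:
    distribution: log_uniform_values
    min: 1.0e-5
    max: 1.0e-3
  l1_coefficient:
    values: [0.0001, 0.001, 0.01]
```

`wandb sweep` registers the config and prints a sweep ID; each `wandb agent <SWEEP_ID>` process then pulls hyperparameter combinations from the sweep server and runs `hyperparam_sweep.py` with them, so multiple agents can run in parallel.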