https://github.com/joelburget/mamba-sae
Training and evaluating Sparse Autoencoders for Mamba
- Host: GitHub
- URL: https://github.com/joelburget/mamba-sae
- Owner: joelburget
- Created: 2023-12-26T16:10:00.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-12-08T05:06:41.000Z (10 months ago)
- Last Synced: 2025-07-20T07:24:28.003Z (3 months ago)
- Topics: interpretability, mamba
- Language: Python
- Homepage:
- Size: 63.5 KB
- Stars: 9
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Mamba Interpretability
This repo is for doing interpretability work on [Mamba (Linear-Time Sequence Modeling with Selective State Spaces)](https://arxiv.org/abs/2312.00752). We follow the approach from Anthropic's [Towards Monosemanticity: Decomposing Language Models With Dictionary Learning](https://transformer-circuits.pub/2023/monosemantic-features/index.html), though the scope might broaden in the future. We use SAELens for training and evaluating SAEs; a minimal sketch of the underlying training objective follows.
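To make the approach concrete, here is a minimal PyTorch sketch of the dictionary-learning objective from Towards Monosemanticity: a single-hidden-layer autoencoder trained to reconstruct captured activations under an L1 sparsity penalty. This is illustrative only, not SAELens's API (which is what this repo actually uses); the dimensions, learning rate, and L1 coefficient are placeholder assumptions.

```
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseAutoencoder(nn.Module):
    """One-layer SAE: activations -> sparse features -> reconstruction."""

    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_in, d_hidden) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_hidden))
        self.W_dec = nn.Parameter(torch.randn(d_hidden, d_in) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_in))

    def forward(self, x):
        # Encode: subtract the decoder bias (as in the Anthropic setup),
        # project up, ReLU -> sparse feature activations.
        f = F.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        # Decode: project back down into the activation space.
        x_hat = f @ self.W_dec + self.b_dec
        return x_hat, f


# Placeholder sizes: d_in would match the Mamba residual-stream width.
sae = SparseAutoencoder(d_in=768, d_hidden=768 * 16)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coefficient = 1e-3  # placeholder; exactly the kind of value the sweep below tunes

acts = torch.randn(4096, 768)  # stand-in for activations captured from Mamba
x_hat, f = sae(acts)
loss = F.mse_loss(x_hat, acts) + l1_coefficient * f.abs().sum(dim=-1).mean()
loss.backward()
opt.step()
```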
## `hyperparam_sweep.py`
Run a wandb sweep to determine hyperparameters.
```
> wandb sweep --project mamba-sae-sweep sweep_config.yaml
> wandb agent <SWEEP_ID>  # paste the sweep ID printed by `wandb sweep`
```
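The actual `sweep_config.yaml` ships with the repo. For illustration only, a wandb sweep config for this setup could take roughly the following shape; the swept parameters, ranges, and metric name here are assumptions, not the repo's real settings.

```
program: hyperparam_sweep.py
method: bayes
metric:
  name: loss        # assumed name of the metric hyperparam_sweep.py logs
  goal: minimize
parameters:
  lr:
    distribution: log_uniform_values
    min: 1.0e-5
    max: 1.0e-3
  l1_coefficient:
    values: [0.0001, 0.001, 0.01]
```

`wandb sweep` registers the config and prints a sweep ID; each `wandb agent <SWEEP_ID>` process then pulls hyperparameter combinations from the sweep server and runs `hyperparam_sweep.py` with them, so multiple agents can run in parallel.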