https://github.com/joelburget/mamba-sae

Training and evaluating Sparse Autoencoders for Mamba
interpretability mamba


# Mamba Interpretability

This repo is for doing interpretability work on [Mamba (Linear-Time Sequence Modeling with Selective State Spaces)](https://arxiv.org/abs/2312.00752). We follow the approach from Anthropic's [Towards Monosemanticity: Decomposing Language Models With Dictionary Learning](https://transformer-circuits.pub/2023/monosemantic-features/index.html), though the scope might broaden in the future.

We use SAELens to train and evaluate sparse autoencoders (SAEs).
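
For orientation, here is a minimal sketch of the sparse autoencoder described in Towards Monosemanticity: a ReLU encoder applied to bias-centered activations, a linear decoder, and a loss that adds an L1 sparsity penalty to the squared reconstruction error. It is a pure-Python toy, not this repo's training code and not the SAELens API. The function names, the dimensions, and the `l1_coeff` value are all assumptions chosen for illustration.

```python
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Return (features, reconstruction) for one activation vector x.

    w_enc has shape (d_sae, d_in) and w_dec has shape (d_in, d_sae).
    """
    # Subtract the decoder bias before encoding.
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre_acts = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    # ReLU keeps feature activations non-negative.
    features = [max(0.0, p) for p in pre_acts]
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, recon


def sae_loss(x, features, recon, l1_coeff):
    """Squared reconstruction error plus an L1 penalty on the features."""
    recon_err = sum((xi - ri) ** 2 for xi, ri in zip(x, recon))
    sparsity = sum(features)  # equals the L1 norm because features >= 0
    return recon_err + l1_coeff * sparsity


# Toy dimensions (hypothetical): 4-dimensional activations, 8 dictionary features.
random.seed(0)
d_in, d_sae = 4, 8
w_enc = [[random.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_sae)]
w_dec = [[random.gauss(0, 0.1) for _ in range(d_sae)] for _ in range(d_in)]
b_enc, b_dec = [0.0] * d_sae, [0.0] * d_in

x = [random.gauss(0, 1) for _ in range(d_in)]
features, recon = sae_forward(x, w_enc, b_enc, w_dec, b_dec)
loss = sae_loss(x, features, recon, l1_coeff=1e-3)
```

In practice the activations would come from a hook on the model, and the weights would be trained by gradient descent on this loss over many activation vectors.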

## `hyperparam_sweep.py`

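Run a wandb sweep to search for good hyperparameters. `wandb sweep` registers the sweep and prints its ID; pass that ID to `wandb agent` to start running trials.

```
> wandb sweep --project mamba-sae-sweep sweep_config.yaml
> wandb agent <sweep-id>
```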
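
The contents of `sweep_config.yaml` are not shown here, so the following is only a hypothetical sketch of how such a sweep could be expressed through the wandb Python API instead of the CLI. The top-level keys (`program`, `method`, `metric`, `parameters`) follow the W&B sweep configuration format, but the metric name, the parameter names, and their ranges are assumptions and may not match this repo's actual config.

```python
# Hypothetical sweep configuration; the real sweep_config.yaml may differ.
sweep_config = {
    "program": "hyperparam_sweep.py",
    "method": "bayes",
    # Assumed metric name: it must match whatever the training script logs.
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        # Assumed hyperparameters for SAE training.
        "lr": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-3},
        "l1_coefficient": {"values": [1e-4, 1e-3, 1e-2]},
        "expansion_factor": {"values": [8, 16, 32]},
    },
}


def launch(config, project="mamba-sae-sweep", count=10):
    """Register the sweep and run up to `count` trials from this process."""
    # Imported lazily so the config can be inspected without wandb installed.
    import wandb

    sweep_id = wandb.sweep(sweep=config, project=project)
    wandb.agent(sweep_id, project=project, count=count)
    return sweep_id
```

Calling `launch(sweep_config)` needs a logged-in wandb account. The CLI commands above do the same two steps, with the config read from the YAML file.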