https://github.com/chanind/feature-hedging-paper

Code for the paper "Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders"

Host: GitHub
URL: https://github.com/chanind/feature-hedging-paper
Owner: chanind
License: mit
Created: 2025-05-14T13:18:07.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-09-24T19:01:43.000Z (9 months ago)
Last Synced: 2025-09-24T21:07:03.224Z (9 months ago)
Language: Python
Homepage: https://arxiv.org/abs/2505.11756
Size: 314 KB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff

# Feature Hedging Paper

Code for the paper [Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders](https://arxiv.org/abs/2505.11756).

## Repo structure

This repo contains the experiments run in the paper in `/experiments`, with toy model experiments in the `/notebooks` dir. You likely don't want to run our experiments verbatim, as it's expensive to train so many SAEs, but if you do, the experiments are there for reference with all the hyperparameters we used. Each experiment requires an `output_path`, where trained SAEs and metrics will be saved, and a `shared_path`, a folder that should be the same for every experiment that gets run. Eval-specific data common to all runs is cached in `shared_path`, so it does not need to be recalculated for every new SAE trained on a given LLM.
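To illustrate why `shared_path` should stay constant across runs, here is a minimal sketch of a compute-once cache keyed by model name. It is not the repo's caching code: `load_or_compute`, the JSON file layout, and the key naming are all hypothetical, chosen only to show the pattern.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_compute(shared_path: Path, key: str, compute: Callable[[], dict]) -> dict:
    """Return cached JSON for `key` under `shared_path`, computing it on first use.

    Hypothetical helper: the real experiments may cache different data in a
    different format.
    """
    cache_file = Path(shared_path) / f"{key}.json"
    if cache_file.exists():
        # A previous run against the same shared_path already did the work.
        return json.loads(cache_file.read_text())
    result = compute()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(result))
    return result
```

If two experiments point at different `shared_path` folders, each one recomputes the same data, which is the cost the shared folder is meant to avoid.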

Potentially more useful are the matryoshka SAE implementations in the `hedging_paper/saes` dir and the evaluations in the `hedging_paper/evals` dir. For running your own toy model experiments, see the examples in the `/notebooks` dir.
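For orientation, the core Matryoshka idea is that reconstruction loss is summed over nested prefixes of the latents, so the first few latents must reconstruct the input reasonably well on their own. The sketch below shows that idea in dependency-free Python for a single input vector. It is not the implementation in `hedging_paper/saes`: every function name here is hypothetical, the sparsity penalty is omitted, and a real SAE would use batched tensors.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # one row per latent


def relu_encode(x: Vector, w_enc: Matrix, b_enc: Vector) -> Vector:
    """Latent activations relu(W_enc @ x + b_enc)."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_enc, b_enc)
    ]


def decode_prefix(acts: Vector, w_dec: Matrix, b_dec: Vector, width: int) -> Vector:
    """Reconstruct the input using only the first `width` latents."""
    recon = list(b_dec)
    for act, row in zip(acts[:width], w_dec[:width]):
        for j, w in enumerate(row):
            recon[j] += act * w
    return recon


def matryoshka_mse(
    x: Vector,
    w_enc: Matrix,
    b_enc: Vector,
    w_dec: Matrix,
    b_dec: Vector,
    prefix_widths: Sequence[int],
) -> float:
    """Sum of per-prefix mean squared reconstruction errors (no sparsity term)."""
    acts = relu_encode(x, w_enc, b_enc)
    total = 0.0
    for width in prefix_widths:
        recon = decode_prefix(acts, w_dec, b_dec, width)
        total += sum((r - xi) ** 2 for r, xi in zip(recon, x)) / len(x)
    return total
```

With `prefix_widths` set to only the full dictionary size, this reduces to an ordinary single-level reconstruction loss.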

## Setup

This project uses Poetry for dependency management. To install the dependencies, run:

```bash
poetry install
```

### Tests

To run the tests, run:

```bash
poetry run pytest
```

### Linting / Formatting

This project uses [Ruff](https://github.com/astral-sh/ruff) for linting and formatting. To set this up with VSCode, install the Ruff extension and add the following to `.vscode/settings.json`:

```json
{
  "[python]": {
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
      "source.fixAll": "explicit",
      "source.organizeImports": "explicit"
    },
    "editor.defaultFormatter": "charliermarsh.ruff"
  },
  "notebook.formatOnSave.enabled": true
}
```

### Pre-commit hook

There's a pre-commit hook that will run ruff and pyright on each commit. To install it, run:

```bash
poetry run pre-commit install
```

### Poetry tips

Below are some helpful tips for working with Poetry:

- Install a new main dependency: `poetry add <package>`
- Install a new development dependency: `poetry add --dev <package>`
  - Development dependencies are not required for the main code to run, but are needed for things like linting and type-checking.
- Update the lockfile: `poetry lock`
- Run a command using the virtual environment: `poetry run <command>`
- Run a Python file from the CLI as a script (module-style): `poetry run python -m hedging_paper.path.to.file`
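Running a file with `python -m` executes it as part of its package, so package-level imports resolve the same way they do in tests. As a rough sketch, a module meant to be run this way usually guards its entry point as shown below. The flag names and defaults are hypothetical and only mirror the `output_path` and `shared_path` settings described earlier; they are not the repo's actual CLI.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse hypothetical experiment flags; `argv=None` falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Hypothetical experiment entry point.")
    parser.add_argument("--output-path", type=Path, default=Path("output"))
    parser.add_argument("--shared-path", type=Path, default=Path("shared"))
    return parser.parse_args(argv)


def main(argv: Optional[Sequence[str]] = None) -> None:
    args = parse_args(argv)
    print(f"saving to {args.output_path}, caching in {args.shared_path}")


if __name__ == "__main__":
    # Only runs when the module is executed directly, not when it is imported.
    main()
```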

### Citation

```
@article{chanin2025hedging,
  title={Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders},
  author={David Chanin and Tomáš Dulka and Adrià Garriga-Alonso},
  year={2025},
  journal={arXiv preprint arXiv:2505.11756}
}
```
