https://github.com/chanind/feature-hedging-paper
Code for the paper "Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders"
- Host: GitHub
- URL: https://github.com/chanind/feature-hedging-paper
- Owner: chanind
- License: mit
- Created: 2025-05-14T13:18:07.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-09-24T19:01:43.000Z (9 months ago)
- Last Synced: 2025-09-24T21:07:03.224Z (9 months ago)
- Language: Python
- Homepage: https://arxiv.org/abs/2505.11756
- Size: 314 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
# Feature Hedging Paper
Code for the paper [Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders](https://arxiv.org/abs/2505.11756).
## Repo structure
This repo contains the experiments run in the paper in the `/experiments` dir, with toy model experiments in the `/notebooks` dir. You likely don't want to rerun our experiments verbatim, since training so many SAEs is expensive, but if you do, the experiments with all the hyperparameters we used are there for reference. Each experiment requires an `output_path`, where trained SAEs and metrics will be saved, and a `shared_path`, which is a folder that should be the same for every experiment you run. The `shared_path` is where common eval-specific data is cached, so it does not need to be recalculated for every new SAE trained on a given LLM.
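The `shared_path` idea amounts to caching anything that depends only on the LLM and the eval, never on the SAE being evaluated. Below is a minimal sketch of that pattern, assuming a simple pickle-on-disk cache. `cached_eval_data` and its arguments are hypothetical names for illustration, not functions from this repo, and the real on-disk layout may differ.

```python
import hashlib
import json
import pickle
from pathlib import Path
from typing import Any, Callable


def cached_eval_data(
    shared_path: Path,
    model_name: str,
    eval_name: str,
    compute_fn: Callable[[], Any],
) -> Any:
    """Hypothetical helper: return eval data for (model_name, eval_name),
    calling compute_fn only the first time. Not the repo's actual API."""
    # The key depends on the LLM and the eval only, NOT on the SAE, so every
    # SAE trained on the same LLM reuses the same cached file.
    key = hashlib.sha256(json.dumps([model_name, eval_name]).encode()).hexdigest()[:16]
    cache_file = Path(shared_path) / f"{key}.pkl"
    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f)
    data = compute_fn()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as f:
        pickle.dump(data, f)
    return data
```

Because the cache key ignores the SAE, evaluating a second SAE trained on the same LLM loads the cached data instead of recomputing it.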
Potentially more useful are the matryoshka SAE implementations in the `hedging_paper/saes` dir and the evaluations in the `hedging_paper/evals` dir. For running your own toy model experiments, see the examples in the `/notebooks` dir.
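A matryoshka SAE, as commonly described, trains one dictionary whose nested prefixes of latents must each reconstruct the input on their own, so the loss is a sum of reconstruction losses over those prefixes. The NumPy sketch below shows only that loss. It is a generic illustration with hypothetical function and argument names, not the implementation in `hedging_paper/saes`, which may differ in details such as how prefix sizes are chosen, normalization, or the sparsity penalty.

```python
import numpy as np


def matryoshka_sae_loss(
    x: np.ndarray,       # (batch, d_model) input activations
    W_enc: np.ndarray,   # (d_model, d_sae)
    b_enc: np.ndarray,   # (d_sae,)
    W_dec: np.ndarray,   # (d_sae, d_model)
    b_dec: np.ndarray,   # (d_model,)
    prefix_sizes: list,  # e.g. [m_1, ..., d_sae], increasing
    l1_coeff: float = 0.0,
) -> float:
    """Hypothetical sketch, not the hedging_paper/saes implementation."""
    # Plain ReLU SAE encoder.
    acts = np.maximum(x @ W_enc + b_enc, 0.0)  # (batch, d_sae)
    total = 0.0
    for m in prefix_sizes:
        # Reconstruct using only the first m latents (a nested inner SAE).
        recon = acts[:, :m] @ W_dec[:m] + b_dec
        # Squared error summed over d_model, averaged over the batch.
        total += float(np.mean(np.sum((x - recon) ** 2, axis=-1)))
    # Optional L1 sparsity penalty on all latents (acts are already >= 0).
    total += l1_coeff * float(np.mean(np.sum(acts, axis=-1)))
    return total
```

The smallest prefix gets gradient from every term in the sum, which pushes the most general features into the earliest latents.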
## Setup
This project uses Poetry for dependency management. To install the dependencies, run:
```bash
poetry install
```
### Tests
To run the tests, run:
```bash
poetry run pytest
```
### Linting / Formatting
This project uses [Ruff](https://github.com/astral-sh/ruff) for linting and formatting. To set this up with VSCode, install the Ruff plugin and add the following to `.vscode/settings.json`:
```json
{
  "[python]": {
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
      "source.fixAll": "explicit",
      "source.organizeImports": "explicit"
    },
    "editor.defaultFormatter": "charliermarsh.ruff"
  },
  "notebook.formatOnSave.enabled": true
}
```
### Pre-commit hook
There's a pre-commit hook that will run Ruff and Pyright on each commit. To install it, run:
```bash
poetry run pre-commit install
```
### Poetry tips
Below are some helpful tips for working with Poetry:
- Install a new main dependency: `poetry add <package>`
- Install a new development dependency: `poetry add --dev <package>`
- Development dependencies are not required for the main code to run, but are used for things like linting and type-checking.
- Update the lockfile: `poetry lock`
- Run a command using the virtual environment: `poetry run <command>`
- Run a Python file from the CLI as a script (module-style): `poetry run python -m hedging_paper.path.to.file`
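Running a file with `python -m` executes it as a module inside the `hedging_paper` package, and the `if __name__ == "__main__":` guard decides what runs. Below is a sketch of what such a file might look like, assuming a hypothetical `hedging_paper/path/to/file.py`. The `--output-path` and `--shared-path` flags are made up for illustration; the repo's experiment scripts may receive these values differently.

```python
"""Hypothetical module at hedging_paper/path/to/file.py.

`poetry run python -m hedging_paper.path.to.file --output-path out --shared-path shared`
runs main(); importing this module (e.g. from a test) does not.
"""
import argparse
from typing import Optional, Sequence


def main(argv: Optional[Sequence[str]] = None) -> dict:
    # Hypothetical flags mirroring the output_path / shared_path convention.
    parser = argparse.ArgumentParser(description="Example experiment entry point")
    parser.add_argument("--output-path", required=True)
    parser.add_argument("--shared-path", required=True)
    # argv=None makes argparse read sys.argv[1:].
    args = parser.parse_args(argv)
    config = {"output_path": args.output_path, "shared_path": args.shared_path}
    print(f"Running with {config}")
    return config


if __name__ == "__main__":
    # Only executes when run as a script or via `python -m`, not on import.
    main()
```

Keeping the work in `main()` behind the guard means the same file can be imported from tests without launching an experiment.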
### Citation
```bibtex
@article{chanin2025hedging,
  title={Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders},
  author={David Chanin and Tomáš Dulka and Adrià Garriga-Alonso},
  year={2025},
  journal={arXiv preprint arXiv:2505.11756}
}
```