{"id":30105669,"url":"https://github.com/chanind/feature-hedging-paper","last_synced_at":"2025-10-09T11:35:12.506Z","repository":{"id":294342155,"uuid":"983495952","full_name":"chanind/feature-hedging-paper","owner":"chanind","description":"Code for the paper \"Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders\"","archived":false,"fork":false,"pushed_at":"2025-09-24T19:01:43.000Z","size":322,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-24T21:07:03.224Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2505.11756","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chanind.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-05-14T13:18:07.000Z","updated_at":"2025-09-24T19:01:47.000Z","dependencies_parsed_at":"2025-09-05T07:32:56.235Z","dependency_job_id":"e52eb565-1d15-47a8-a54e-7eac17c4e59d","html_url":"https://github.com/chanind/feature-hedging-paper","commit_stats":null,"previous_names":["chanind/feature-hedging-paper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/chanind/feature-hedging-paper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chanind%2Ffeature-hedging-paper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chanind%2Ffeature-hedging-paper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chanind%2Ffeature-hedging-paper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chanind%2Ffeature-hedging-paper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chanind","download_url":"https://codeload.github.com/chanind/feature-hedging-paper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chanind%2Ffeature-hedging-paper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279001304,"owners_count":26083058,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-10T00:17:21.940Z","updated_at":"2025-10-09T11:35:12.487Z","avatar_url":"https://github.com/chanind.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Feature Hedging Paper\n\nCode for the paper [Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders](https://arxiv.org/abs/2505.11756).\n\n## Repo structure\n\nThis repo contains the experiments from the paper in `/experiments`, with the toy model experiments in the `/notebooks` dir. 
You probably don't want to rerun our experiments verbatim, since training so many SAEs is expensive, but if you do, the experiments and all the hyperparameters we used are there for reference. Each experiment requires an `output_path`, where trained SAEs and metrics are saved, and a `shared_path`, a folder that should be the same for every experiment you run. The `shared_path` caches common eval-specific data so that it does not need to be recomputed for every new SAE trained on a given LLM.\n\nPotentially more useful are the matryoshka SAE implementations in the `hedging_paper/saes` dir and the evaluations in the `hedging_paper/evals` dir. To run your own toy model experiments, see the examples in the `/notebooks` dir.\n\n## Setup\n\nThis project uses Poetry for dependency management. To install the dependencies, run:\n\n```bash\npoetry install\n```\n\n### Tests\n\nTo run the tests:\n\n```bash\npoetry run pytest\n```\n\n### Linting / Formatting\n\nThis project uses [Ruff](https://github.com/astral-sh/ruff) for linting and formatting. To set this up in VS Code, install the Ruff extension and add the following to `.vscode/settings.json`:\n\n```json\n{\n  \"[python]\": {\n    \"editor.formatOnSave\": true,\n    \"editor.codeActionsOnSave\": {\n      \"source.fixAll\": \"explicit\",\n      \"source.organizeImports\": \"explicit\"\n    },\n    \"editor.defaultFormatter\": \"charliermarsh.ruff\"\n  },\n  \"notebook.formatOnSave.enabled\": true\n}\n```\n\n### Pre-commit hook\n\nThere's a pre-commit hook that runs Ruff and Pyright on each commit. 
To install it, run:\n\n```bash\npoetry run pre-commit install\n```\n\n### Poetry tips\n\nBelow are some helpful tips for working with Poetry:\n\n- Install a new main dependency: `poetry add \u003cpackage\u003e`\n- Install a new development dependency: `poetry add --dev \u003cpackage\u003e`\n  - Development dependencies are not required to run the main code, but are needed for things like linting and type-checking.\n- Update the lockfile: `poetry lock`\n- Run a command using the virtual environment: `poetry run \u003ccommand\u003e`\n- Run a Python file from the CLI as a script (module-style): `poetry run python -m hedging_paper.path.to.file`\n\n## Citation\n\n```bibtex\n@article{chanin2025hedging,\n     title={Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders},\n     author={David Chanin and Tomáš Dulka and Adrià Garriga-Alonso},\n     year={2025},\n     journal={arXiv preprint arXiv:2505.11756}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchanind%2Ffeature-hedging-paper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchanind%2Ffeature-hedging-paper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchanind%2Ffeature-hedging-paper/lists"}