{"id":50873658,"url":"https://github.com/hinanohart/circuitbench","last_synced_at":"2026-06-15T07:31:14.625Z","repository":{"id":359680089,"uuid":"1247075071","full_name":"hinanohart/circuitbench","owner":"hinanohart","description":"Integrated mechanistic interpretability + sparse autoencoder framework for Hybrid SSM-Attention models (Mamba-2, Hymba, RWKV-7). v0.1.2 alpha: real forward-pass intervention + mean-ablation patching shipped, CPU smoke; GPU/real adapters in v0.2.","archived":false,"fork":false,"pushed_at":"2026-06-10T13:33:25.000Z","size":108,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-10T14:08:36.162Z","etag":null,"topics":["alignment","hymba","interpretability","mamba","mamba-2","mechanistic-interpretability","pytorch","rwkv","sae","sparse-autoencoder","ssm","state-space-model","transformer-alternatives"],"latest_commit_sha":null,"homepage":"https://github.com/hinanohart/circuitbench/releases/latest","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hinanohart.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-22T21:53:47.000Z","updated_at":"2026-06-10T13:34:56.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/hinanohart/circuitbench","commit_stats":null,"previous_names":["hinanohart/circuitbench"],"tags_count":2,"template":false,"template_full_name":null,"p
url":"pkg:github/hinanohart/circuitbench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hinanohart%2Fcircuitbench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hinanohart%2Fcircuitbench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hinanohart%2Fcircuitbench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hinanohart%2Fcircuitbench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hinanohart","download_url":"https://codeload.github.com/hinanohart/circuitbench/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hinanohart%2Fcircuitbench/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34353189,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-15T02:00:07.085Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignment","hymba","interpretability","mamba","mamba-2","mechanistic-interpretability","pytorch","rwkv","sae","sparse-autoencoder","ssm","state-space-model","transformer-alternatives"],"created_at":"2026-06-15T07:31:14.089Z","updated_at":"2026-06-15T07:31:14.620Z","avatar_url":"https://github.com/hinanohart.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"#
 circuitbench\n\n[![CI](https://github.com/hinanohart/circuitbench/actions/workflows/ci.yml/badge.svg)](https://github.com/hinanohart/circuitbench/actions/workflows/ci.yml)\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![Python](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/)\n\n**Mechanistic interpretability + sparse autoencoder framework for Hybrid SSM-Attention models**, with first-class support for pure SSMs.\n\nWhere TransformerLens / SAELens dominate Transformer interpretability, circuitbench aims to fill the gap for **post-Transformer architectures**: Mamba-2, Hymba, Jamba, Falcon-H1, RWKV-7.\n\n\u003e **v0.1.x scope.** v0.1.x ships the **API surface + CPU `MockSSMAdapter`** so the harness is end-to-end runnable without GPUs or model downloads. Real model weights, JumpReLU SAEs, and step-wise `h_t` patching land in **v0.2**. See [Status](#status) for the precise per-component split.\n\n---\n\n## What is this?\n\ncircuitbench is a research harness for understanding *how* state-space models (SSMs) compute. It provides four integrated operations over a common hook-point abstraction that maps onto Mamba-2's internal tensor sites:\n\n- **`load_model`** — adapter registry; v0.1.x ships `MockSSMAdapter` (CPU); real weights in v0.2\n- **`train_sae`** — TopK sparse autoencoders trained on SSM-specific hook points\n- **`extract_circuit`** — coarse layer-level mean-ablation activation patching\n- **`steer`** — additive feature-direction intervention during the forward pass\n\nThe API surface is designed to be identical for the CPU mock (available now) and for real model weights (planned for v0.2).\n\n---\n\n## Why circuitbench\n\nSSMs and hybrid SSM-attention models have grown into a serious alternative to pure Transformers, but the mechanistic interpretability tooling has not caught up. 
Existing libraries either:\n\n- Hard-bake Transformer-only assumptions (residual streams indexed by layer × position), or\n- Provide raw hooks without SAE training, circuit discovery, or steering glue.\n\ncircuitbench provides one integrated harness for all four operations: model loading, SAE training, circuit extraction, and steering.\n\n---\n\n## Install\n\nv0.1 is alpha and **not yet on PyPI** (planned for v0.2 via PyPI trusted publishing). Install from source:\n\n```bash\ngit clone https://github.com/hinanohart/circuitbench.git\ncd circuitbench\npip install -e .                       # core only (torch + numpy + einops + jaxtyping + pydantic)\npip install -e \".[ssm,hf]\"             # placeholders for v0.2: mamba-ssm + HF transformers (GPU)\npip install -e \".[sae]\"                # placeholder for v0.2: SAELens interop\npip install -e \".[dev]\"                # development (pytest, ruff, mypy)\n```\n\nThe `[ssm]` / `[hf]` / `[sae]` extras do **not yet** unlock real backends in v0.1.x; they are reserved so the install path stays stable across the v0.1 → v0.2 transition.\n\n---\n\n## Quick start\n\n```python\nfrom circuitbench import load_model, train_sae, extract_circuit, steer\n\n# v0.1.x ships a CPU-only MockSSMAdapter; real Mamba-2 / Hymba weights arrive\n# in v0.2 (need `mamba-ssm` + GPU). 
The API surface is designed to be identical either way.\nmodel = load_model(\"mock://mamba2-tiny\", hook_point=\"out_proj_in\")\nsae = train_sae(model, layer=1, k=32, expansion=8, tokens=2048, batch_size=64)\ncircuit = extract_circuit(model, prompt=\"Paris is the capital of\", target=\"France\")\nout = steer(model, prompt=\"Hello\", feature_id=42, strength=2.0, sae=sae, layer=1)\n\nprint(circuit.top_layers(n=3))           # [(layer, ablation effect), ...]\nprint(out.delta_norm)                    # L2 shift in final output under steering\n```\n\nSee [`examples/`](examples/):\n- `01_load.py`, `02_train_sae.py`, `03_steer.py` — runnable on CPU in seconds\n- `titans_hook.py` — v0.2 contrib stub (prints a marker; raises `NotImplementedError` when called)\n\n---\n\n## How it works\n\n### Hook points (SSM-specific)\n\ncircuitbench defines five hook sites that map onto Mamba-2's internal computation path. The data flow inside each SSM block is:\n\n```\nx → x_proj → split(u, z, s)\n              └── u → conv1d ──→ c\n                                 └── ssm(c, s) → ssm_y           [H3]\n                                                  └── gate(z)\n                                                      └── post_gate [H1] → out_proj → +x → output\n```\n\n| ID | Location | Shape | Capture | Additive Intervention | Substitution |\n|----|----------|-------|---------|-----------------------|--------------|\n| H1 | `out_proj_in` (post-gate, pre-projection) | `(B, L, d_inner)` | ✅ | ✅ | ✅ |\n| H2 | `x_proj` (gate/input/dt projection) | `(B, L, 2*d_inner + d_state)` | ✅ | ✅ | ✅ |\n| H3 | `ssm_y` (SSM output, pre-gate) | `(B, L, d_inner)` | ✅ | ✅ | ✅ |\n| H4 | `hidden_state_h` (the SSM state itself) | `(B, L, D, N)` | ✅ | ✅ | v0.2 |\n| H5 | `conv1d` (short-conv branch output) | `(B, L, d_inner)` | ✅ | ✅ | ✅ |\n\nDefault for SAE training: **H1** (post-gate, pre-projection). It is roughly analogous to the input of a Transformer block's output projection: the last activation before the block writes back into the residual stream (`+x`).\n\nH1 and H3 are **distinct tensors**: the gate `y * sigmoid(z)` sits 
between them.\n\n### Circuit extraction (v0.1.x)\n\nFor each candidate layer index `layer`, circuitbench:\n1. Runs a clean forward pass and captures the activation at `(layer, hook_point)`.\n2. Replaces that activation with its sequence mean (mean-ablation) and re-runs the forward pass.\n3. Records `‖clean_output − ablated_output‖₂` as the layer's effect score.\n\nThe layer with the largest shift is the one whose activation the output is most sensitive to for this prompt. As described, the score measures the shift in the full output rather than in the `target` token's logit, so this ranking is not target-specific. Step-wise `h_t` patching and target-logit projection are planned for v0.2.\n\n### Architecture\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"docs/architecture.png\" alt=\"circuitbench architecture\" width=\"840\"\u003e\n\u003c/div\u003e\n\n---\n\n## Differentiation (design goals — implementation status)\n\n| Axis | Status |\n|------|--------|\n| **Hybrid head separation SAE** (Hymba / Jamba attention vs SSM heads trained separately) | design goal, v0.2 |\n| **`ssm_state` direct SAE** (SAE over `h_t ∈ (B, L, D, N)`) | capture shipped (mock); full SAE training v0.2 |\n| **State-propagation circuit** (step-wise patching for recurrent models) | coarse layer-level mean-ablation shipped; step-wise `h_t` v0.2 |\n| **RWKV-7 first-class** (loader + hook points for RWKV-7 Goose) | design goal, v0.2 |\n\n---\n\n## Status\n\n| Component | v0.1.x (shipped) | v0.2 (planned) |\n|-----------|------------------|----------------|\n| `load_model` (registry + `MockSSMAdapter` CPU backend) | ✅ shipped | + real Mamba-2 / Hymba / Jamba / Falcon-H1 / RWKV-7 weights |\n| `train_sae` (TopK, k=32, 8× expansion, decoder unit-norm) | ✅ shipped | + JumpReLU, dead-feature resample (200k step) |\n| `extract_circuit` (coarse layer-level **mean-ablation** patching) | ✅ shipped | + step-wise `h_t` patching, target-logit projection, hybrid head separation |\n| `steer` (additive feature intervention **during** the forward pass) | ✅ shipped | + composable interventions, beam search |\n| HF Hub SAE distribution (SAELens-compatible) | planned | shipped |\n| Multi-Agent SAE | namespace 
reserved | shipped |\n| PyPI publish | install from source | shipped (trusted publisher) |\n| arXiv preprint (v0.1 harness paper) | deferred | shipped |\n\n---\n\n## Acknowledgments\n\n**Inspired by** (no runtime dependency in v0.1.x — these projects are *not* imported; the `[sae]` extra is reserved for SAELens interop in v0.2):\n- [SAELens](https://github.com/jbloomAus/SAELens) — production SAE library; circuitbench v0.2 is planned to export SAEs in a SAELens-compatible format\n- [TransformerLens](https://github.com/TransformerLensOrg/TransformerLens) — hook-based interpretability primitives\n- [MambaLens](https://github.com/Phylliida/MambaLens) — early Mamba interpretability work\n- [mamba-ssm](https://github.com/state-spaces/mamba) — official Mamba/Mamba-2 reference implementation\n\n---\n\n## Related projects\n\nPart of [hinanohart](https://github.com/hinanohart)'s open-source portfolio:\n- [transduce](https://github.com/hinanohart/transduce) — composable transducer streams\n- [exitkit](https://github.com/hinanohart/exitkit) — Nozick closest-continuer model identity over PAM snapshots\n- [subjunctor](https://github.com/hinanohart/subjunctor) — Nozick-grounded LLM agent gate\n\n---\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhinanohart%2Fcircuitbench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhinanohart%2Fcircuitbench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhinanohart%2Fcircuitbench/lists"}