{"id":19243808,"url":"https://github.com/openmoss/language-model-saes","last_synced_at":"2026-04-12T16:23:06.034Z","repository":{"id":233962330,"uuid":"774167861","full_name":"OpenMOSS/Language-Model-SAEs","owner":"OpenMOSS","description":"For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research.","archived":false,"fork":false,"pushed_at":"2024-09-17T09:40:44.000Z","size":26185,"stargazers_count":30,"open_issues_count":3,"forks_count":6,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-09-17T12:14:33.153Z","etag":null,"topics":["interpretability","mechanistic-interpretability","sparse-autoencoders","sparse-dictionary"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenMOSS.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-19T04:04:31.000Z","updated_at":"2024-09-12T09:28:58.000Z","dependencies_parsed_at":"2024-07-06T13:27:38.586Z","dependency_job_id":"4b3fb696-129d-4cb9-afb9-b1145e3724ed","html_url":"https://github.com/OpenMOSS/Language-Model-SAEs","commit_stats":null,"previous_names":["openmoss/gpt2-dictionary","openmoss/language-model-saes"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMOSS%2FLanguage-Model-SAEs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMOSS%2FLanguage-Model-SAEs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMOSS%2FLanguage-Model-SAEs/r
eleases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMOSS%2FLanguage-Model-SAEs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenMOSS","download_url":"https://codeload.github.com/OpenMOSS/Language-Model-SAEs/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250032410,"owners_count":21363831,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["interpretability","mechanistic-interpretability","sparse-autoencoders","sparse-dictionary"],"created_at":"2024-11-09T17:20:07.846Z","updated_at":"2026-02-11T11:13:26.822Z","avatar_url":"https://github.com/OpenMOSS.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Language-Model-SAEs\n\n\u003e [!IMPORTANT]\n\u003e Currently the examples are outdated and some parallelism strategies are not working due to lack of bandwidth. We are working on better organizing recent updates and will make everything work ASAP.\n\n`Language-Model-SAEs` is a comprehensive, **fully-distributed** framework designed for **training, analyzing and visualizing Sparse Autoencoders (SAEs)**, empowering scalable and systematic **Mechanistic Interpretability** research.\n\n## News\n\n- 2025.9.23 We leverage **Crosscoder** to track feature evolution across pre-training snapshots. 
Link: [Evolution of Concepts in Language Model Pre-Training](https://www.arxiv.org/abs/2509.17196).\n\n- 2025.8.23 We identify a prevalent low-rank structure in attention outputs as the key cause of dead features, and propose **Active Subspace Initialization** to improve sparse dictionary learning on these low-rank activations. Link: [Attention Layers Add Into Low-Dimensional Residual Subspaces](https://arxiv.org/abs/2508.16929).\n\n- 2025.4.29 We introduce **Low-Rank Sparse Attention (Lorsa)** to attack attention superposition, extracting tens of thousands of true attention units from LLM attention layers. Link: [Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition](https://arxiv.org/abs/2504.20938).\n\n- 2024.10.29 We introduce **Llama Scope**, our first contribution to the open-source Sparse Autoencoder ecosystem. Stay tuned! Link: [Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders](http://arxiv.org/abs/2410.20526).\n\n- 2024.10.9 Transformers and Mambas are mechanistically similar at both the feature and circuit levels. Can we follow this line and find **universal motifs and fundamental differences between language model architectures**? Link: [Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures](https://arxiv.org/pdf/2410.06672).\n\n- 2024.5.22 We propose hierarchical tracing, a promising method to **scale up sparse feature circuit analysis** to industrial-size language models! Link: [Automatically Identifying Local and Global Circuits with Linear Computation Graphs](https://arxiv.org/pdf/2405.13868).\n\n- 2024.2.19 Our first attempt at SAE-based circuit analysis for Othello-GPT leads us to **an example of Attention Superposition in the wild**! 
Link: [Dictionary learning improves patch-free circuit discovery in mechanistic interpretability: A case study on othello-gpt](https://arxiv.org/pdf/2402.12201).\n\n## Features\n\n- **Scalability**: Our framework is fully distributed with arbitrary combinations of data, model, and head parallelism for both training and analysis. Enjoy training SAEs with millions of features!\n- **Flexibility**: We support a wide range of SAE variants, including vanilla SAEs, Lorsa (Low-rank Sparse Attention), CLT (Cross-layer Transcoder), MoLT (Mixture of Linear Transforms), CrossCoder, and more. Each variant can be combined with different activation functions (e.g., ReLU, JumpReLU, TopK, BatchTopK) and sparsity penalties (e.g., L1, Tanh).\n- **Easy to Use**: We provide high-level `runners` APIs to quickly launch experiments with simple configurations. Check our [examples](examples) for verified hyperparameters.\n- **Visualization**: We provide a unified web interface to visualize learned SAE variants and their features.\n\n## Installation\n\nUse [pip](https://pypi.org/project/pip/) to install Language-Model-SAEs:\n\n```bash\npip install lm-saes==2.0.0b11\n```\n\nWe also highly recommend using [uv](https://docs.astral.sh/uv/) to manage your own project dependencies. You can use\n\n```bash\nuv add lm-saes==2.0.0b11\n```\n\nto add Language-Model-SAEs as your project dependency.\n\n## Development\n\nWe use [uv](https://docs.astral.sh/uv/), an alternative to [poetry](https://python-poetry.org/) or [pdm](https://pdm-project.org/), to manage the dependencies. To install the required packages, install [uv](https://docs.astral.sh/uv/getting-started/installation/) and run the following command:\n\n```bash\nuv sync\n```\n\nThis will install all the required packages for the codebase in the `.venv` directory. 
For Ascend NPU support, run\n\n```bash\nuv sync --extra npu\n```\n\nA forked version of `TransformerLens` is also included in the dependencies to provide the necessary tools for analyzing features.\n\nIf you want to use the visualization tools, you also need to install the required packages for the frontend, which uses [bun](https://bun.sh/) for dependency management. Follow the instructions on the website to install it, and then run the following command:\n\n```bash\ncd ui-ssr\nbun install\n```\n\n## Launch an Experiment\n\nExplore the `examples` to check the basic usage of training/analyzing SAEs in different configurations. Note that MongoDB is recommended for recording the model/dataset/SAE configurations and is required for storing analyses. For more advanced usage, you may explore the `src/lm_saes/runners` folder for the interfaces for generating activations and training \u0026 analyzing SAE variants, and write your own training/analysis scripts directly at the runner level.\n\n## Visualizing the Learned Dictionary\n\nThe analysis results will be saved using MongoDB, and you can use the provided visualization tools to visualize the learned dictionary. First, start the FastAPI server by running the following command:\n\n```bash\nuvicorn server.app:app --port 24577 --env-file server/.env\n```\n\nThen, copy the `ui/.env.example` file to `ui/.env` and modify the `BACKEND_URL` to fit your server settings (by default, it's `http://localhost:24577`), and start the frontend by running the following command:\n\n```bash\ncd ui\nbun dev --port 24576\n```\n\nThat's it! You can now go to `http://localhost:24576` to visualize the learned dictionary and its features.\n\n## Contributing\n\nWe warmly welcome contributions to this project. If you have any questions or suggestions, feel free to open an issue or a pull request. 
We are looking forward to hearing from you!\n\nTODO: Add development guidelines\n\n## Acknowledgement\n\nThe design of the pipeline (including the configuration and some training details) is highly inspired by the [mats_sae_training](https://github.com/jbloomAus/mats_sae_training) project (now known as [SAELens](https://github.com/jbloomAus/SAELens)) and heavily relies on the [TransformerLens](https://github.com/TransformerLensOrg/TransformerLens) library. We thank the authors for their great work.\n\n## Citation\n\nPlease cite this library as:\n\n```\n@misc{Ge2024OpenMossSAEs,\n    title  = {OpenMoss Language Model Sparse Autoencoders},\n    author = {Xuyang Ge and Wentao Shu and Junxuan Wang and Guancheng Zhou and Jiaxing Wu and Fukang Zhu and Lingjie Chen and Zhengfu He},\n    url    = {https://github.com/OpenMOSS/Language-Model-SAEs},\n    year   = {2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenmoss%2Flanguage-model-saes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenmoss%2Flanguage-model-saes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenmoss%2Flanguage-model-saes/lists"}