awesome-sparse-autoencoders
A resource repository of sparse autoencoders for large language models
https://github.com/chrisliu298/awesome-sparse-autoencoders
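Most of the papers and posts collected below build on the same basic recipe: train a wide, overcomplete autoencoder on a language model's internal activations (residual stream, MLP, or attention outputs) with a penalty that keeps the learned features sparse. As a rough orientation, here is a minimal PyTorch sketch of that vanilla ReLU-plus-L1 formulation; the dimensions, data, and hyperparameters are illustrative placeholders, not taken from any specific entry in this list.

```python
# Minimal sketch of a "vanilla" sparse autoencoder (ReLU latents + L1 penalty).
# All sizes and the activation batch below are placeholders for illustration only.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        # f: sparse feature activations; x_hat: reconstruction of the input activations
        f = torch.relu(self.encoder(x))
        x_hat = self.decoder(f)
        return x_hat, f

sae = SparseAutoencoder(d_model=768, d_hidden=768 * 8)   # e.g. an 8x expansion factor
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3                                          # sparsity vs. reconstruction trade-off

acts = torch.randn(4096, 768)                            # stand-in for cached LLM activations
for batch in acts.split(256):
    x_hat, f = sae(batch)
    loss = (x_hat - batch).pow(2).mean() + l1_coeff * f.abs().sum(dim=-1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Several of the papers below (TopK in "Scaling and evaluating sparse autoencoders", Gated SAEs, JumpReLU) replace the ReLU/L1 sparsity mechanism while keeping this reconstruction setup.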
Papers
- Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small
- Residual Stream Analysis with Multi-Layer SAEs
- Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
- Disentangling Dense Embeddings with Sparse Autoencoders
- openai/sparse_autoencoder
- Contrastive Sparse Autoencoders for Interpreting Planning of Chess-Playing Agents
- The Missing Curve Detectors of InceptionV1: Applying Sparse Autoencoders to InceptionV1 Early Vision
- Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
- Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery
- Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
- Not All Language Model Features Are Linear
- Interpreting Attention Layer Outputs with Sparse Autoencoders
- Transcoders Find Interpretable LLM Feature Circuits
- Scaling and evaluating sparse autoencoders
- Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models
- Improving Dictionary Learning with Gated Sparse Autoencoders
- Sparse Autoencoders Find Highly Interpretable Features in Language Models
- Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Blog Posts
- Extracting SAE task features for in-context learning — LessWrong
- Sparse Autoencoders Find Highly Interpretable Directions in Language Models
- Logan Riggs, Aidan Ewart, [Robert_AIZI](https://www.lesswrong.com/users/robert_aizi)
- Self-explaining SAE features
- Dmitrii Kharlapenko, [Neel Nanda](https://www.lesswrong.com/users/neel-nanda-1?from=post_header), [Arthur Conmy](https://www.lesswrong.com/users/arthur-conmy?from=post_header)
- A primer on sparse autoencoders - by Nick Jiang
- An Intuitive Explanation of Sparse Autoencoders for LLM Interpretability
- Adam Karvonen
- Finding Sparse Linear Connections between Features in LLMs
- Logan Riggs Smith, Aidan Ewart
- Sparse Autoencoders: Future Work