Projects in Awesome Lists tagged with sparse-autoencoder
A curated list of projects in awesome lists tagged with sparse-autoencoder.
https://github.com/paulpauls/llama3_interpretability_sae
A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and fully reproducible.
feature-extraction feature-steering llama3 llm-interpretability open-research pytorch sparse-autoencoder
Last synced: 12 Jul 2025
https://github.com/explanare/ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
causal-intervention disentangled-representations interpretability intervention probing sparse-autoencoder
Last synced: 10 Mar 2026
https://github.com/glami/sansa
SANSA - sparse EASE for millions of items
approximate-inverse collaborative-filtering recommender-system sparse-autoencoder sparse-matrix
Last synced: 02 Apr 2026
https://github.com/recombee/compressae
Sparse Embedding Compression for Scalable Retrieval in Recommender Systems
embedding-compression recommender-systems sae similarity-search sparse-autoencoder
Last synced: 04 Apr 2026
https://github.com/mrquincle/keras-adversarial-autoencoders
Experiments with Adversarial Autoencoders using Keras
adversarial-autoencoder autoencoder jupyter keras sparse-autoencoder variational-autoencoder
Last synced: 18 Jun 2025
https://github.com/butanium/tiny-activation-dashboard
A tiny, easily hackable implementation of a feature dashboard.
feature-dashboard feature-visualization sparse-autoencoder sparse-autoencoders
Last synced: 03 May 2026
https://github.com/zer0int/clip-sae-finetune
Sparse Autoencoders (SAE) vs CLIP fine-tuning fun.
adversarial-attacks adversarial-learning clip fine-tune fine-tuning finetune sae sparse-autoencoder vision-transformer vit
Last synced: 03 Jul 2025
https://github.com/aarnphm/morph
exploration WYSIWYG editor
capstone-project experimental interface mechanistic-interpretability sparse-autoencoder
Last synced: 12 Apr 2025
https://github.com/codelion/pts
Pivotal Token Search
dataset-generation direct-preference-optimization dpo llm llm-inference llm-steering mech-interp phi-4 phi-4-mini phi4 phi4-mini pivotal-token-search pivotal-tokens reasoning-agent reasoning-language-models reasoning-models sae sparse-autoencoder steering-vector tokens
Last synced: 10 Jun 2025
https://github.com/seonglae/emgsd-hermes
Steering GPT2-EMGSD to be less biased, and generating stereotyped text with vanilla GPT2, without fine-tuning or prompt engineering.
bias-correction bias-mitigation emgsd gpt2 sparse-autoencoder steering-vector stereotype
Last synced: 19 Feb 2026
https://github.com/hinanohart/circuitbench
Integrated mechanistic interpretability and sparse autoencoder framework for hybrid SSM-attention models (Mamba-2, Hymba, RWKV-7). v0.1.2 alpha: real forward-pass intervention and mean-ablation patching shipped, with a CPU smoke run; GPU/real adapters in v0.2.
alignment hymba interpretability mamba mamba-2 mechanistic-interpretability pytorch rwkv sae sparse-autoencoder ssm state-space-model transformer-alternatives
Last synced: 15 Jun 2026