awesome-interpretability
Awesome tools for interpreting and manipulating the internals of deep neural networks.
https://github.com/wassname/awesome-interpretability
Adapters
Explainability, counterfactuals and probing
Mechanistic interpretability
- To customize a model, instead of running it as a function, you run it as a "with" context. Inside the "with" block you can write regular PyTorch to modify the computation.
- pyvene tries to be HuggingFace-native, supporting both pre-defined and customized interventions.
- an extremely opinionated toolkit for doing whatever you want to specific models,
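The "with" context idea described above can be sketched without any of the listed libraries. The snippet below is only a toy in plain Python, not the API of nnsight, pyvene, or BauKit: `TinyModel`, `intervene`, and the layer names are hypothetical, and the "layers" are ordinary functions rather than PyTorch modules. It shows the pattern of temporarily rewriting an intermediate output inside a `with` block and restoring the original behaviour afterwards.

```python
from contextlib import contextmanager


class TinyModel:
    """A toy two-layer 'model' built from plain Python callables (hypothetical)."""

    def __init__(self):
        self.layers = {
            "double": lambda x: [2 * v for v in x],
            "add_one": lambda x: [v + 1 for v in x],
        }

    def __call__(self, x):
        # Run the layers in a fixed order, feeding each output to the next layer.
        for name in ("double", "add_one"):
            x = self.layers[name](x)
        return x


@contextmanager
def intervene(model, layer_name, edit_fn):
    """Temporarily rewrite one layer's output; restore the layer on exit."""
    original = model.layers[layer_name]
    model.layers[layer_name] = lambda x: edit_fn(original(x))
    try:
        yield model
    finally:
        # Always undo the patch, even if the body of the with-block raises.
        model.layers[layer_name] = original


model = TinyModel()
clean = model([1, 2, 3])

# Inside the with-block, the output of the "double" layer is zeroed out.
with intervene(model, "double", lambda h: [0 for _ in h]):
    patched = model([1, 2, 3])

restored = model([1, 2, 3])
print(clean, patched, restored)
```

In a real library the edit function would operate on tensors and the patch would typically be installed with module hooks, but the scoping behaviour (intervention active only inside the block) is the same idea.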
Mechanistic interpretability libraries
- nnsight
- Pyvene (intervention focused)
- BauKit - light, simple, and well loved
- penzai - JAX-based, not HuggingFace-native
- Transformer Debugger (OpenAI) - not HuggingFace-native
- Graphpatch - promising but abandoned
- NeuroX
- A tutorial on doing it manually
- vgel/repeng - A library for making RepE control vectors
- cupbearer
- TransformerLens
- Tuned Lens - tools for looking at how transformer predictions are built layer-by-layer
- ViT-Prisma - mechanistic interpretability for vision and video transformers
- Overcomplete - vision SAE toolbox
- vLLM-Hook - program internal states of vLLM-served models
- vllm-lens - extract residual stream activations and apply steering vectors in vLLM
- Docent - interactive model explanation and steering interface
See more
Structured output
- jsonformer
- Microsoft Guidance
- lmql.ai
- llama.cpp grammar
- langchain output_parsers
- salute - TypeScript
- clownfish - 2023 Modifying Transformers to Follow a JSON Schema - not updated
- relm - 2023 Regular Expression engine for Language Models - not updated
- Constrained-Text-Generation-Studio
- kor
- lm-format-enforcer - remote APIs
- Promptify
- prob_jsonformer - like Jsonformer, but it can output the probability of each choice in a single pass. Supports enum fields.
- outlines
- guardrails
- TypeChat - TypeScript
- instructor - for remote APIs that do not expose logits
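Several of the tools above constrain generation to a fixed set of permitted outputs, and the prob_jsonformer entry mentions reporting a probability for each choice. One way to picture that, as a rough sketch only: keep the model's scores for the permitted choices, discard everything else, and renormalise. The function `choice_probabilities` and the score table are hypothetical, each choice is assumed to be a single token, and none of this is taken from any listed library's implementation.

```python
import math


def choice_probabilities(token_logits, choices):
    """Softmax restricted to the permitted choices (hypothetical helper)."""
    # Keep only the scores of tokens that the schema allows.
    allowed = {c: token_logits[c] for c in choices}
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(allowed.values())
    weights = {c: math.exp(s - peak) for c, s in allowed.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Made-up next-token scores; "banana" scores highest but is not a valid choice.
logits = {"red": 2.0, "green": 1.0, "blue": 0.0, "banana": 5.0}
probs = choice_probabilities(logits, ["red", "green", "blue"])
best = max(probs, key=probs.get)
print(best, probs)
```

This kind of masking needs access to the model's logits, which is why tools aimed at remote APIs without logits have to rely on prompting and validation instead.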