awesome-interpretability

Awesome tools for interpreting and manipulating the internals of deep neural networks.
https://github.com/wassname/awesome-interpretability


Explainability, counterfactuals and probing
Mechanistic interpretability libraries
- nnsight
- Pyvene (intervention focused)
- BauKit - light, simple, and well loved
- penzai - JAX-based, not HuggingFace-native
- Transformer Debugger (OpenAI) - not HuggingFace-native
- Graphpatch - promising but abandoned
- NeuroX
- A tutorial on doing it manually
- An extremely opinionated toolkit for doing whatever you want to specific models.
- To customize a model, instead of running it as a function, you run it as a "with" context. Inside the "with" block you can write regular PyTorch to modify the computation.
- pyvene tries to be HuggingFace-native, supporting pre-defined interventions or customized interventions (below).
- TransformerLens
- vgel/repeng - A library for making RepE control vectors
- cupbearer
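
The "with"-context style of intervention mentioned in the list above can be illustrated without any of these libraries. The sketch below is a toy in plain Python, not the API of nnsight, pyvene, BauKit, or any other listed project: `ToyModel`, `intervene`, and the layer names are hypothetical, and real libraries work with PyTorch modules and tensors rather than lists of floats. It only shows the shape of the idea: enter a context, edit one layer's output, and have the model restored on exit.

```python
from contextlib import contextmanager


class ToyModel:
    """Hypothetical stand-in for a network: named layers applied in order."""

    def __init__(self):
        self.layers = {
            "double": lambda xs: [2.0 * x for x in xs],
            "shift": lambda xs: [x + 1.0 for x in xs],
        }

    def __call__(self, xs):
        for name in self.layers:
            xs = self.layers[name](xs)
        return xs


@contextmanager
def intervene(model, layer_name, edit):
    """Temporarily pass one layer's output through `edit`."""
    original = model.layers[layer_name]
    # Wrap the layer so its output is edited before the next layer sees it.
    model.layers[layer_name] = lambda xs: edit(original(xs))
    try:
        yield model
    finally:
        # Always restore the unmodified layer, even if the body raised.
        model.layers[layer_name] = original


model = ToyModel()
clean = model([1.0, 2.0])

# Zero-ablate the output of the "double" layer for the duration of the block.
with intervene(model, "double", lambda h: [0.0 for _ in h]):
    ablated = model([1.0, 2.0])

restored = model([1.0, 2.0])
print(clean, ablated, restored)
```

Restoring the layer in `finally` is the main design point: the intervention cannot leak into later runs, which is what makes the context-manager form convenient for comparing clean and edited forward passes.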
Mechanistic interpretability
Structured output
- jsonformer
- Microsoft Guidance
- lmql.ai
- llama.cpp grammar
- langchain output_parsers
- salute - TypeScript
- clownfish - 2023 Modifying Transformers to Follow a JSON Schema - not updated
- relm - 2023 Regular Expression engine for Language Models - not updated
- Constrained-Text-Generation-Studio
- kor
- lm-format-enforcer - remote APIs
- Promptify
- prob_jsonformer - Jsonformer, but it can output the probability of each choice in a single pass
- instructor - for remote APIs without logits
- guardrails
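
Several entries above depend on access to model logits, and one reports a probability for each allowed choice. As a rough illustration of that general idea only, and not the implementation of prob_jsonformer or any other listed tool, the sketch below restricts a set of made-up logits to the values a schema would permit and renormalizes them with a softmax. The function name `choice_probabilities` and the logit values are assumptions chosen for the example.

```python
import math


def choice_probabilities(logits, allowed):
    """Softmax over only the allowed choices; other tokens are ignored."""
    # Subtract the max logit for numerical stability.
    peak = max(logits[token] for token in allowed)
    exps = {token: math.exp(logits[token] - peak) for token in allowed}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


# Hypothetical next-token logits; "banana" scores highest but is not a
# valid value for the field being filled in.
logits = {"yes": 2.0, "no": 1.0, "maybe": 0.0, "banana": 5.0}
probs = choice_probabilities(logits, ["yes", "no", "maybe"])
best = max(probs, key=probs.get)
print(best, probs)
```

This also suggests why tools aimed at remote APIs without logits need a different approach, such as validating and retrying: without the raw scores there is nothing to mask or renormalize.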
See more
- the list that inspired this one
- the github interpretability topic

