https://github.com/poppingtonic/transformer-visualization
Mechanistic Interpretability Tutorials, Results and research log as I learn from @neelnanda-io's wonderful Easy-Transformer
- Host: GitHub
- URL: https://github.com/poppingtonic/transformer-visualization
- Owner: poppingtonic
- Created: 2022-11-07T14:28:39.000Z
- Default Branch: main
- Last Pushed: 2023-09-13T20:07:35.000Z
- Last Synced: 2023-09-14T11:04:39.460Z
- Topics: gradio-interface, interpretability-jam, interpretable-ai, transformers, visualization
- Language: Jupyter Notebook
- Homepage:
- Size: 5.15 MB
- Stars: 3
- Watchers: 4
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# Learning Mechanistic Interpretability on Transformers with EasyTransformer (now TransformerLens)
_by Brian Muhia_
Fahamu, Inc
This repository houses the beginnings of a tutorial on mechanistic interpretability for Transformer language models.
#### Pedagogy
So far, we have:
1. Published a usable visualiser for tokens, fashioned from the `Hacky Interactive Lexoscope` by Neel Nanda.
1. Written notes while rewriting Neel's `EasyTransformer_Demo.ipynb`, in order to learn the library and how to use it.
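To give a feel for what a token visualiser does, here is a minimal sketch that shades each token by a per-token activation value and returns an HTML string. It is not the code behind the published visualiser or the `Hacky Interactive Lexoscope`: the function name `render_tokens`, the red colour scale, and the example tokens and activations are all assumptions made for illustration.

```python
import html


def render_tokens(tokens, activations):
    """Return an HTML string that shades each token by its activation.

    Hypothetical helper for illustration; not part of EasyTransformer/TransformerLens.
    """
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    # Scale by the largest positive activation so the strongest token gets opacity 1.
    peak = max(max(activations, default=0.0), 1e-9)
    spans = []
    for token, act in zip(tokens, activations):
        # Negative activations are clamped to 0 and render unshaded.
        opacity = min(max(act / peak, 0.0), 1.0)
        spans.append(
            f'<span style="background: rgba(255, 0, 0, {opacity:.2f})" '
            f'title="{act:.3f}">{html.escape(token)}</span>'
        )
    return "".join(spans)


# Made-up tokens and activations, purely for illustration.
page = render_tokens(["The", " cat", " <sat>"], [0.1, 2.0, -0.5])
```

In a notebook, the resulting string could be displayed with `IPython.display.HTML(page)`; a Gradio `HTML` component should accept the same string. In practice the activations would come from a model's cached activations rather than a hand-written list.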
#### Output
1. Applied tools and ideas from the demo to observe induction heads in [`SOLU-8l-old`](https://transformer-circuits.pub/2022/solu/index.html), a model also trained by Neel.
1. Generated IOI-style datasets:
    - `pkl_ioi_data.pkl` is 100,000 rows of IOI sentences from `ABBA` templates, most of which use multi-token terms.
    - https://huggingface.co/datasets/fahamu/ioi
        + `mecha_ioi_26m.parquet` is 26,010,000 rows of IOI sentences, mixing `ABBA` and `BABA` templates
        + `mecha_ioi_200k.parquet` is 200,000 rows of IOI sentences, mixing `ABBA` and `BABA` templates
All of these were inspired by the paper _Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small_, from Redwood Research. We are not affiliated with Redwood Research, and release these datasets to contribute to the collective research effort behind understanding how Transformer language models perform this task.
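As a rough illustration of how `ABBA` and `BABA` sentences differ, the sketch below fills a single template with two distinct names, where `A` is the indirect object and `B` is the subject. This is not the script that produced the released files: the name, place, and object lists, the template text, and the function names are all hypothetical, and the sketch makes no attempt to select multi-token terms.

```python
import random

# Hypothetical vocabulary and template; not taken from the released datasets.
NAMES = ["Mary", "John", "Alice", "Tom"]
PLACES = ["store", "park"]
OBJECTS = ["drink", "book"]
TEMPLATE = "When {first} and {second} went to the {place}, {subject} gave a {obj} to {io}"


def make_ioi_sentence(rng, order):
    """Build one IOI row; `order` is "ABBA" or "BABA" (A = indirect object, B = subject)."""
    if order not in ("ABBA", "BABA"):
        raise ValueError(f"unknown template order: {order!r}")
    io, subject = rng.sample(NAMES, 2)  # two distinct names
    # ABBA introduces the indirect object first; BABA introduces the subject first.
    first, second = (io, subject) if order == "ABBA" else (subject, io)
    sentence = TEMPLATE.format(
        first=first,
        second=second,
        place=rng.choice(PLACES),
        subject=subject,
        obj=rng.choice(OBJECTS),
        io=io,
    )
    return {"sentence": sentence, "subject": subject, "io": io, "order": order}


def make_ioi_dataset(n_rows, seed=0):
    """Generate `n_rows` rows, mixing ABBA and BABA orders at random."""
    rng = random.Random(seed)
    return [make_ioi_sentence(rng, rng.choice(["ABBA", "BABA"])) for _ in range(n_rows)]


rows = make_ioi_dataset(5)
for row in rows:
    print(row["order"], "|", row["sentence"])
```

Each row keeps the subject and indirect object alongside the sentence, so a prompt can be formed by dropping the final name and checking whether a model predicts the indirect object rather than the subject.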
###### With thanks and Acknowledgements:
- Esben Kran, Sabrina Zaki - for hosting the Interpretability Jam, which accelerated this work.
- Neel Nanda - for publishing TransformerLens and making public his research process. Wonderful gifts!