https://github.com/poppingtonic/transformer-visualization

Mechanistic Interpretability Tutorials, Results and research log as I learn from @neelnanda-io's wonderful Easy-Transformer

gradio-interface interpretability-jam interpretable-ai transformers visualization

Host: GitHub
URL: https://github.com/poppingtonic/transformer-visualization
Owner: poppingtonic
Created: 2022-11-07T14:28:39.000Z
Default Branch: main
Last Pushed: 2023-09-13T20:07:35.000Z
Last Synced: 2023-09-14T11:04:39.460Z
Language: Jupyter Notebook
Size: 5.15 MB
Stars: 3
Watchers: 4
Forks: 2
Open Issues: 2
Metadata Files:
- Readme: README.md


# Learning Mechanistic Interpretability on Transformers with EasyTransformer (now TransformerLens)

_by Brian Muhia_

Fahamu, Inc

This repository houses the beginnings of a tutorial on mechanistic interpretability for Transformer language models.

#### Pedagogy
So far, we have:
1. Published a usable token visualiser, adapted from Neel Nanda's `Hacky Interactive Lexoscope`.
1. Written notes while rewriting Neel's `EasyTransformer_Demo.ipynb`, as a way to learn the library and how to use it.
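
The core of a token visualiser of this kind can be sketched in a few lines of Python. This is a minimal sketch, not the Lexoscope code: it assumes each token comes with one scalar (for example a neuron's activation) and shades the token's background by that value. `tokens_to_html` is a hypothetical helper, and the values in the usage example are made up.

```python
import html
from typing import Sequence


def tokens_to_html(tokens: Sequence[str], values: Sequence[float]) -> str:
    """Render tokens as inline HTML spans shaded by a per-token value.

    Hypothetical helper: values are min-max normalised to [0, 1] and used as
    the alpha channel of a red background; the raw value is shown on hover.
    """
    if len(tokens) != len(values):
        raise ValueError("tokens and values must have the same length")
    lo = min(values, default=0.0)
    hi = max(values, default=0.0)
    scale = hi - lo if hi > lo else 1.0  # constant input -> all-zero shading
    spans = []
    for token, value in zip(tokens, values):
        alpha = (value - lo) / scale
        spans.append(
            # white-space: pre keeps the leading spaces that BPE tokens often carry
            f'<span style="white-space: pre; background-color: rgba(255, 0, 0, {alpha:.2f})" '
            f'title="{value:.3f}">{html.escape(token)}</span>'
        )
    return "".join(spans)


if __name__ == "__main__":
    tokens = [" The", " cat", " sat", " on", " the", " mat"]
    values = [0.1, 2.3, 0.4, 0.0, 0.2, 1.7]  # made-up activations for illustration
    print(tokens_to_html(tokens, values))
```

In a Jupyter notebook, the returned string can be rendered with `IPython.display.HTML(...)` instead of printed.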

#### Output
1. Applied some tools and ideas from the demo to observe induction heads in [`SOLU-8l-old`](https://transformer-circuits.pub/2022/solu/index.html), a model also trained by Neel.
1. Generated IOI-style (indirect object identification) datasets:
   - `pkl_ioi_data.pkl`: 100,000 rows of IOI sentences from `ABBA` templates, most of which use multi-token terms.
   - [fahamu/ioi](https://huggingface.co/datasets/fahamu/ioi) on Hugging Face:
     - `mecha_ioi_26m.parquet`: 26,010,000 rows of IOI sentences, mixing `ABBA` and `BABA` templates.
     - `mecha_ioi_200k.parquet`: 200,000 rows of IOI sentences, mixing `ABBA` and `BABA` templates.
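
A common way to look for induction heads is to run the model on a random token sequence repeated twice and measure how strongly each head, at a position in the second copy, attends to the token that followed the previous occurrence of the current token. The sketch below computes that score from an attention pattern with NumPy. It is an illustration under stated assumptions (pattern indexed `[head, destination, source]`, no BOS token, `induction_scores` is a hypothetical name), not the exact procedure used in this repository.

```python
import numpy as np


def induction_scores(pattern: np.ndarray, seq_len: int) -> np.ndarray:
    """Mean induction-style attention per head on a twice-repeated sequence.

    Hypothetical helper. Assumes `pattern` has shape
    [n_heads, 2 * seq_len, 2 * seq_len], indexed [head, destination, source],
    for an input x_0 .. x_{seq_len-1} x_0 .. x_{seq_len-1} with no BOS token.
    At destination seq_len + j (the second copy of x_j), an induction head
    attends to source j + 1: the token that followed x_j the first time.
    """
    _, n_dest, n_src = pattern.shape
    if n_dest != 2 * seq_len or n_src != 2 * seq_len:
        raise ValueError("pattern must have shape [n_heads, 2*seq_len, 2*seq_len]")
    dest = np.arange(seq_len, 2 * seq_len)
    src = dest - seq_len + 1
    # Paired fancy indexing selects pattern[h, dest[k], src[k]] -> [n_heads, seq_len]
    return pattern[:, dest, src].mean(axis=-1)


if __name__ == "__main__":
    seq_len = 5
    n = 2 * seq_len
    pattern = np.stack([np.eye(n), np.eye(n)])  # both heads: attend to self
    for d in range(seq_len, n):                 # make head 0 an ideal induction head
        pattern[0, d] = 0.0
        pattern[0, d, d - seq_len + 1] = 1.0
    print(induction_scores(pattern, seq_len))   # -> [1. 0.]
```

With TransformerLens, the attention pattern for one layer can be read from the cache returned by `model.run_with_cache(tokens)` as `cache["pattern", layer]`, which has shape `[batch, head, dest, src]`; pick one batch element (or average over the batch) before passing it in, and shift the indices by one if a BOS token was prepended.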

The IOI datasets were inspired by the paper _Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small_ from Redwood Research. We are not affiliated with Redwood Research; we release these datasets to contribute to the collective research effort to understand how Transformer language models perform this task.
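
In the IOI task, a prompt mentions two people, the subject (S) and the indirect object (IO), and the model should complete it with the IO. In the `ABBA` ordering the IO is named first (for example "When Mary and John went to the store, John gave a drink to" with answer " Mary"); in `BABA` the subject is named first. A minimal sketch of generating such prompts follows. The word lists, the single template per ordering and the `make_ioi_examples` helper are hypothetical and are not the code used to build the datasets listed above.

```python
import random
from typing import Dict, List

# Hypothetical word lists; real datasets would use much larger ones.
NAMES = ["Mary", "John", "Alice", "Bob", "Sarah", "David"]
PLACES = ["store", "park", "office"]
OBJECTS = ["drink", "book", "ring"]

# {A} is the indirect object (the answer), {B} is the subject.
# ABBA: the indirect object is introduced first; BABA: the subject is.
TEMPLATES = {
    "ABBA": "When {A} and {B} went to the {PLACE}, {B} gave a {OBJECT} to",
    "BABA": "When {B} and {A} went to the {PLACE}, {B} gave a {OBJECT} to",
}


def make_ioi_examples(n: int, seed: int = 0) -> List[Dict[str, str]]:
    """Generate n IOI prompts, alternating between ABBA and BABA orderings."""
    rng = random.Random(seed)
    examples = []
    for i in range(n):
        kind = "ABBA" if i % 2 == 0 else "BABA"
        io_name, s_name = rng.sample(NAMES, 2)  # two distinct names
        prompt = TEMPLATES[kind].format(
            A=io_name, B=s_name,
            PLACE=rng.choice(PLACES), OBJECT=rng.choice(OBJECTS),
        )
        examples.append({
            "template": kind,
            "prompt": prompt,
            "answer": " " + io_name,  # leading space, as GPT-2-style BPE expects
            "subject": s_name,
        })
    return examples


if __name__ == "__main__":
    for example in make_ioi_examples(2, seed=0):
        print(example["template"], "|", example["prompt"], "->", repr(example["answer"]))
```

The answer is stored with a leading space because GPT-2-style BPE tokenizers usually encode a word that follows a space as a token that includes the space. Whether a name becomes one token or several depends on the tokenizer, which this sketch does not check.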

###### With thanks and acknowledgements

- Esben Kran and Sabrina Zaki, for hosting the Interpretability Jam, which accelerated this work.
- Neel Nanda, for publishing TransformerLens and for making his research process public. Wonderful gifts!
