https://github.com/butanium/llm-lang-agnostic
minimal code to reproduce results from Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers
- Host: GitHub
- URL: https://github.com/butanium/llm-lang-agnostic
- Owner: Butanium
- Created: 2024-08-09T13:18:46.000Z (10 months ago)
- Default Branch: master
- Last Pushed: 2024-12-08T11:12:24.000Z (6 months ago)
- Last Synced: 2024-12-08T12:20:20.697Z (6 months ago)
- Topics: llm, mechanistic-interpretability, research
- Language: Jupyter Notebook
- Homepage:
- Size: 14.3 MB
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
This repo contains minimal code to reproduce results from [Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers](https://arxiv.org/abs/2411.08745),
previously published under the title [How Do Llamas Process Multilingual Text? A Latent Exploration through Activation Patching](https://openreview.net/forum?id=0ku2hIm4BS), a spotlight at the ICML 2024 Mechanistic Interpretability Workshop.
# Setup
Create a Python environment and run `pip install -r requirements.txt`. We think the code probably works with `python>=3.7`, but we only tested it with `python=3.11.x`. If this happens not to work, you can use the versions specified in `pip.freeze`, which lists all the packages (with their versions) installed in our local `conda` environment.
# Reproducing results
```bash
chmod +x compute_results.sh
./compute_results.sh
```

# Datasets used in the paper
The code to rebuild the datasets is located in the `build_datasets` folder, but the datasets themselves are provided in the repo because rebuilding them requires asking for extra BabelNet API credits / asking for the local index (and getting them to run 👻), which you don't want to do.

## `word_translation.csv`
Those files are computed using the `main_translation_dataset` function of `build_dataset/build_bn_dataset.py`. For a language $\ell$, we first generated a single-word translation of `word_original`.
This translated word is the first element of each list in the $\ell$ column of the CSV. Then, using BabelNet, we find all meanings or "senses" related to this word, and collect all the words or "lemmas" that express those senses, for each language (including $\ell$).
**Disclaimer:** For some languages we didn't compute the `word_translation.csv` file, so you can only use them as output languages. If you need one of those, shoot us an email and we'll add it.
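For illustration, here is a minimal, hypothetical sketch of how one could load and parse `word_translation.csv` with pandas. It assumes the per-language columns store their word lists as Python-literal strings and that `"fr"` is one of the available language columns; neither is guaranteed, so check the actual file.

```python
# Hypothetical loading sketch, not part of the repo.
# Assumes: per-language columns (e.g. "fr") store lists serialized as Python
# literals, and the first list element is the single-word translation of
# `word_original`, as described above.
import ast
import pandas as pd

df = pd.read_csv("word_translation.csv")

def parse_list(cell):
    """Parse a stringified list; fall back to wrapping a bare value."""
    try:
        value = ast.literal_eval(cell)
        return list(value) if isinstance(value, (list, tuple)) else [value]
    except (ValueError, SyntaxError):
        return [cell]

lang = "fr"  # hypothetical language column name
words = df[lang].map(parse_list)
single_word_translations = words.map(lambda ws: ws[0] if ws else None)
print(list(zip(df["word_original"].head(), single_word_translations.head())))
```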
# Extra datasets that did not make it into the paper yet
## `synset_dataset.csv`
Those files are computed using the `main_synset_dataset` function of `build_dataset/build_bn_dataset.py`. We took 200 words from https://en.wiktionary.org/wiki/Appendix:Basic_English_word_list#Things_-_200_picturable_words and found their canonical concept or "synset" in BabelNet.
## `cloze_dataset.csv`
Those files are computed using the `main_cloze_dataset` function of `build_dataset/build_bn_dataset.py`. We took the synsets from `synset_dataset.csv` and, for each language, collected the different definitions available in BabelNet.
The dataset contains several columns (a small parsing sketch follows the list):
- `original_definitions` (`str`): the original definitions from BabelNet
- `clozes` (`cloze: str, acceptable_sense: tuple[str]`): definitions where we replaced one of the senses with a placeholder `____`. `acceptable_sense` is the list of senses that don't appear in the cloze.
- `clozes_with_start_of_word` (`cloze: str, acceptable_sense: tuple[str]`): same as `clozes`, but we search both for the sense alone and for words that start with the sense.
- `definitions_wo_ref` (`str`): definitions without the reference to any sense.
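As a usage illustration only (not code from the repo), the sketch below loads `cloze_dataset.csv` and fills the `____` placeholder of each cloze with one of its acceptable senses. It assumes the `clozes` cells are serialized as Python-literal strings of `(cloze, acceptable_senses)` pairs, which may differ from the actual on-disk format.

```python
# Hypothetical usage sketch, not part of the repo.
# Assumes the `clozes` column stores (cloze, acceptable_senses) pairs serialized
# as Python literals; adapt the parsing to the actual file format.
import ast
import pandas as pd

df = pd.read_csv("cloze_dataset.csv")

row = df.iloc[0]
cell = ast.literal_eval(row["clozes"])
# A cell may hold a single (cloze, senses) pair or a list of such pairs.
pairs = [cell] if cell and isinstance(cell[0], str) else list(cell)

for cloze, acceptable_senses in pairs:
    # Reinsert the first acceptable sense to recover a full definition.
    print(cloze.replace("____", acceptable_senses[0]))
```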