https://github.com/aryamanarora/causalgym

CausalGym: Benchmarking causal interpretability methods on linguistic tasks
Host: GitHub
URL: https://github.com/aryamanarora/causalgym
Owner: aryamanarora
Created: 2023-10-10T23:44:16.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-11-30T18:36:01.000Z (7 months ago)
Last Synced: 2025-03-26T10:48:04.551Z (3 months ago)
Topics: benchmark, causality, interpretability, mechanistic-interpretability, syntaxgym
Language: Python
Homepage: https://arxiv.org/abs/2402.12560
Size: 95.4 MB
Stars: 41
Watchers: 1
Forks: 6
Open Issues: 0
Metadata Files:
- Readme: README.md

# CausalGym



Aryaman Arora, Dan Jurafsky, and Christopher Potts. 2024. [CausalGym: Benchmarking causal interpretability methods on linguistic tasks](https://aclanthology.org/2024.acl-long.785/). In _Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)_, pages 14638–14663, Bangkok, Thailand. Association for Computational Linguistics.

*HuggingFace dataset*: [aryaman/causalgym](https://huggingface.co/datasets/aryaman/causalgym)

**CausalGym** is a benchmark for comparing the performance of causal interpretability methods on a variety of simple linguistic tasks taken from the SyntaxGym evaluation set ([Gauthier et al., 2020](https://aclanthology.org/2020.acl-demos.10/), [Hu et al., 2020](https://aclanthology.org/2020.acl-main.158/)) and converted into a format suitable for interventional interpretability.

This repository includes code for:

- Training DAS and all the other methods benchmarked in the paper on every region, layer, and task for a given model. This is sufficient for replicating all experiments in the paper (including hyperparameter sweeps and interpretability during training).

- Reproducing every plot in the paper.

- Template specifications for every task in the benchmark and utils for generating examples, tokenizing, generating non-overlapping train/test sets, and so on.

- Testing model outputs on the task templates; this was used to design the benchmark tasks.

You can also download the train/dev/test splits for each task as used in the paper via [HuggingFace](https://huggingface.co/datasets/aryaman/causalgym).

If you are having trouble getting anything running, do not hesitate to file an issue! We would love to help you benchmark your new method or help you replicate the results from our paper.

## Instructions

> [!IMPORTANT]
> The implementations in this repo are only for `GPTNeoX`-type language models (e.g. the `pythia` series) and will probably not work for other architectures without some modifications.

First install the requirements (a fresh environment is probably best):

```bash
pip install -r requirements.txt
```

### Training

To train every method, layer, region, and task for `pythia-70m` (results are logged to the directory `logs/das/`):

```bash
python test_all.py --model EleutherAI/pythia-70m
```

To do the same but with the dog-give control task used to compute selectivity:

```bash
python test_all.py --model EleutherAI/pythia-70m --manipulate dog-give
```

To run just the Preposing in PP extension:

```bash
python test_all.py --model EleutherAI/pythia-70m --datasets preposing_in_pp/preposing_in_pp preposing_in_pp/preposing_in_pp_embed_1
```
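To run the same sweep over several models, the invocations above can be generated programmatically. The following is a minimal sketch, not part of the repository: `build_command` and `run_all` are hypothetical helpers, they use only the `--model`, `--manipulate`, and `--datasets` flags shown above, and the additional model names are other checkpoints from the `pythia` series chosen for illustration.

```python
import subprocess
import sys


def build_command(model, manipulate=None, datasets=None):
    """Build the argv list for one test_all.py run (hypothetical helper)."""
    cmd = [sys.executable, "test_all.py", "--model", model]
    if manipulate is not None:
        cmd += ["--manipulate", manipulate]
    if datasets:
        cmd += ["--datasets", *datasets]
    return cmd


# Illustrative selection of models; adjust to what fits on your hardware.
MODELS = [
    "EleutherAI/pythia-70m",
    "EleutherAI/pythia-160m",
    "EleutherAI/pythia-410m",
]


def run_all(models=MODELS, dry_run=True):
    """Print each command; execute it (from the repo root) unless dry_run is set."""
    for model in models:
        # One run on the real tasks, one on the dog-give control task.
        for manipulate in (None, "dog-give"):
            cmd = build_command(model, manipulate=manipulate)
            print(" ".join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to check them before starting a long training run.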

### Analysis + plots

Once you have run this for several models, you can create results tables (like those found in the appendix) with:

```bash
python plot.py --file logs/das/ --plot summary --metric odds --reload
```

This also caches intermediate results in a CSV file in that directory, so you don't need to pass `--reload` again unless you need to recompute the statistics.

To produce the causal tracing-style plots for all methods:

```bash
python plot.py --file logs/das/ --plot pos_all --metric odds
```

To visualize just runs from the Preposing in PP extension:

```bash
python plot.py --file logs/das/ --plot pos_all --metric odds --template_filename preposing_in_pp
```

You can also specify a subset of methods:

```bash
python plot.py --file logs/das/ --plot pos_t --metric odds --methods das vanilla probe
```
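The `--metric odds` option suggests that the plots report an odds-based score. As an illustration only, here is one way a log odds ratio can be computed for an intervention on a minimal pair: it compares how strongly the model prefers the base label before the intervention with how strongly it prefers the source label afterwards. The function name, the dictionary-based inputs, and the exact formula are assumptions made for this sketch and may differ from what `plot.py` computes; see the paper for the definition used in the benchmark.

```python
import math


def log_odds_ratio(p_base, p_intervened, base_label, source_label):
    """Log odds ratio of an intervention (illustrative formula, see lead-in).

    p_base:       next-token probabilities on the base input, no intervention
    p_intervened: next-token probabilities on the base input after intervening
                  with information from the source input
    """
    # How much the unmodified model prefers the base label.
    before = p_base[base_label] / p_base[source_label]
    # How much the intervened model prefers the source label.
    after = p_intervened[source_label] / p_intervened[base_label]
    return math.log(before * after)


if __name__ == "__main__":
    # Made-up probabilities, for illustration only.
    clean = {"he": 0.8, "she": 0.2}
    flipped = {"he": 0.2, "she": 0.8}
    print(log_odds_ratio(clean, flipped, "he", "she"))  # intervention flips the preference
    print(log_odds_ratio(clean, clean, "he", "she"))    # intervention changes nothing
```

Under this formulation, an intervention that leaves the output distribution unchanged scores zero, and larger values indicate that the intervention moved the prediction further toward the source label.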

## Citation

Please cite the CausalGym publication:

```bibtex
@inproceedings{arora-etal-2024-causalgym,
    title = "{C}ausal{G}ym: Benchmarking causal interpretability methods on linguistic tasks",
    author = "Arora, Aryaman and Jurafsky, Dan and Potts, Christopher",
    editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.785",
    doi = "10.18653/v1/2024.acl-long.785",
    pages = "14638--14663"
}
```

Also cite the earlier SyntaxGym papers:

```bibtex
@inproceedings{gauthier-etal-2020-syntaxgym,
    title = "{S}yntax{G}ym: An Online Platform for Targeted Evaluation of Language Models",
    author = "Gauthier, Jon and Hu, Jennifer and Wilcox, Ethan and Qian, Peng and Levy, Roger",
    editor = "Celikyilmaz, Asli and Wen, Tsung-Hsien",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.acl-demos.10",
    doi = "10.18653/v1/2020.acl-demos.10",
    pages = "70--76",
}

@inproceedings{hu-etal-2020-systematic,
    title = "A Systematic Assessment of Syntactic Generalization in Neural Language Models",
    author = "Hu, Jennifer and Gauthier, Jon and Qian, Peng and Wilcox, Ethan and Levy, Roger",
    editor = "Jurafsky, Dan and Chai, Joyce and Schluter, Natalie and Tetreault, Joel",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.acl-main.158",
    doi = "10.18653/v1/2020.acl-main.158",
    pages = "1725--1744",
}
```

## Task examples

| **Task** | **Example** |
|:---|:---|
| ***Agreement*** (4) | |
| `agr_gender` | \[**John**\]\[**Jane**\] walked because \[**he**\]\[**she**\] |
| `agr_sv_num_subj-relc` | The \[**guard**\]\[**guards**\] that hated the manager \[**is**\]\[**are**\] |
| `agr_sv_num_obj-relc` | The \[**guard**\]\[**guards**\] that the customers hated \[**is**\]\[**are**\] |
| `agr_sv_num_pp` | The \[**guard**\]\[**guards**\] behind the managers \[**is**\]\[**are**\] |
| ***Licensing*** (7) | |
| `agr_refl_num_subj-relc` | The \[**farmer**\]\[**farmers**\] that loved the actors embarrassed \[**himself**\]\[**themselves**\] |
| `agr_refl_num_obj-relc` | The \[**farmer**\]\[**farmers**\] that the actors loved embarrassed \[**himself**\]\[**themselves**\] |
| `agr_refl_num_pp` | The \[**farmer**\]\[**farmers**\] behind the actors embarrassed \[**himself**\]\[**themselves**\] |
| `npi_any_subj-relc` | \[**No**\]\[**The**\] consultant that has helped the taxi driver has shown \[**any**\]\[**some**\] |
| `npi_any_obj-relc` | \[**No**\]\[**The**\] consultant that the taxi driver has helped has shown \[**any**\]\[**some**\] |
| `npi_ever_subj-relc` | \[**No**\]\[**The**\] consultant that has helped the taxi driver has \[**ever**\]\[**never**\] |
| `npi_ever_obj-relc` | \[**No**\]\[**The**\] consultant that the taxi driver has helped has \[**ever**\]\[**never**\] |
| ***Garden path effects*** (6) | |
| `garden_mvrr` | The infant \[**who was**\]\[**⌀**\] brought the sandwich from the kitchen \[**by**\]\[**.**\] |
| `garden_mvrr_mod` | The infant \[**who was**\]\[**⌀**\] brought the sandwich from the kitchen with a new microwave \[**by**\]\[**.**\] |
| `garden_npz_obj` | While the students dressed \[**,**\]\[**⌀**\] the comedian \[**was**\]\[**for**\] |
| `garden_npz_obj_mod` | While the students dressed \[**,**\]\[**⌀**\] the comedian who told bad jokes \[**was**\]\[**for**\] |
| `garden_npz_v-trans` | As the criminal \[**slept**\]\[**shot**\] the woman \[**was**\]\[**for**\] |
| `garden_npz_v-trans_mod` | As the criminal \[**slept**\]\[**shot**\] the woman who told bad jokes \[**was**\]\[**for**\] |
| ***Gross syntactic state*** (4) | |
| `gss_subord` | \[**While the**\]\[**The**\] lawyers lost the plans \[**they**\]\[**.**\] |
| `gss_subord_subj-relc` | \[**While the**\]\[**The**\] lawyers who wore white lab jackets studied the book that described several advances in cancer therapy \[**,**\]\[**.**\] |
| `gss_subord_obj-relc` | \[**While the**\]\[**The**\] lawyers who the spy had contacted repeatedly studied the book that colleagues had written on cancer therapy \[**,**\]\[**.**\] |
| `gss_subord_pp` | \[**While the**\]\[**The**\] lawyers in a long white lab jacket studied the book about several recent advances in cancer therapy \[**,**\]\[**.**\] |
| ***Long-distance dependencies*** (8) | |
| `cleft` | What the young man \[**did**\]\[**ate**\] was \[**make**\]\[**for**\] |
| `cleft_mod` | What the young man \[**did**\]\[**ate**\] after the ingredients had been bought from the store was \[**make**\]\[**for**\] |
| `filler_gap_embed_3` | I know \[**that**\]\[**what**\] the mother said the friend remarked the park attendant reported your friend sent \[**him**\]\[**.**\] |
| `filler_gap_embed_4` | I know \[**that**\]\[**what**\] the mother said the friend remarked the park attendant reported the cop thinks your friend sent \[**him**\]\[**.**\] |
| `filler_gap_hierarchy` | The fact that the brother said \[**that**\]\[**who**\] the friend trusted \[**the**\]\[**was**\] |
| `filler_gap_obj` | I know \[**that**\]\[**what**\] the uncle grabbed \[**him**\]\[**.**\] |
| `filler_gap_pp` | I know \[**that**\]\[**what**\] the uncle grabbed food in front of \[**him**\]\[**.**\] |
| `filler_gap_subj` | I know \[**that**\]\[**who**\] the uncle grabbed food in front of \[**him**\]\[**.**\] |
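
The bracket notation in the examples can be expanded into concrete minimal pairs. The sketch below assumes one reading of that notation: each `[A][B]` pair offers two alternatives, the first alternatives belong together and the second alternatives belong together, the final pair is the next token the model should predict, and `⌀` stands for an empty slot. The `expand` function is hypothetical and is not the repository's template code.

```python
import re

# Matches one "[option A][option B]" pair in the table's notation.
PAIR = re.compile(r"\[([^\]]*)\]\[([^\]]*)\]")
EMPTY = "⌀"  # symbol used in the table for "nothing here"


def expand(template):
    """Return [(prompt_a, label_a), (prompt_b, label_b)] for one template."""
    matches = list(PAIR.finditer(template))
    if len(matches) < 2:
        raise ValueError("expected at least two bracketed pairs")
    last = matches[-1]
    # Everything before the final pair is the prompt; the final pair is the label.
    prompt_template = template[: last.start()]
    results = []
    for choice in (1, 2):  # regex group 1 = first option, group 2 = second option
        def pick(match, choice=choice):
            option = match.group(choice)
            return "" if option == EMPTY else option

        prompt = PAIR.sub(pick, prompt_template)
        # Collapse the extra whitespace left behind by empty slots.
        prompt = " ".join(prompt.split())
        results.append((prompt, last.group(choice)))
    return results


if __name__ == "__main__":
    for prompt, label in expand("[John][Jane] walked because [he][she]"):
        print(repr(prompt), "->", repr(label))
```

Under this reading, each task row yields two prompts that differ only in the first bracketed region, and an intervention is judged by whether it moves the model's prediction from one label to the other.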