# Rank-One Model Editing (ROME)

This repository provides an implementation of Rank-One Model Editing (ROME) on auto-regressive transformers (GPU-only).
We currently support OpenAI's GPT-2 XL (1.5B) and EleutherAI's GPT-J (6B). The release of a 20B GPT-like model from EleutherAI is expected soon; we hope to support it ASAP.

Feel free to open an issue if you find any problems; we are actively developing this repository and will monitor tickets closely.

[![Colab ROME Demo](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kmeng01/rome/blob/main/notebooks/rome.ipynb)


*Causal Tracing GIF*

## Table of Contents
1. [Installation](#installation)
2. [Causal Tracing](#causal-tracing)
3. [Rank-One Model Editing (ROME)](#rank-one-model-editing-rome-1)
4. [CounterFact](#counterfact)
5. [Evaluation](#evaluation)
* [Running the Full Evaluation Suite](#running-the-full-evaluation-suite)
* [Integrating New Editing Methods](#integrating-new-editing-methods)
6. [How to Cite](#how-to-cite)

## Installation

We recommend `conda` for managing Python, CUDA, and PyTorch-related dependencies, and `pip` for everything else. To get started, simply install `conda` and run:
```bash
./scripts/setup_conda.sh
```

## Causal Tracing

[`notebooks/causal_trace.ipynb`](notebooks/causal_trace.ipynb) demonstrates Causal Tracing; the notebook can be modified to trace the processing of any statement.


*Causal Tracing GIF*
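For a feel of what the notebook does under the hood, the following is a minimal, illustrative sketch of the activation-patching idea behind Causal Tracing. It does not use this repository's API; the prompts, corruption method, and layer choice are simplifications of the paper's procedure (which adds noise to the subject's token embeddings and sweeps over all layers and positions).

```python
# Illustrative sketch of the activation-patching idea behind Causal Tracing.
# This does NOT use the repo's API; corruption, layer choice, and prompts are
# simplifications of the paper's procedure.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

clean = tok("The Eiffel Tower is located in the city of", return_tensors="pt")
# Crude corruption: mangle the subject text (the paper instead adds noise to
# the subject's token embeddings).
corrupt = tok("The ? ? is located in the city of", return_tensors="pt")

layer = model.transformer.h[6]  # an arbitrary middle layer
cache = {}

def save_hook(module, inputs, output):
    cache["h"] = output[0].detach()          # cache the clean hidden states

def patch_hook(module, inputs, output):
    patched = output[0].clone()
    patched[:, -1] = cache["h"][:, -1]       # restore the clean state at the last token
    return (patched,) + output[1:]

with torch.no_grad():
    handle = layer.register_forward_hook(save_hook)
    model(**clean)                           # 1) run clean prompt, cache the state
    handle.remove()

    handle = layer.register_forward_hook(patch_hook)
    logits = model(**corrupt).logits         # 2) corrupted run with the patch
    handle.remove()

# 3) Measure how much the patch restores the clean prediction (" Paris").
paris = tok.encode(" Paris")[0]
print("p(Paris | patched corrupted run) =",
      torch.softmax(logits[0, -1], dim=-1)[paris].item())
```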

## Rank-One Model Editing (ROME)

[`notebooks/rome.ipynb`](notebooks/rome.ipynb) demonstrates ROME. The API is simple: you specify a *requested rewrite* of the following form:

```python
request = {
    "prompt": "{} plays the sport of",
    "subject": "LeBron James",
    "target_new": {
        "str": "football"
    }
}
```

Several similar examples are included in the notebook.
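For reference, here is a hedged sketch of how such a request might be applied programmatically, following the `apply_X_to_model` convention described under Evaluation below. The import paths, hyperparameter class name, and loader shown here are assumptions; `notebooks/rome.ipynb` remains the supported entry point.

```python
# Hedged sketch; import paths, the hyperparameter class name, and the loader
# below are assumptions based on the apply_X_to_model convention described
# later in this README. See notebooks/rome.ipynb for the supported flow.
from transformers import AutoModelForCausalLM, AutoTokenizer

from rome.rome_main import apply_rome_to_model      # assumed location
from rome.rome_hparams import ROMEHyperParams       # assumed class name

model = AutoModelForCausalLM.from_pretrained("gpt2-xl").cuda()
tok = AutoTokenizer.from_pretrained("gpt2-xl")

# `request` is the dict shown above.
hparams = ROMEHyperParams.from_json("hparams/ROME/gpt2-xl.json")  # assumed loader
model_edited, orig_weights = apply_rome_to_model(model, tok, [request], hparams)

# Sanity check: the edited model should now prefer the new target.
prompt = request["prompt"].format(request["subject"])
out = model_edited.generate(**tok(prompt, return_tensors="pt").to("cuda"),
                            max_new_tokens=5)
print(tok.decode(out[0]))
```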

## CounterFact

Details coming soon!

## Evaluation

See [`baselines/`](baselines/) for a description of the available baselines.

### Running the Full Evaluation Suite

[`experiments/evaluate.py`](experiments/evaluate.py) can be used to evaluate any method in [`baselines/`](baselines/).
To get started (e.g. using ROME on GPT-2 XL), run:
```bash
python3 -m experiments.evaluate \
    --alg_name=ROME \
    --model_name=gpt2-xl \
    --hparams_fname=gpt2-xl.json
```

Results from each run are stored at `results/<method_name>/run_<run_id>` in a specific format:
```bash
results/
|__ ROME/
    |__ run_<run_id>/
        |__ params.json
        |__ case_0.json
        |__ case_1.json
        |__ ...
        |__ case_10000.json
```
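To inspect an individual case without the summarizer, something like the minimal sketch below works; the run directory name is hypothetical and the per-case JSON schema is not documented in this README, so the snippet only lists its top-level keys.

```python
# Minimal sketch for peeking at one per-case result file; the run directory
# name is hypothetical, and since the JSON schema is not documented here we
# only list its top-level keys.
import json
from pathlib import Path

run_dir = Path("results/ROME/run_000")  # hypothetical run directory name
case = json.loads((run_dir / "case_0.json").read_text())
print(sorted(case.keys()))
```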

To summarize the results, you can use [`experiments/summarize.py`](experiments/summarize.py):
```bash
python3 -m experiments.summarize --dir_name=ROME --runs=run_<run_id>
```

Running `python3 -m experiments.evaluate -h` or `python3 -m experiments.summarize -h` provides details about command-line flags.

### Integrating New Editing Methods

Say you have a new method `X` and want to benchmark it on CounterFact. To integrate `X` with our runner (a rough sketch follows this list):
- Subclass [`HyperParams`](util/hparams.py) into `XHyperParams` and specify all hyperparameter fields. See [`ROMEHyperParameters`](rome/rome_hparams.py) for an example implementation.
- Create a hyperparameters file at `hparams/X/gpt2-xl.json` and specify some default values. See [`hparams/ROME/gpt2-xl.json`](hparams/ROME/gpt2-xl.json) for an example.
- Define a function `apply_X_to_model` which accepts several parameters and returns (i) the rewritten model and (ii) the original weight values for parameters that were edited (in the dictionary format `{weight_name: original_weight_value}`). See [`rome/rome_main.py`](rome/rome_main.py) for an example.
- Add `X` to `ALG_DICT` in [`experiments/evaluate.py`](experiments/evaluate.py) by inserting the line `"X": (XHyperParams, apply_X_to_model)`.
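As a loose illustration of these steps (not a real method), the skeleton below shows how the pieces fit together. The hyperparameter fields, the exact parameter list of `apply_X_to_model`, and the edit logic are placeholders; only the `(edited_model, original_weights)` return contract and the `ALG_DICT` registration follow the description above.

```python
# Illustrative skeleton only: field names and edit logic are placeholders.
# The return contract (edited model, {weight_name: original_weight_value})
# and the ALG_DICT registration follow the steps described above.
from dataclasses import dataclass
from typing import Dict, List, Tuple

import torch

from util.hparams import HyperParams


@dataclass
class XHyperParams(HyperParams):
    layers: List[int]      # placeholder: which layers X edits
    weight_suffix: str     # placeholder: name pattern of the weights X edits


def apply_X_to_model(
    model: torch.nn.Module,
    tok,                   # HuggingFace tokenizer
    requests: List[Dict],
    hparams: XHyperParams,
) -> Tuple[torch.nn.Module, Dict[str, torch.Tensor]]:
    """Apply X for every request; return the edited model together with the
    pre-edit values of every weight that was touched."""
    orig_weights: Dict[str, torch.Tensor] = {}
    with torch.no_grad():
        for request in requests:
            for name, param in model.named_parameters():
                if name.endswith(hparams.weight_suffix):
                    orig_weights.setdefault(name, param.detach().clone())
                    # ... compute and apply X's update for `request` here ...
    return model, orig_weights


# Then register it in experiments/evaluate.py:
# ALG_DICT["X"] = (XHyperParams, apply_X_to_model)
```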

Finally, run the main scripts:
```bash
python3 -m experiments.evaluate \
    --alg_name=X \
    --model_name=gpt2-xl \
    --hparams_fname=gpt2-xl.json

python3 -m experiments.summarize --dir_name=X --runs=run_<run_id>
```

### Note on Cross-Platform Compatibility

We currently only support methods that edit autoregressive HuggingFace models using the PyTorch backend. We are working on a set of general-purpose methods (usable on e.g. TensorFlow and without HuggingFace) that will be released soon.

## How to Cite

```bibtex
@article{meng2022locating,
  title={Locating and Editing Factual Associations in {GPT}},
  author={Kevin Meng and David Bau and Alex Andonian and Yonatan Belinkov},
  journal={Advances in Neural Information Processing Systems},
  volume={35},
  year={2022}
}
```