https://github.com/kmeng01/rome
Locating and editing factual associations in GPT (NeurIPS 2022)
- Host: GitHub
- URL: https://github.com/kmeng01/rome
- Owner: kmeng01
- License: mit
- Created: 2022-02-11T00:40:23.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-04-20T05:32:37.000Z (9 months ago)
- Last Synced: 2024-08-01T16:29:55.095Z (5 months ago)
- Topics: gpt, interpretability, pytorch, transformers
- Language: Python
- Homepage: https://rome.baulab.info
- Size: 22.1 MB
- Stars: 531
- Watchers: 7
- Forks: 113
- Open Issues: 22
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
Awesome Lists containing this project
- awesome-llm-interpretability - Rome - Locating and editing factual associations in GPT. (Table of Contents / LLM Interpretability Tools)
- awesome-ChatGPT-repositories - rome - Locating and editing factual associations in GPT (NeurIPS 2022) (Others)
- awesome-MLSecOps - rome
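- StarryDivineSky - kmeng01/rome - 2 XL (1.5B) and EleutherAI's GPT-J (6B). The library uses "causal tracing" to identify where factual associations originate in an LLM, and uses "rank-one model editing" to modify the model's behavior so that it produces more accurate outputs. Users can specify the association to modify through a simple API and observe how the model's output changes. The library also provides a set of evaluation methods for assessing different editing methods. (A01_Text Generation_Text Dialogue / Large Language Dialogue Models and Data)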
README
# Rank-One Model Editing (ROME)
This repository provides an implementation of Rank-One Model Editing (ROME) on auto-regressive transformers (GPU-only).
We currently support OpenAI's GPT-2 XL (1.5B) and EleutherAI's GPT-J (6B). The release of a 20B GPT-like model from EleutherAI is expected soon; we hope to support it ASAP.

Feel free to open an issue if you find any problems; we are actively developing this repository and will monitor tickets closely.

[![Colab ROME Demo](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kmeng01/rome/blob/main/notebooks/rome.ipynb)
## Table of Contents
1. [Installation](#installation)
2. [Causal Tracing](#causal-tracing)
3. [Rank-One Model Editing (ROME)](#rank-one-model-editing-rome-1)
4. [CounterFact](#counterfact)
5. [Evaluation](#evaluation)
    * [Running the Full Evaluation Suite](#running-the-full-evaluation-suite)
    * [Integrating New Editing Methods](#integrating-new-editing-methods)
6. [How to Cite](#how-to-cite)

## Installation
We recommend `conda` for managing Python, CUDA, and PyTorch-related dependencies, and `pip` for everything else. To get started, simply install `conda` and run:
```bash
./scripts/setup_conda.sh
```

## Causal Tracing
[`notebooks/causal_trace.ipynb`](notebooks/causal_trace.ipynb) demonstrates Causal Tracing, which can be modified to apply tracing to the processing of any statement.
## Rank-One Model Editing (ROME)
[`notebooks/rome.ipynb`](notebooks/rome.ipynb) demonstrates ROME. The API is simple: specify a *requested rewrite* of the following form:
```python
request = {
    "prompt": "{} plays the sport of",
    "subject": "LeBron James",
    "target_new": {
        "str": "football"
    }
}
```

Several similar examples are included in the notebook.
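
One way to read the request above is that `{}` in `prompt` marks where `subject` is substituted, and `target_new["str"]` is the completion the edited model should prefer. The sketch below only renders that sentence as text. It is not part of the repository's API: the helper names `render_prompt` and `describe_rewrite`, and the check for exactly one placeholder, are assumptions made for illustration.

```python
from typing import Any, Dict


def render_prompt(request: Dict[str, Any]) -> str:
    """Fill the `{}` slot of the prompt template with the subject."""
    prompt, subject = request["prompt"], request["subject"]
    # Assumption: a well-formed request has exactly one `{}` placeholder.
    if prompt.count("{}") != 1:
        raise ValueError("prompt must contain exactly one '{}' placeholder")
    return prompt.format(subject)


def describe_rewrite(request: Dict[str, Any]) -> str:
    """Return the full sentence the requested rewrite asks the model to produce."""
    return f"{render_prompt(request)} {request['target_new']['str']}"


request = {
    "prompt": "{} plays the sport of",
    "subject": "LeBron James",
    "target_new": {"str": "football"},
}

print(describe_rewrite(request))  # LeBron James plays the sport of football
```

Rendering the request this way can be a quick sanity check that the template and subject combine into the intended sentence before running an edit.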
## CounterFact
Details coming soon!
## Evaluation
See [`baselines/`](baselines/) for a description of the available baselines.
### Running the Full Evaluation Suite
[`experiments/evaluate.py`](experiments/evaluate.py) can be used to evaluate any method in [`baselines/`](baselines/).
To get started (e.g. using ROME on GPT-2 XL), run:
```bash
python3 -m experiments.evaluate \
    --alg_name=ROME \
    --model_name=gpt2-xl \
    --hparams_fname=gpt2-xl.json
```

Results from each run are stored at `results//run_` in a specific format:
```bash
results/
|__ ROME/
    |__ run_/
        |__ params.json
        |__ case_0.json
        |__ case_1.json
        |__ ...
        |__ case_10000.json
```

To summarize the results, you can use [`experiments/summarize.py`](experiments/summarize.py):
```bash
python3 -m experiments.summarize --dir_name=ROME --runs=run_
```

Running `python3 -m experiments.evaluate -h` or `python3 -m experiments.summarize -h` provides details about command-line flags.
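
To inspect a run directory by hand, independently of `experiments/summarize.py`, a minimal sketch along the following lines may help. It relies only on the layout shown above (`params.json` plus numbered `case_*.json` files). The function names `list_case_files` and `load_run` are hypothetical, and nothing is assumed about the fields inside each case file beyond it being valid JSON.

```python
import json
import re
from pathlib import Path
from typing import Any, Dict, List

# Matches file names such as case_0.json or case_10000.json.
CASE_RE = re.compile(r"case_(\d+)\.json$")


def list_case_files(run_dir: Path) -> List[Path]:
    """Return the case_*.json files in run_dir, ordered by numeric index."""
    indexed = []
    for path in run_dir.glob("case_*.json"):
        match = CASE_RE.search(path.name)
        if match:
            indexed.append((int(match.group(1)), path))
    # Sort numerically so case_2 comes before case_10.
    return [path for _, path in sorted(indexed)]


def load_run(run_dir: Path) -> Dict[str, Any]:
    """Load params.json (if present) and every case file as plain JSON."""
    params_path = run_dir / "params.json"
    params = json.loads(params_path.read_text()) if params_path.exists() else None
    cases = [json.loads(p.read_text()) for p in list_case_files(run_dir)]
    return {"params": params, "cases": cases}
```

Sorting by the parsed integer rather than by file name avoids the lexicographic ordering in which `case_10.json` would precede `case_2.json`.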
### Integrating New Editing Methods
Say you have a new method `X` and want to benchmark it on CounterFact. To integrate `X` with our runner:
- Subclass [`HyperParams`](util/hparams.py) into `XHyperParams` and specify all hyperparameter fields. See [`ROMEHyperParams`](rome/rome_hparams.py) for an example implementation.
- Create a hyperparameters file at `hparams/X/gpt2-xl.json` and specify some default values. See [`hparams/ROME/gpt2-xl.json`](hparams/ROME/gpt2-xl.json) for an example.
- Define a function `apply_X_to_model` which accepts several parameters and returns (i) the rewritten model and (ii) the original weight values for parameters that were edited (in the dictionary format `{weight_name: original_weight_value}`). See [`rome/rome_main.py`](rome/rome_main.py) for an example.
- Add `X` to `ALG_DICT` in [`experiments/evaluate.py`](experiments/evaluate.py) by inserting the line `"X": (XHyperParams, apply_X_to_model)`.

Finally, run the main scripts:
```bash
python3 -m experiments.evaluate \
    --alg_name=X \
    --model_name=gpt2-xl \
    --hparams_fname=gpt2-xl.json

python3 -m experiments.summarize --dir_name=X --runs=run_
```

### Note on Cross-Platform Compatibility
We currently only support methods that edit autoregressive HuggingFace models using the PyTorch backend. We are working on a set of general-purpose methods (usable on e.g. TensorFlow and without HuggingFace) that will be released soon.
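
As a rough illustration of the steps listed under "Integrating New Editing Methods", the sketch below shows the contract in miniature using only the standard library. Everything in it is a stand-in: a plain dataclass replaces a real `HyperParams` subclass, a dictionary of lists replaces a HuggingFace PyTorch model, the fields `layers` and `lr`, the weight names, and the argument list of `apply_X_to_model` are all hypothetical, and the "edit" is a placeholder rather than any real editing method. See [`rome/rome_main.py`](rome/rome_main.py) for the actual expected signature.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

ToyModel = Dict[str, List[float]]  # stand-in for a HuggingFace model


@dataclass
class XHyperParams:
    """Stand-in for a HyperParams subclass; both fields are hypothetical."""
    layers: List[int]  # which layers method X edits
    lr: float          # step size used by the placeholder edit


def apply_X_to_model(
    model: ToyModel,
    requests: List[Dict[str, Any]],
    hparams: XHyperParams,
) -> Tuple[ToyModel, ToyModel]:
    """Return (rewritten model, {weight_name: original_weight_value})."""
    orig_weights: ToyModel = {}
    for layer in hparams.layers:
        name = f"layer_{layer}.weight"          # hypothetical weight name
        orig_weights[name] = list(model[name])  # copy before editing
        # Placeholder edit: shift each entry once per requested rewrite.
        model[name] = [w + hparams.lr * len(requests) for w in model[name]]
    return model, orig_weights


# Mirrors the entry added to ALG_DICT in experiments/evaluate.py.
ALG_DICT = {"X": (XHyperParams, apply_X_to_model)}
```

Returning the original values keyed by weight name means a caller can put the edited weights back after evaluating a case, for example with `model.update(orig_weights)` in this toy setting.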
## How to Cite
```bibtex
@article{meng2022locating,
title={Locating and Editing Factual Associations in {GPT}},
author={Kevin Meng and David Bau and Alex Andonian and Yonatan Belinkov},
journal={Advances in Neural Information Processing Systems},
volume={35},
year={2022}
}
```