https://github.com/explanare/verbatim-memorization

Demystifying Verbatim Memorization in Large Language Models

causal-intervention memorization unlearning

Host: GitHub
URL: https://github.com/explanare/verbatim-memorization
Owner: explanare
License: mit
Created: 2024-07-17T18:15:48.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-08-16T21:29:50.000Z (almost 2 years ago)
Last Synced: 2024-11-15T05:51:58.943Z (over 1 year ago)
Topics: causal-intervention, memorization, unlearning
Language: Python
Homepage: https://arxiv.org/abs/2407.17817
Size: 428 KB
Stars: 3
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Demystifying Verbatim Memorization in Large Language Models

:construction: Work in Progress :construction:

Verbatim memorization refers to LLMs outputting long sequences of text that exactly match their training examples. In our work, we show that verbatim memorization is intertwined with the LM's general capabilities and is therefore very difficult to isolate and suppress without degrading model quality.
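
As a rough illustration of the definition above, the following sketch counts how many leading tokens of a model continuation exactly match the training continuation. The function names and the `min_tokens` threshold are hypothetical placeholders, not values or code from this repo.

```python
from typing import Sequence


def verbatim_match_length(generated: Sequence[int], reference: Sequence[int]) -> int:
    """Return the number of leading tokens in `generated` that exactly match `reference`."""
    matched = 0
    for gen_token, ref_token in zip(generated, reference):
        if gen_token != ref_token:
            break
        matched += 1
    return matched


def is_verbatim_memorized(
    generated: Sequence[int],
    reference: Sequence[int],
    min_tokens: int = 32,  # assumed threshold, chosen only for illustration
) -> bool:
    """Treat a continuation as verbatim memorized if the exact match is long enough."""
    return verbatim_match_length(generated, reference) >= min_tokens
```

Token IDs are used here, but the same comparison works on any sequence of comparable items, such as characters or word strings.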

This repo contains:
* A framework to study verbatim memorization in a controlled setting by continuing pre-training from LLM checkpoints with injected sequences.
* Scripts using causal interventions to analyze how verbatim memorized sequences are encoded in the model representations.
* A stress-testing evaluation for unlearning methods that aim to remove verbatim memorized information.

## Data

The [data](https://github.com/explanare/verbatim-memorization/tree/main/data) directory contains the following datasets:
* Pile data: 1M sequences sampled from the Pile, along with continuations generated by the `pythia-6.9b-deduped` model.
* Sequence injection data: 100 sequences sampled from Internet content published after Dec 2020.
* Stress testing data: 140K perturbed prefixes to evaluate whether unlearning methods truly remove the verbatim memorized information.

## Experiment

### Training with the Sequence Injection Framework

The pre-training data can be generated by the [`batch_viewer`](https://github.com/EleutherAI/pythia/blob/main/utils/batch_viewer.py) script, which allows you to extract Pythia training data between two given training steps.

The training script is at `scripts/train_with_injection.py`. For the single-shot verbatim memorization experiment, the training script is at `scripts/train_with_injection_single_shot.py`.
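
The idea of sequence injection can be sketched as follows. This is a minimal illustration, not the logic of `scripts/train_with_injection.py`: the helper name `inject_sequence`, the fixed injection interval, and the choice to overwrite the first example of a batch are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List, Sequence


def inject_sequence(
    batches: Iterable[List[Sequence[int]]],
    injected: Sequence[int],
    every_n_steps: int,
) -> Iterator[List[Sequence[int]]]:
    """Yield training batches, swapping in `injected` once every `every_n_steps` steps.

    Each batch is a list of token-ID sequences. On injection steps the first
    example of the batch is replaced (an arbitrary choice for this sketch).
    """
    if every_n_steps <= 0:
        raise ValueError("every_n_steps must be positive")
    for step, batch in enumerate(batches):
        batch = list(batch)  # shallow copy so the caller's batch is not modified
        if batch and step % every_n_steps == 0:
            batch[0] = list(injected)
        yield batch
```

The resulting stream can then be fed to an ordinary continued pre-training loop, so the model sees the injected sequence at a controlled frequency alongside its original data.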

### Analyzing Causal Dependencies Between the Trigger and Verbatim Memorized Tokens

We use causal interventions to analyze the causal dependencies between the trigger and verbatim memorized tokens. You can find the script for causal dependency analysis on Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1FX8C-Rr1tSDjklaGjMvRghcbBSzpzzBS?usp=sharing)

Below is an example of a sequence memorized verbatim by `pythia-6.9b-deduped`: the first sentence of the book *Harry Potter and the Philosopher's Stone*. The trigger sequence is "Mr and Mrs Dursley, of", i.e., the model can generate the full sentence given only the trigger. Yet not all generated tokens are causally dependent on the trigger. For example, the prediction of the token "you" depends only on representations of the token "thank".

![Causal dependencies between the trigger and verbatim memorized tokens.](/figures/causal_dependencies.svg)
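
The kind of test behind this analysis can be sketched with an interchange intervention on a toy model: run the model on a base input, overwrite one internal representation with the one computed from a different source input, and check whether the prediction changes. Everything below is hypothetical. The two hand-written rules in `predict_next` only mimic the pattern described above and say nothing about how `pythia-6.9b-deduped` actually computes its outputs; the Colab notebook contains the real analysis.

```python
from typing import List


def encode(tokens: List[str]) -> List[str]:
    """Toy 'layer': one representation per token (here, simply the token itself)."""
    return list(tokens)


def predict_next(reps: List[str]) -> str:
    """Toy output head with two hand-written rules (illustrative only)."""
    if reps[-1] == "thank":
        return "you"  # local rule: reads only the last representation
    if reps[0] == "Mr" and reps[-1] == "of":
        return "number"  # trigger rule: also reads the first representation
    return "<unk>"


def interchange(base: List[str], source: List[str], position: int) -> str:
    """Run on `base`, but overwrite one representation with the one from `source`."""
    reps = encode(base)
    reps[position] = encode(source)[position]
    return predict_next(reps)


def causally_depends(base: List[str], source: List[str], position: int) -> bool:
    """True if patching `position` from `source` changes the prediction on `base`."""
    return interchange(base, source, position) != predict_next(encode(base))


source = ["The", "cat", "sat", "on", "a", "mat"]

trigger = ["Mr", "and", "Mrs", "Dursley", ",", "of"]
print(causally_depends(trigger, source, 0))  # True: the toy trigger rule reads position 0
print(causally_depends(trigger, source, 2))  # False: position 2 is never read

late = ["Mr", "normal", ",", "thank"]
print(causally_depends(late, source, 0))  # False: "you" ignores the trigger here
print(causally_depends(late, source, 3))  # True: "you" depends on "thank"
```

A single source input can miss a dependency if the patched representation happens to carry the same information, so a check like this is usually repeated over several source inputs.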

### Stress Testing Unlearning Methods

The evaluation scripts, including generating perturbed prefixes, are available below:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/19iQjGO37ifHtCM4KtH2fy99lbe2sY_vl?usp=sharing)
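
As a rough sketch of what a perturbed prefix might look like, the function below replaces a few randomly chosen token IDs in a prefix with different random IDs. Token replacement is only one possible perturbation, and the function name and parameters are assumptions made for illustration; the notebook above defines the perturbations actually used in the evaluation.

```python
import random
from typing import List, Sequence


def perturb_prefix(
    prefix: Sequence[int],
    vocab_size: int,
    n_edits: int,
    rng: random.Random,
) -> List[int]:
    """Return a copy of `prefix` with `n_edits` positions replaced by different token IDs."""
    if vocab_size < 2:
        raise ValueError("vocab_size must be at least 2")
    if not 0 <= n_edits <= len(prefix):
        raise ValueError("n_edits must be between 0 and len(prefix)")
    tokens = list(prefix)
    for pos in rng.sample(range(len(tokens)), n_edits):
        # Draw from vocab_size - 1 values and skip over the original token,
        # so the replacement always differs from what was there.
        new_token = rng.randrange(vocab_size - 1)
        if new_token >= tokens[pos]:
            new_token += 1
        tokens[pos] = new_token
    return tokens
```

An unlearning method can then be stress tested by prompting the model with many such perturbed prefixes and checking whether any of them still elicits the supposedly removed continuation.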

## Citation

If you find this repo helpful, please consider citing our work:

```
@misc{huang2024demystifying,
  title={Demystifying Verbatim Memorization in Large Language Models},
  author={Jing Huang and Diyi Yang and Christopher Potts},
  year={2024},
  eprint={2407.17817},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2407.17817},
}
```
