https://github.com/stanfordnlp/pyvene

Stanford NLP Python library for understanding and improving PyTorch models via interventions

Host: GitHub
URL: https://github.com/stanfordnlp/pyvene
Owner: stanfordnlp
License: apache-2.0
Created: 2023-02-06T23:35:24.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2025-04-30T20:07:04.000Z (7 months ago)
Last Synced: 2025-04-30T21:22:21.753Z (7 months ago)
Topics: activation-intervention, activation-patching, interpretability, intervention, mechanistic-interpretability
Language: Python
Homepage: http://pyvene.ai
Size: 25.4 MB
Stars: 738
Watchers: 8
Forks: 81
Open Issues: 20
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

Awesome Lists containing this project

awesome-interpretability - Pyvene (intervention focused)

README

# A Library for _Understanding_ and _Improving_ PyTorch Models via Interventions

**pyvene** is an open-source Python library for intervening on the internal states of
PyTorch models. Interventions are an important operation in many areas of AI, including
model editing, steering, robustness, and interpretability.

pyvene has many features that make interventions easy:

- Interventions are the basic primitive. They are specified as dicts, so they can be saved locally
and shared as serialisable objects through HuggingFace.
- Interventions can be composed and customised: you can run them on multiple locations, on arbitrary
sets of neurons (or other levels of granularity), in parallel or in sequence, on decoding steps of
generative language models, etc.
- Interventions work out-of-the-box on any PyTorch model! There is no need to define new model classes
from scratch, and interventions are easy on all kinds of architectures (RNNs, ResNets, CNNs, Mamba).
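
To make the idea of an intervention concrete, here is a minimal sketch in plain PyTorch. It is not pyvene's own API: it uses a standard forward hook to overwrite part of a hidden activation during the forward pass. The toy model, the `zero_out_units` helper, and the choice of units are all hypothetical and chosen only for illustration; see the pyvene docs for the library's actual, higher-level interface.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model; any nn.Module with addressable submodules would do.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)
model.eval()


def zero_out_units(units):
    """Build a forward hook that zeroes the given hidden units (illustrative helper)."""
    def hook(module, inputs, output):
        patched = output.clone()   # work on a copy rather than editing in place
        patched[:, units] = 0.0    # the intervention: overwrite selected activations
        return patched             # a non-None return value replaces the module's output
    return hook


x = torch.randn(3, 4)

with torch.no_grad():
    original = model(x)

    # Intervene on the output of the ReLU layer (index 1 in the Sequential).
    handle = model[1].register_forward_hook(zero_out_units([0, 1, 2, 3]))
    try:
        intervened = model(x)
    finally:
        handle.remove()  # detach the hook so later calls are unaffected

    restored = model(x)

print("original:  ", original)
print("intervened:", intervened)
```

Once the hook is removed, the model behaves exactly as before, which is what makes this kind of intervention convenient for experiments: the model's weights and class definition are never modified.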

pyvene is under active development and constantly being improved 🫡

> [!IMPORTANT]
> Read the pyvene docs at [https://stanfordnlp.github.io/pyvene/](https://stanfordnlp.github.io/pyvene/)!

## Installation

To install the latest stable version of pyvene:

```
pip install pyvene
```

Alternatively, to install a bleeding-edge version, you can clone the repo and install:

```
git clone git@github.com:stanfordnlp/pyvene.git
cd pyvene
pip install -e .
```

When you want to update, you can just run `git pull` in the cloned directory.

We suggest importing the library as:

```
import pyvene as pv
```

## Citation
If you use this repository, please consider citing our library paper:
```bibtex
@inproceedings{wu-etal-2024-pyvene,
title = "pyvene: A Library for Understanding and Improving {P}y{T}orch Models via Interventions",
author = "Wu, Zhengxuan and Geiger, Atticus and Arora, Aryaman and Huang, Jing and Wang, Zheng and Goodman, Noah and Manning, Christopher and Potts, Christopher",
editor = "Chang, Kai-Wei and Lee, Annie and Rajani, Nazneen",
booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations)",
month = jun,
year = "2024",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.naacl-demo.16",
pages = "158--165",
}
```

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=stanfordnlp/pyvene,stanfordnlp/pyreft&type=Date)](https://star-history.com/#stanfordnlp/pyvene&stanfordnlp/pyreft&Date)
