https://github.com/shashuat/unlearun
Machine Unlearning in Large Language Models for PyTorch
- Host: GitHub
- URL: https://github.com/shashuat/unlearun
- Owner: shashuat
- License: apache-2.0
- Created: 2025-05-31T11:28:31.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2026-01-12T17:12:25.000Z (5 months ago)
- Last Synced: 2026-03-27T05:24:33.856Z (3 months ago)
- Topics: machine-learning, machine-unlearning
- Language: Python
- Homepage: https://unlea.run
- Size: 44.9 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Unlearun: Machine Unlearning for Fine-tuned LLMs
A comprehensive Python package for machine unlearning in large language models, enabling efficient removal of unwanted knowledge while preserving model utility.
---
## Overview
**Unlearun** addresses the critical need to remove specific knowledge from trained language models without expensive retraining. This is essential for:
- **Privacy Compliance**: GDPR "right to be forgotten" requirements
- **Copyright Protection**: Removing copyrighted content from models
- **AI Safety**: Eliminating harmful or dangerous knowledge
- **Model Correction**: Fixing outdated or incorrect information
## Key Features
- **5 State-of-the-Art Methods**: GradAscent, GradDiff, DPO, RMU, SimNPO
- **Simple High-Level API**: Get started with just a few lines of code
- **Comprehensive Evaluation**: Built-in metrics for forget quality, utility preservation, and privacy
- **Flexible Data Loading**: Support for JSON, JSONL, HuggingFace datasets, and Python lists
- **Production Ready**: Extensive test coverage and benchmarking
- **HuggingFace Integration**: Seamless integration with `transformers` and `accelerate`
## Quick Start
### Installation
```bash
pip install unlearun
```
Or install from source:
```bash
git clone https://github.com/shashuat/unlearun.git
cd unlearun
pip install -e .
```
### Basic Usage
```python
from unlearun import Unlearning

# Initialize unlearner with RMU method
unlearner = Unlearning(
    method="rmu",
    model="gpt2-medium",
    output_dir="./unlearned_model"
)

# Load your data
forget_data = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "Who wrote Romeo and Juliet?", "answer": "Shakespeare"}
]
retain_data = [
    {"question": "What is the capital of Germany?", "answer": "Berlin"},
    {"question": "Who painted the Mona Lisa?", "answer": "Leonardo da Vinci"}
]
unlearner.load_data(
    forget_data=forget_data,
    retain_data=retain_data,
    max_length=128
)

# Run unlearning
unlearner.run(
    batch_size=4,
    learning_rate=5e-5,
    num_epochs=3
)

# Evaluate results
results = unlearner.evaluate(
    metrics=["perplexity", "forget_quality", "model_utility"]
)
print(f"Forget Quality: {results['forget_quality']:.4f}")
print(f"Model Utility: {results['model_utility']:.4f}")
```
## Supported Methods
| Method | Description | Reference Model | Best For |
|--------|-------------|----------------|----------|
| **RMU** | Representation Misdirection for Unlearning | Required | Safety-critical applications, robust forgetting |
| **GradDiff** | Gradient Difference (ascent on forget, descent on retain) | Optional | Balanced forget/retain trade-off |
| **DPO** | Direct Preference Optimization | Required | Preference-based unlearning with alternate answers |
| **SimNPO** | Simple Negative Preference Optimization | Not required | Stable unlearning without reference model |
| **GradAscent** | Gradient Ascent on forget set | Not required | Simple baseline, quick experiments |
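
For orientation, the objectives behind three of the methods in the table can be written out as below. These are the formulations commonly given in the unlearning literature, not a statement of what this package computes internally. The symbols are assumptions for illustration: $D_f$ and $D_r$ are the forget and retain sets, $\mathcal{L}_{\text{NLL}}$ is the usual next-token negative log-likelihood, $h_\theta(x)$ is the activation at the steered layer, $h_{\text{frozen}}(x)$ is the same activation in the frozen reference model, $u$ is a fixed random unit vector, and $c$ is the steering coefficient. The weights $\gamma$ and $\alpha$ in the GradDiff line correspond to the `gamma` and `alpha` arguments shown in this README; the $\alpha$ in the RMU line is the retain weight from the RMU paper and may not map to a package argument.

```latex
\begin{aligned}
\mathcal{L}_{\text{GradAscent}} &= -\,\mathcal{L}_{\text{NLL}}(D_f;\theta) \\
\mathcal{L}_{\text{GradDiff}}   &= -\,\gamma\,\mathcal{L}_{\text{NLL}}(D_f;\theta)
                                   \;+\; \alpha\,\mathcal{L}_{\text{NLL}}(D_r;\theta) \\
\mathcal{L}_{\text{RMU}}        &= \mathbb{E}_{x \in D_f}\,\bigl\lVert h_\theta(x) - c\,u \bigr\rVert_2^2
                                   \;+\; \alpha\,\mathbb{E}_{x \in D_r}\,\bigl\lVert h_\theta(x) - h_{\text{frozen}}(x) \bigr\rVert_2^2
\end{aligned}
```

Minimizing the first line is gradient ascent on the forget set; the second adds ordinary descent on the retain set; the third pushes forget-set activations toward a scaled random direction while keeping retain-set activations close to the reference model, which is why RMU is listed as requiring one.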
### Method Selection Guide
```python
# For safety-critical unlearning (e.g., removing hazardous knowledge)
unlearner = Unlearning(method="rmu", model="model_name", adaptive=True)

# For balanced forgetting with good retain data
unlearner = Unlearning(method="grad_diff", model="model_name",
                       gamma=1.0, alpha=1.0)

# When you have alternate acceptable answers
unlearner = Unlearning(method="dpo", model="model_name", beta=1.0)

# For simple, stable unlearning
unlearner = Unlearning(method="simnpo", model="model_name")

# Quick baseline for experiments
unlearner = Unlearning(method="grad_ascent", model="model_name")
```
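
The guide above can also be expressed as a small decision function, which is handy when the method is picked from a config file or CLI flags. `choose_method` and its parameter names are hypothetical and not part of `unlearun`; only the returned method strings come from the examples above, and the priority order is one reasonable reading of the table rather than a rule from the package.

```python
def choose_method(
    safety_critical=False,
    has_alternate_answers=False,
    has_retain_data=False,
    quick_baseline=False,
):
    """Hypothetical helper: map a use case to a method string from the guide."""
    if quick_baseline:
        return "grad_ascent"   # simple baseline, quick experiments
    if safety_critical:
        return "rmu"           # robust forgetting of hazardous knowledge
    if has_alternate_answers:
        return "dpo"           # preference-based, needs alternate answers
    if has_retain_data:
        return "grad_diff"     # balanced forget/retain trade-off
    return "simnpo"            # stable default without a reference model


method = choose_method(has_retain_data=True)
```

The returned string would then be passed as `method=` when constructing `Unlearning`.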
## Detailed Examples
### Example 1: RMU with Adaptive Steering
```python
from unlearun import Unlearning

# RMU is the most robust method for safety-critical unlearning
unlearner = Unlearning(
    method="rmu",
    model="gpt2",
    output_dir="./rmu_model",
    # RMU-specific parameters
    steering_coeff=1.0,   # Steering strength
    target_layer=8,       # Which transformer layer to steer
    adaptive=True         # Use adaptive coefficient (recommended)
)

# Load data from JSON files
unlearner.load_data(
    forget_data="forget_set.json",
    retain_data="retain_set.json",
    max_length=128
)

# Configure training
unlearner.run(
    batch_size=4,
    learning_rate=1e-5,
    num_epochs=3,
    gradient_accumulation_steps=2,
    warmup_steps=100,
    logging_steps=10
)

# Comprehensive evaluation
results = unlearner.evaluate(
    metrics=[
        "perplexity",
        "forget_quality",
        "model_utility",
        "rouge",
        "verbatim_memorization",
        "mia"
    ]
)
```
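
Example 1 reads `forget_set.json` and `retain_set.json` from disk. The sketch below shows one way to produce such files, assuming the on-disk format mirrors the in-memory format from Basic Usage: a JSON array of objects with `question` and `answer` keys. That layout is an assumption, and `write_qa_file` / `read_qa_file` are hypothetical helpers, not package functions.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = {"question", "answer"}  # assumed default column names


def write_qa_file(path, records):
    """Validate question/answer records and write them as a JSON array."""
    for i, record in enumerate(records):
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
    path = Path(path)
    path.write_text(json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


def read_qa_file(path):
    """Load a JSON array of question/answer records."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Write the files into a scratch directory for illustration.
out_dir = Path(tempfile.mkdtemp())
forget_path = write_qa_file(
    out_dir / "forget_set.json",
    [{"question": "What is the capital of France?", "answer": "Paris"}],
)
retain_path = write_qa_file(
    out_dir / "retain_set.json",
    [{"question": "What is the capital of Germany?", "answer": "Berlin"}],
)
```

Validating before writing surfaces a malformed record at data-preparation time instead of partway through a training run.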
### Example 2: Gradient Difference with KL Regularization
```python
from unlearun import Unlearning

# GradDiff with KL divergence for smoother retain preservation
unlearner = Unlearning(
    method="grad_diff",
    model="gpt2-medium",
    output_dir="./graddiff_model",
    # GradDiff-specific parameters
    gamma=1.0,              # Weight for forget loss
    alpha=1.0,              # Weight for retain loss
    retain_loss_type="KL"   # Use KL divergence (requires ref model)
)

unlearner.load_data(
    forget_data="forget.json",
    retain_data="retain.json"
)

unlearner.run(
    batch_size=2,
    learning_rate=5e-5,
    num_epochs=5
)
```
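
To make the roles of `gamma`, `alpha`, and the KL retain term concrete, here is a dependency-free sketch of how such a loss could be combined for a single token position. It is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, the KL direction (reference model to current model) is an assumption, and real training operates on batched tensors rather than Python lists.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logits, target):
    """Negative log-likelihood of the target token index."""
    return -math.log(softmax(logits)[target])


def kl_divergence(ref_logits, cur_logits):
    """KL(reference || current) between two next-token distributions."""
    p, q = softmax(ref_logits), softmax(cur_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def grad_diff_loss(forget_logits, forget_target,
                   retain_ref_logits, retain_cur_logits,
                   gamma=1.0, alpha=1.0):
    """Ascent on the forget token (negated NLL) plus a KL penalty on retain data."""
    forget_term = -nll(forget_logits, forget_target)
    retain_term = kl_divergence(retain_ref_logits, retain_cur_logits)
    return gamma * forget_term + alpha * retain_term


# The loss drops as the model becomes less confident on the forget token,
# and rises if its retain-set predictions drift from the reference model.
confident = grad_diff_loss([4.0, 0.0, 0.0], 0, [1.0, 2.0, 0.5], [1.0, 2.0, 0.5])
forgotten = grad_diff_loss([0.0, 0.0, 0.0], 0, [1.0, 2.0, 0.5], [1.0, 2.0, 0.5])
drifted = grad_diff_loss([0.0, 0.0, 0.0], 0, [1.0, 2.0, 0.5], [2.0, 1.0, 0.5])
```

Raising `alpha` relative to `gamma` makes drift on the retain set more expensive, which trades forgetting strength for utility.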
### Example 3: Loading from HuggingFace Dataset
```python
from datasets import load_dataset
from unlearun import Unlearning

# Load from HuggingFace Hub
forget_dataset = load_dataset("your_username/forget_dataset", split="train")
retain_dataset = load_dataset("your_username/retain_dataset", split="train")

unlearner = Unlearning(
    method="simnpo",
    model="meta-llama/Llama-2-7b-hf",
    output_dir="./unlearned_llama"
)

unlearner.load_data(
    forget_data=forget_dataset,
    retain_data=retain_dataset,
    question_key="prompt",     # Specify your column names
    answer_key="completion",
    max_length=512
)

unlearner.run(batch_size=1, num_epochs=3)
```
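
If you would rather normalize column names yourself and pass a plain Python list, the mapping that `question_key` and `answer_key` describe can be sketched as follows. `to_qa_records` is a hypothetical helper, not part of `unlearun`; it assumes each row behaves like a dictionary, which holds for rows yielded when iterating a HuggingFace `Dataset`.

```python
def to_qa_records(rows, question_key="question", answer_key="answer"):
    """Rename custom columns to the question/answer keys used in Basic Usage."""
    records = []
    for i, row in enumerate(rows):
        if question_key not in row or answer_key not in row:
            raise KeyError(f"row {i} lacks {question_key!r} or {answer_key!r}")
        records.append({"question": row[question_key], "answer": row[answer_key]})
    return records


rows = [
    {"prompt": "Who wrote Romeo and Juliet?", "completion": "Shakespeare", "id": 7},
]
records = to_qa_records(rows, question_key="prompt", answer_key="completion")
```

Extra columns such as `id` are dropped, so the resulting list has the same shape as the inline `forget_data` in Basic Usage.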
### Example 4: Custom Evaluation
```python
from unlearun import Unlearning
from unlearun.evaluation import (
    compute_perplexity,
    compute_verbatim_memorization,
    compute_mia
)

# Train first, then evaluate the unlearned model
unlearner = Unlearning(method="rmu", model="gpt2", output_dir="./model")
unlearner.load_data(forget_data="forget.json", retain_data="retain.json")
unlearner.run(batch_size=2, num_epochs=3)

# Custom evaluation with specific parameters
forget_ppl = compute_perplexity(
    model=unlearner.model,
    dataset=unlearner.forget_dataset,
    tokenizer=unlearner.tokenizer,
    batch_size=4
)

# Check for verbatim memorization
verbatim_score = compute_verbatim_memorization(
    model=unlearner.model,
    forget_dataset=unlearner.forget_dataset,
    tokenizer=unlearner.tokenizer,
    prefix_length=50,
    max_new_tokens=100,
    num_samples=100
)

# Membership inference attack
mia_score = compute_mia(
    model=unlearner.model,
    forget_dataset=unlearner.forget_dataset,
    retain_dataset=unlearner.retain_dataset,
    tokenizer=unlearner.tokenizer,
    batch_size=4
)

print(f"Forget Perplexity: {forget_ppl:.2f}")
print(f"Verbatim Memorization: {verbatim_score:.4f}")
print(f"MIA AUROC: {mia_score:.4f}")
```
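
The verbatim memorization check in Example 4 prompts the model with a prefix and compares its continuation against the original text. As a rough picture of what such a comparison measures, the sketch below computes ROUGE-L recall with whitespace tokenization. This is a simplified stand-in, not the code behind `compute_verbatim_memorization`; the function names and the choice of recall over F-measure are assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_recall(generated, reference):
    """Fraction of reference tokens recovered, in order, by the generation."""
    gen_tokens = generated.lower().split()
    ref_tokens = reference.lower().split()
    if not ref_tokens:
        return 0.0
    return lcs_length(gen_tokens, ref_tokens) / len(ref_tokens)


reference = "the capital of france is paris"
memorized = rouge_l_recall("The capital of France is Paris", reference)  # 1.0
forgotten = rouge_l_recall("i do not know", reference)                   # 0.0
```

A score near 1.0 on forget-set continuations means the model still reproduces the text; a score near 0.0 means little of it survives in order.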
## Evaluation Metrics
The package includes comprehensive evaluation metrics:
### Forget Quality Metrics
- **Perplexity**: Measures how "forgotten" the data is (higher perplexity on the forget set = stronger forgetting)
- **Verbatim Memorization**: ROUGE score between generated text and the ground truth (lower on the forget set = less memorization)
- **Knowledge Retention**: QA accuracy on forget topics
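
For reference, perplexity is the exponential of the mean per-token negative log-likelihood. The sketch below computes it from a list of token log-probabilities; it is a definition-level illustration with a hypothetical function name, not the package's `compute_perplexity`, which takes a model, dataset, and tokenizer.

```python
import math


def perplexity(token_log_probs):
    """exp(mean negative log-likelihood) over natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Before unlearning: the model assigns probability 0.5 to every forget token.
before = perplexity([math.log(0.5)] * 4)    # approximately 2.0
# After unlearning: probability 0.05 per token, so perplexity rises.
after = perplexity([math.log(0.05)] * 4)    # approximately 20.0
```

A rise in forget-set perplexity with a roughly unchanged retain-set perplexity is the pattern the Forget Quality and Utility Preservation metrics are looking for.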
### Utility Preservation Metrics
- **Model Utility**: Performance on retain set
- **General Knowledge**: Evaluation on holdout data
- **Task Performance**: Accuracy on downstream tasks
### Privacy Metrics
- **Membership Inference Attack (MIA)**: Resistance to privacy attacks
- **Extraction Attack**: Difficulty of extracting forgotten data
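
Example 4 reports the MIA result as an AUROC. A minimal loss-based version of that idea is sketched below: each example is scored by its negated loss (lower loss suggests the example was seen in training), and AUROC is the probability that a random member outscores a random non-member. The scoring rule and function names are assumptions for illustration; `compute_mia` may use a different attack.

```python
def auroc(member_scores, nonmember_scores):
    """Probability a random member outscores a random non-member (ties count 0.5)."""
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def loss_attack_auroc(member_losses, nonmember_losses):
    """Score each example by negated loss, then compute AUROC."""
    return auroc([-x for x in member_losses], [-x for x in nonmember_losses])


# Before unlearning: forget-set losses are clearly lower than unseen-data losses.
leaky = loss_attack_auroc([0.2, 0.3, 0.25], [1.0, 1.2, 0.9])   # 1.0
# After unlearning: the two loss distributions are indistinguishable.
private = loss_attack_auroc([1.0, 0.9], [1.0, 0.9])            # 0.5
```

Under this scoring, an AUROC near 0.5 means the attacker does no better than chance at telling forgotten examples from unseen ones.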
## Project Structure
```
unlearun/
├── unlearun/
│   ├── __init__.py          # Package entry point
│   ├── core.py              # High-level Unlearning class
│   ├── methods/             # Unlearning methods
│   │   ├── grad_ascent.py
│   │   ├── grad_diff.py
│   │   ├── dpo.py
│   │   ├── rmu.py
│   │   └── simnpo.py
│   ├── data/                # Data handling
│   │   ├── dataset.py
│   │   └── collators.py
│   ├── trainer/             # Custom trainer
│   │   └── trainer.py
│   ├── utils/               # Utilities
│   │   ├── losses.py
│   │   └── helpers.py
│   └── evaluation/          # Evaluation metrics
│       └── metrics.py
├── tests/                   # Test suite
│   └── test_unlearning.py
├── pyproject.toml           # Package configuration
├── requirements.txt         # Dependencies
└── README.md                # This file
```
## Testing
Run the test suite:
```bash
# Install dev dependencies
pip install -e ".[dev]"
# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=unlearun --cov-report=html
# Skip slow tests
pytest tests/ -v -m "not slow"
```
## Requirements
- Python ≥ 3.8
- PyTorch ≥ 2.0.0
- Transformers ≥ 4.30.0
- Datasets ≥ 2.12.0
- Accelerate ≥ 0.20.0
See `requirements.txt` for full dependency list.
## Benchmarks
The package is compatible with standard unlearning benchmarks:
- **TOFU** (Task of Fictitious Unlearning for LLMs)
- **WMDP** (Weapons of Mass Destruction Proxy)
- **MUSE** (Machine Unlearning Six-Way Evaluation)
```python
# Example: unlearn and evaluate on the TOFU benchmark
from datasets import load_dataset
from unlearun import Unlearning

tofu_forget = load_dataset("locuslab/TOFU", "forget01", split="train")
tofu_retain = load_dataset("locuslab/TOFU", "retain99", split="train")

# "microsoft/phi-1_5" is the HuggingFace Hub ID for Phi-1.5
unlearner = Unlearning(method="rmu", model="microsoft/phi-1_5")
unlearner.load_data(forget_data=tofu_forget, retain_data=tofu_retain)
unlearner.run(batch_size=2, num_epochs=3)
results = unlearner.evaluate()
```
## Contributing
We welcome contributions! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass (`pytest tests/`)
6. Format code (`black unlearun/ tests/`)
7. Commit changes (`git commit -m 'Add amazing feature'`)
8. Push to branch (`git push origin feature/amazing-feature`)
9. Open a Pull Request
### Development Setup
```bash
git clone https://github.com/shashuat/unlearun.git
cd unlearun
pip install -e ".[dev]"
pre-commit install # Optional: for automatic formatting
```
## License
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
## Citation
If you use Unlearun in your research, please cite:
```bibtex
@software{unlearun2025,
  title   = {Unlearun: Machine Unlearning for Fine-tuned LLMs},
  author  = {Your Name},
  year    = {2025},
  url     = {https://github.com/shashuat/unlearun},
  version = {0.1.0}
}
```
### Key References
This package implements methods from:
```bibtex
@inproceedings{li2024wmdp,
  title={The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning},
  author={Li, Nathaniel and Pan, Alexander and others},
  booktitle={ICML},
  year={2024}
}

@inproceedings{rafailov2023dpo,
  title={Direct Preference Optimization: Your Language Model is Secretly a Reward Model},
  author={Rafailov, Rafael and Sharma, Archit and others},
  booktitle={NeurIPS},
  year={2023}
}

@inproceedings{maini2024tofu,
  title={TOFU: A Task of Fictitious Unlearning for LLMs},
  author={Maini, Pratyush and Feng, Zhili and others},
  booktitle={COLM},
  year={2024}
}
```
## Acknowledgments
- Built on [HuggingFace Transformers](https://github.com/huggingface/transformers)
- Inspired by research from CMU, Stanford, and other leading institutions
- Thanks to the machine unlearning research community
## Support
- **Issues**: [GitHub Issues](https://github.com/shashuat/unlearun/issues)
- **Discussions**: [GitHub Discussions](https://github.com/shashuat/unlearun/discussions)
- **Email**: your.email@example.com
## Links
- **Documentation**: [Full Documentation](https://unlearun.readthedocs.io)
- **PyPI**: [Package on PyPI](https://pypi.org/project/unlearun/)
- **Paper**: [arXiv](https://arxiv.org/abs/xxxx.xxxxx) (coming soon)
- **WMDP Benchmark**: https://www.wmdp.ai/
- **TOFU Benchmark**: https://github.com/locuslab/tofu
---
**Status**: Active Development | **Version**: 0.1.0 | **Last Updated**: October 2025
Made with ❤️ for AI Safety and Privacy