{"id":26049418,"url":"https://github.com/locuslab/open-unlearning","last_synced_at":"2025-07-20T15:37:45.937Z","repository":{"id":216542208,"uuid":"741543947","full_name":"locuslab/open-unlearning","owner":"locuslab","description":"The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning methods. All features: benchmarks, methods, evaluations, models etc. are easily extensible.","archived":false,"fork":false,"pushed_at":"2025-06-22T07:24:23.000Z","size":16677,"stargazers_count":295,"open_issues_count":6,"forks_count":65,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-06-22T08:25:18.715Z","etag":null,"topics":["benchmarks","llm-evaluation-metrics","llm-privacy","llm-unlearning","llms","membership-inference","membership-inference-attacks","open-source","privacy-protection","right-to-be-forgotten","unlearning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/locuslab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/contributing.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-01-10T16:05:57.000Z","updated_at":"2025-06-22T07:24:27.000Z","dependencies_parsed_at":"2024-05-31T00:30:06.571Z","dependency_job_id":"e0b5b513-3ac5-41b9-91d0-12424e551ce5","html_url":"https://github.com/locuslab/open-unlearning","commit_stats":null,"previous_names":["locuslab/tofu","locuslab/open-unlearning"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/locuslab/open-unlearning","repository_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/locuslab%2Fopen-unlearning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/locuslab%2Fopen-unlearning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/locuslab%2Fopen-unlearning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/locuslab%2Fopen-unlearning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/locuslab","download_url":"https://codeload.github.com/locuslab/open-unlearning/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/locuslab%2Fopen-unlearning/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266151573,"owners_count":23884443,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmarks","llm-evaluation-metrics","llm-privacy","llm-unlearning","llms","membership-inference","membership-inference-attacks","open-source","privacy-protection","right-to-be-forgotten","unlearning"],"created_at":"2025-03-08T01:01:30.460Z","updated_at":"2025-07-20T15:37:45.924Z","avatar_url":"https://github.com/locuslab.png","language":"Python","funding_links":[],"categories":["Frameworks","Benchmarks"],"sub_categories":["2021","Type: Graph"],"readme":"\u003cdiv align=\"center\"\u003e\n\n![*Open*Unlearning](assets/banner.png)\n\n\u003ch3\u003e\u003cstrong\u003eAn easily extensible framework unifying LLM unlearning evaluation benchmarks.\u003c/strong\u003e\u003c/h3\u003e\n\n  \u003cdiv style=\"display: flex; gap: 10px; justify-content: center; 
align-items: center;\"\u003e\n    \u003ca href=\"https://arxiv.org/abs/2506.12618\"\u003e\u003cimg src=\"https://img.shields.io/badge/arXiv-Report-b31b1b?logo=arxiv\u0026logoColor=white\" alt=\"arXiv Paper\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/locuslab/open-unlearning\"\u003e\u003cimg src=\"https://img.shields.io/github/stars/locuslab/open-unlearning?style=social\" alt=\"GitHub Repo stars\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/locuslab/open-unlearning/actions\"\u003e\u003cimg src=\"https://github.com/locuslab/open-unlearning/actions/workflows/tests.yml/badge.svg\" alt=\"Build Status\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://huggingface.co/open-unlearning\"\u003e\u003cimg src=\"https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue\" alt=\"HuggingFace 🤗\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/locuslab/open-unlearning\"\u003e\u003cimg src=\"https://img.shields.io/github/repo-size/locuslab/open-unlearning\" alt=\"GitHub repo size\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/locuslab/open-unlearning\"\u003e\u003cimg src=\"https://img.shields.io/github/languages/top/locuslab/open-unlearning\" alt=\"GitHub top language\"/\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/locuslab/open-unlearning/blob/main/LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-MIT-blue\" alt=\"License: MIT\"/\u003e\u003c/a\u003e\n  \u003c/div\u003e\n\u003c/div\u003e\n\n---\n\n## 📖 Overview\n\nWe provide efficient and streamlined implementations of the TOFU, MUSE and WMDP unlearning benchmarks while supporting 11+ unlearning methods, 5+ datasets, 10+ evaluation metrics, and 7+ LLM architectures. 
Each of these can be easily extended to incorporate more variants.\n\nWe invite the LLM unlearning community to collaborate by adding new benchmarks, unlearning methods, datasets and evaluation metrics here to expand OpenUnlearning's features, gain feedback from wider usage and drive progress in the field.\n\n---\n\n### 📢 Updates\n\n### [June 20, 2025]\n\n🚨 Our paper `OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics` is now out on [arXiv](https://arxiv.org/abs/2506.12618).\n\n🌟 **Highlights:**\n- A detailed technical report on OpenUnlearning covering the design, features, and implementation.\n- A meta-evaluation framework for benchmarking unlearning evaluations across 450+ models, open-sourced on HuggingFace 🤗: [TOFU Models w \u0026 w/o Knowledge](https://huggingface.co/collections/open-unlearning/tofu-models-w-and-w-o-knowledge-6861e4d935eb99ba162e55cd), [TOFU Unlearned Models](https://huggingface.co/collections/open-unlearning/tofu-unlearned-models-6860f6cf3fe35d0223d92e88).\n- Results benchmarking 8 diverse unlearning methods in one place using 10 evaluation metrics on TOFU.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eOlder Updates\u003c/b\u003e\u003c/summary\u003e\n\n\n#### [May 19, 2025]\n\n- **More Methods!** Added support for unlearning methods [UNDIAL](https://aclanthology.org/2025.naacl-long.444/) and [AltPO](https://aclanthology.org/2025.coling-main.252/).\n\n#### [May 12, 2025]\n\n- **Another benchmark!** We now support running the [`WMDP`](https://wmdp.ai/) benchmark with its `Zephyr` task model.\n- **More evaluations!**  The [`lm-evaluation-harness`](https://github.com/EleutherAI/lm-evaluation-harness) toolkit has been integrated into OpenUnlearning, enabling WMDP evaluations and support for popular general LLM benchmarks, including MMLU, GSM8K, and others.\n\n#### [Apr 6, 2025]\n- **More Metrics!** Added 6 Membership Inference Attacks (MIA) (LOSS, ZLib, Reference, GradNorm, MinK, and MinK++), 
along with Extraction Strength (ES) and  Exact Memorization (EM) as additional evaluation metrics.\n- **More TOFU Evaluations!** Now includes a holdout set and supports MIA attack-based evaluation. You can now compute MUSE's privleak on TOFU.\n- **More Documentation!** [`docs/links.md`](docs/links.md) contains resources for each of the implemented features and other useful LLM unlearning resources.\n\nBe sure to run `python setup_data.py` immediately after merging the latest version. This is required to refresh the downloaded eval log files and ensure they're compatible with the latest evaluation metrics.\n\n#### [Mar 27, 2025]\n- **More Documentation: easy contributions and the leaderboard functionality**: We've updated the documentation to make contributing new unlearning methods and benchmarks much easier. Users can document additions better and also update a leaderboard with their results. See [this section](#-how-to-contribute) for details.\n\n#### [Mar 9, 2025]\n- **More Methods!** Added support for [RMU](https://arxiv.org/abs/2403.03218) (representation-engineering based unlearning).\n\n#### [Feb 27, 2025]  \n⚠️ **Repository Update**: This repo replaces the original TOFU codebase at [`github.com/locuslab/tofu`](https://github.com/locuslab/tofu), which is no longer maintained.\n\n\u003c/details\u003e\n\n\n---\n\n## 🗃️ Available Components\n\nWe provide several variants for each of the components in the unlearning pipeline.\n\n| **Component**          | **Available Options** |\n|------------------------|----------------------|\n| **Benchmarks**        | [TOFU](https://arxiv.org/abs/2401.06121), [MUSE](https://muse-bench.github.io/), [WMDP](https://www.wmdp.ai/) |\n| **Unlearning Methods** | GradAscent, GradDiff, NPO, SimNPO, DPO, RMU, UNDIAL, AltPO, SatImp, WGA, CE-U |\n| **Evaluation Metrics** | Verbatim Probability, Verbatim ROUGE, Knowledge QA-ROUGE, Model Utility, Forget Quality, TruthRatio, Extraction Strength, Exact Memorization, 6 MIA attacks, 
[lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) |\n| **Datasets**          | MUSE-News (BBC), MUSE-Books (Harry Potter), TOFU (different splits), WMDP-Bio, WMDP-Cyber |\n| **Model Families**    | TOFU: Llama-3.2, Llama-3.1, Llama-2; MUSE: Llama-2; Additional: Phi-3.5, Phi-1.5, Gemma, Zephyr |\n\n---\n\n## 📌 Table of Contents\n- 📖 [Overview](#-overview)\n- 📢 [Updates](#-updates)\n- 🗃️ [Available Components](#%EF%B8%8F-available-components)\n- ⚡ [Quickstart](#-quickstart)\n- 🔄 [Updated TOFU benchmark](#-updated-tofu-benchmark)\n- 🧪 [Running Experiments](#-running-experiments)\n  - 🚀 [Perform Unlearning](#-perform-unlearning)\n  - 📊 [Perform an Evaluation](#-perform-an-evaluation)\n  - 📜 [Running Baseline Experiments](#-running-baseline-experiments)\n- ➕ [How to Contribute](#-how-to-contribute)\n- 📚 [Further Documentation](#-further-documentation)\n- 🔗 [Support \u0026 Contributors](#-support--contributors)\n- 📝 [Citing this work](#-citing-this-work)\n- 🤝 [Acknowledgements](#-acknowledgements)\n- 📄 [License](#-license)\n\n---\n\n## ⚡ Quickstart\n\n```bash\n# Environment setup\nconda create -n unlearning python=3.11\nconda activate unlearning\npip install .[lm_eval]\npip install --no-build-isolation flash-attn==2.6.3\n\n# Data setup\npython setup_data.py --eval # saves/eval now contains evaluation results of the uploaded models\n# This downloads log files with evaluation results (including retain model logs)\n# into `saves/eval`, used for evaluating unlearning across supported benchmarks.\n# Additional datasets (e.g., WMDP) are supported — run below for options:\n# python setup_data.py --help\n```\n\n---\n\n### 🔄 Updated TOFU benchmark\n\nWe've updated Open-Unlearning's TOFU benchmark target models to use a wider variety of newer architectures with sizes varying from 1B to 8B. 
These include Llama 3.2 1B, Llama 3.2 3B, Llama 3.1 8B, and the original Llama-2 7B (re-created) target models from [the old version of TOFU](https://github.com/locuslab/tofu).\n\nFor each architecture, we have finetuned with four different splits of the TOFU datasets: `full`, `retain90`, `retain95`, `retain99`, for a total of 16 finetuned models. The first serves as the target (base model for unlearning) and the rest are retain models, which serve as the reference against which performance on each forget split is measured. These models are on [HuggingFace](https://huggingface.co/collections/open-unlearning/tofu-new-models-67bcf636334ea81727573a9f0) and the paths to these models can be set in the experimental configs or in command-line overrides.\n\n---\n\n## 🧪 Running Experiments\n\nWe provide an easily configurable interface for running evaluations by leveraging Hydra configs. For more detailed documentation on running experiments, commonly overridden arguments, interfacing with configurations, distributed training, and simple finetuning of models, refer to [`docs/experiments.md`](docs/experiments.md).\n\n### 🚀 Perform Unlearning\n\nAn example command for launching an unlearning process with `GradAscent` on the TOFU `forget10` split:\n\n```bash\npython src/train.py --config-name=unlearn.yaml experiment=unlearn/tofu/default \\\n  forget_split=forget10 retain_split=retain90 trainer=GradAscent task_name=SAMPLE_UNLEARN\n```\n\n- `experiment`- Path to the Hydra config file [`configs/experiment/unlearn/tofu/default.yaml`](configs/experiment/unlearn/tofu/default.yaml) with default experimental settings for TOFU unlearning, e.g. 
train dataset, eval benchmark details, model paths etc.\n- `forget_split/retain_split`- Sets the forget and retain dataset splits.\n- `trainer`- Loads [`configs/trainer/GradAscent.yaml`](configs/trainer/GradAscent.yaml) and overrides the unlearning method with the handler (see config) implemented in [`src/trainer/unlearn/grad_ascent.py`](src/trainer/unlearn/grad_ascent.py).\n\n### 📊 Perform an Evaluation\n\nAn example command for launching a TOFU evaluation process on the `forget10` split:\n\n```bash\nmodel=Llama-3.2-1B-Instruct\npython src/eval.py --config-name=eval.yaml experiment=eval/tofu/default \\\n  model=${model} \\\n  model.model_args.pretrained_model_name_or_path=open-unlearning/tofu_${model}_full \\\n  retain_logs_path=saves/eval/tofu_${model}_retain90/TOFU_EVAL.json \\\n  task_name=SAMPLE_EVAL\n```\n\n- `experiment`- Path to the evaluation configuration [`configs/experiment/eval/tofu/default.yaml`](configs/experiment/eval/tofu/default.yaml).\n- `model`- Sets up the model and tokenizer configs for the `Llama-3.2-1B-Instruct` model.\n- `model.model_args.pretrained_model_name_or_path`- Overrides the default experiment config to evaluate a model from a HuggingFace ID (can use a local model checkpoint path as well).\n- `retain_logs_path`- Sets the path to the reference model eval logs, which are needed to compute reference-model-based metrics like `forget_quality` in TOFU.\n\nFor more details about creating and running evaluations, refer to [`docs/evaluation.md`](docs/evaluation.md).\n\n\n### 📜 Running Baseline Experiments\nThe scripts below execute standard baseline unlearning experiments on the TOFU and MUSE datasets, evaluated using their corresponding benchmarks. The expected results for these are in [`docs/repro.md`](docs/repro.md).\n\n```bash\nbash scripts/tofu_unlearn.sh\nbash scripts/muse_unlearn.sh\n```\n\nThe above scripts are not tuned and use default hyperparameter settings. 
We encourage you to tune your methods and add your final results to [`community/leaderboard.md`](community/leaderboard.md).\n\n---\n\n## ➕ How to Contribute\n\nIf you are interested in contributing to our work, please have a look at the [`contributing.md`](docs/contributing.md) guide.\n\n\n## 📚 Further Documentation\n\nFor more in-depth information on specific aspects of the framework, refer to the following documents:\n\n| **Documentation**                              | **Contains**                                                                                                       |\n|------------------------------------------------|--------------------------------------------------------------------------------------------------------------------|\n| [`docs/contributing.md`](docs/contributing.md)       | Instructions on how to add new components such as unlearning methods (trainers), benchmarks, metrics, models, datasets, etc.              |\n| [`docs/evaluation.md`](docs/evaluation.md)       | Detailed instructions on creating and running evaluation metrics and benchmarks.                                     |\n| [`docs/experiments.md`](docs/experiments.md)     | Guide on running experiments in various configurations and settings, including distributed training, fine-tuning, and overriding arguments. |\n| [`docs/hydra.md`](docs/hydra.md)                 | A short tutorial on Hydra features. Hydra is the configuration management package we use extensively.                                  |\n| [`community/leaderboard.md`](community/leaderboard.md)             | Reference results from various unlearning methods run using this framework on TOFU and MUSE benchmarks.              |\n| [`docs/links.md`](docs/links.md)             | List of all links to the research papers or other sources the implemented features are sourced from.              
|\n| [`docs/repro.md`](docs/repro.md)            | Results are provided solely for reproducibility purposes, without any parameter tuning.             |\n---\n\n## 🔗 Support \u0026 Contributors\n\nDeveloped and maintained by Vineeth Dorna ([@Dornavineeth](https://github.com/Dornavineeth)) and Anmol Mekala ([@molereddy](https://github.com/molereddy)).\n\nIf you encounter any issues or have questions, feel free to raise an issue in the repository 🛠️.\n\n## 📝 Citing this work\n\nIf you use OpenUnlearning in your research, please make sure to cite our OpenUnlearning technical report and the TOFU and MUSE benchmarks.\n\n```bibtex\n@article{openunlearning2025,\n  title={{OpenUnlearning}: Accelerating {LLM} Unlearning via Unified Benchmarking of Methods and Metrics},\n  author={Dorna, Vineeth and Mekala, Anmol and Zhao, Wenlong and McCallum, Andrew and Lipton, Zachary C and Kolter, J Zico and Maini, Pratyush},\n  journal={arXiv preprint arXiv:2506.12618},\n  year={2025},\n}\n@inproceedings{maini2024tofu,\n  title={{TOFU}: A Task of Fictitious Unlearning for {LLMs}},\n  author={Maini, Pratyush and Feng, Zhili and Schwarzschild, Avi and Lipton, Zachary Chase and Kolter, J Zico},\n  booktitle={First Conference on Language Modeling},\n  year={2024}\n}\n@article{shi2024muse,\n  title={{MUSE}: Machine Unlearning Six-Way Evaluation for Language Models},\n  author={Weijia Shi and Jaechan Lee and Yangsibo Huang and Sadhika Malladi and Jieyu Zhao and Ari Holtzman and Daogao Liu and Luke Zettlemoyer and Noah A. Smith and Chiyuan Zhang},\n  year={2024},\n  eprint={2407.06460},\n  archivePrefix={arXiv},\n  primaryClass={cs.CL},\n  url={https://arxiv.org/abs/2407.06460},\n}\n```\n\u003c/details\u003e\n\n---\n\n### 🤝 Acknowledgements\n\n- This repo is inspired by [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory). \n- The [TOFU](https://github.com/locuslab/tofu) and [MUSE](https://github.com/swj0419/muse_bench) benchmarks served as the foundation for our re-implementation. 
\n\n---\n\n### 📄 License\nThis project is licensed under the MIT License. See the [`LICENSE`](LICENSE) file for details.\n\n---\n\n[![Star History Chart](https://api.star-history.com/svg?repos=locuslab/open-unlearning\u0026type=Date)](https://www.star-history.com/#locuslab/open-unlearning\u0026Date)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flocuslab%2Fopen-unlearning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flocuslab%2Fopen-unlearning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flocuslab%2Fopen-unlearning/lists"}