{"id":19228873,"url":"https://github.com/jinzhuoran/rwku","last_synced_at":"2025-04-02T20:08:39.337Z","repository":{"id":244551626,"uuid":"806978404","full_name":"jinzhuoran/RWKU","owner":"jinzhuoran","description":"RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024","archived":false,"fork":false,"pushed_at":"2024-09-30T13:26:32.000Z","size":4008,"stargazers_count":69,"open_issues_count":3,"forks_count":6,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-24T16:38:10.476Z","etag":null,"topics":["adversarial-attacks","benchmark","evaluation-framework","forgetting","large-language-models","membership-inference-attack","natural-language-processing","privacy-protection","right-to-be-forgotten","unlearning"],"latest_commit_sha":null,"homepage":"https://rwku-bench.github.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jinzhuoran.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-28T09:02:51.000Z","updated_at":"2025-03-03T07:51:02.000Z","dependencies_parsed_at":"2024-06-18T04:16:29.089Z","dependency_job_id":"d098307d-dea3-4a01-9543-4f5f6c1a52c7","html_url":"https://github.com/jinzhuoran/RWKU","commit_stats":null,"previous_names":["jinzhuoran/rwku"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jinzhuoran%2FRWKU","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jinzhuoran%2FRWKU/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/jinzhuoran%2FRWKU/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jinzhuoran%2FRWKU/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jinzhuoran","download_url":"https://codeload.github.com/jinzhuoran/RWKU/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246884766,"owners_count":20849554,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adversarial-attacks","benchmark","evaluation-framework","forgetting","large-language-models","membership-inference-attack","natural-language-processing","privacy-protection","right-to-be-forgotten","unlearning"],"created_at":"2024-11-09T15:30:42.924Z","updated_at":"2025-04-02T20:08:39.307Z","avatar_url":"https://github.com/jinzhuoran.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# \u003cimg src=\"file/logo.png\" alt=\"RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models\" width=\"5%\"\u003e RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models \n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://rwku-bench.github.io/\"\u003e 🏠 Homepage\u003c/a\u003e |\n  \u003ca href=\"https://arxiv.org/abs/2406.10890\"\u003e 📜 Paper\u003c/a\u003e | \n  \u003ca href=\"https://huggingface.co/datasets/jinzhuoran/RWKU\"\u003e 🤗 Dataset\u003c/a\u003e | \n  \u003ca href=\"#Installation\"\u003e 🚀 Installation\u003c/a\u003e \n\u003c/p\u003e\n\n### News\n- 2024-09-26: 🚀🚀 Our paper has been accepted at NeurIPS D\u0026B Track 
2024.\n\n- 2024-06-18: We released our [paper](https://arxiv.org/abs/2406.10890) titled \"RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models\".\n\n- 2024-06-05: We released our [dataset](https://huggingface.co/datasets/jinzhuoran/RWKU) on Hugging Face.\n\n### Description\n\nRWKU is a real-world knowledge unlearning benchmark specifically designed for large language models (LLMs). This benchmark contains 200 real-world unlearning targets and 13,131 multi-level forget probes, including 3,268 fill-in-the-blank probes, 2,879 question-answer probes, and 6,984 adversarial-attack probes. RWKU is designed based on the following three key factors:\n\n- For the **task setting**, we consider a more practical and challenging setting, similar to zero-shot knowledge unlearning. We provide only the unlearning target and the original model, without offering any forget corpus or retain corpus. This setting avoids secondary information leakage caused by the forget corpus and is not affected by the distribution bias of the retain corpus.\n- For the **knowledge source**, we choose real-world famous people from Wikipedia as the unlearning targets and demonstrate that such popular knowledge is widely present in various LLMs through memorization quantification, making it more suitable for knowledge unlearning. Additionally, choosing entities as unlearning targets clearly defines the unlearning boundaries.\n- For the **evaluation framework**, we carefully design the **forget set** and the **retain set** to evaluate the model's capabilities from multiple real-world applications.\n  - Regarding the **forget set**, we evaluate the **efficacy** of knowledge unlearning for both the **knowledge memorization** (fill-in-the-blank style) and **knowledge manipulation** (question-answer style) abilities. We also evaluate these two abilities through **adversarial attacks** to induce forgotten knowledge in the model. 
We adopt four **membership inference attack** (MIA) methods for knowledge memorization on our collected MIA set. We meticulously designed nine types of adversarial-attack probes for knowledge manipulation, including _prefix injection, affirmative suffix, role playing, reverse query, and others_.\n  - Regarding the **retain set**, we design a neighbor set to test the impact of _neighbor perturbation_, specifically focusing on the **locality** of unlearning. In addition, we assess the **model utility** on various downstream capabilities, including _general ability, reasoning ability, truthfulness, factuality, and fluency_.\n\n\n\n### Installation\n\n```bash\ngit clone https://github.com/jinzhuoran/RWKU.git\nconda create -n rwku python=3.10\nconda activate rwku\ncd RWKU\npip install -r requirements.txt\n```\n\n### Dataset Download and Processing\n\n\nOne way is to load the dataset from [Huggingface](https://huggingface.co/datasets/jinzhuoran/RWKU) and preprocess it.\n```bash\ncd process\npython data_process.py\n```\n\n\nAnother way is to download the processed dataset directly from [Google Drive](https://drive.google.com/file/d/1ukWg-T3GPvqpyW7058vNyRWdXuQHRJPb/view?usp=sharing).\n```bash\ncd LLaMA-Factory/data\nbash download.sh\n```\n\n### Unlearning Target\n\nRWKU includes 200 famous people from [The Most Famous All-time People Rank](https://today.yougov.com/ratings/international/fame/all-time-people/all), such as Stephen King, Warren Buffett, Taylor Swift, etc.\nWe demonstrate that such popular knowledge is widely present in various LLMs through memorization quantification, making it more suitable for unlearning.\n```python\nfrom datasets import load_dataset\nforget_target = load_dataset(\"jinzhuoran/RWKU\", 'forget_target')['train'] # 200 unlearning targets\n```\n\n\n### Evaluation Framework\n\nRWKU mainly consists of four subsets, including forget set, neighbor set, MIA set and utility set.\n![Evaluation Framework.](file/framework.png)\n\n#### Forget 
Set\n\n```python\nfrom datasets import load_dataset\nforget_level1 = load_dataset(\"jinzhuoran/RWKU\", 'forget_level1')['test'] # forget knowledge memorization probes\nforget_level2 = load_dataset(\"jinzhuoran/RWKU\", 'forget_level2')['test'] # forget knowledge manipulation probes\nforget_level3 = load_dataset(\"jinzhuoran/RWKU\", 'forget_level3')['test'] # forget adversarial attack probes\n```\n\n#### Neighbor Set\n\n```python\nfrom datasets import load_dataset\nneighbor_level1 = load_dataset(\"jinzhuoran/RWKU\", 'neighbor_level1')['test'] # neighbor knowledge memorization probes\nneighbor_level2 = load_dataset(\"jinzhuoran/RWKU\", 'neighbor_level2')['test'] # neighbor knowledge manipulation probes\n```\n\n#### MIA Set\n\n```python\nfrom datasets import load_dataset\nmia_forget = load_dataset(\"jinzhuoran/RWKU\", 'mia_forget') # forget member set\nmia_retain = load_dataset(\"jinzhuoran/RWKU\", 'mia_retain') # retain member set\n```\n\n#### Utility Set\n\n```python\nfrom datasets import load_dataset\nutility_general = load_dataset(\"jinzhuoran/RWKU\", 'utility_general') # general ability\nutility_reason = load_dataset(\"jinzhuoran/RWKU\", 'utility_reason') # reasoning ability\nutility_truthfulness = load_dataset(\"jinzhuoran/RWKU\", 'utility_truthfulness') # truthfulness\nutility_factuality = load_dataset(\"jinzhuoran/RWKU\", 'utility_factuality') # factuality\nutility_fluency = load_dataset(\"jinzhuoran/RWKU\", 'utility_fluency') # fluency\n```\n\n\n\n### Supported Unlearning Methods\n- **In-Context Unlearning (ICU)**: We use specific instructions to make the model behave as if it has forgotten the target knowledge, without actually modifying the model parameters.\n- **Gradient Ascent (GA)**: In contrast to the gradient descent during the pre-training phase, we maximize the negative log-likelihood loss on the forget corpus. This approach aims to steer the model away from its initial predictions, facilitating the process of unlearning. 
 \n- **Direct Preference Optimization (DPO)**: We apply preference optimization to enable the model to generate incorrect target knowledge. DPO requires positive and negative examples to train the model. We sample the positive example from the counterfactual corpus, which consists of intentionally fabricated descriptions generated by the model about the target. We sample the negative example from the synthetic forget corpus. \n- **Negative Preference Optimization (NPO)**: NPO is a simple drop-in fix for the GA loss. Compared to DPO, NPO retains only the negative examples without any positive examples. \n- **Rejection Tuning (RT)**: First, we have the model generate some questions related to the unlearning targets, then replace its responses with “I do not know the answer.” We then use this refusal data to fine-tune the model so that it can reject questions related to the target.\n\n\n### Forget Corpus Generation\nWe have provided the forget corpus for both [Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) and [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) to facilitate reproducibility.\n```python\nfrom datasets import load_dataset\ntrain_positive_llama3 = load_dataset(\"jinzhuoran/RWKU\", 'train_positive_llama3')['train'] # For GA and NPO\ntrain_pair_llama3 = load_dataset(\"jinzhuoran/RWKU\", 'train_pair_llama3')['train'] # For DPO\ntrain_refusal_llama3 = load_dataset(\"jinzhuoran/RWKU\", 'train_refusal_llama3')['train'] # For RT\n\ntrain_positive_phi3 = load_dataset(\"jinzhuoran/RWKU\", 'train_positive_phi3')['train'] # For GA and NPO\ntrain_pair_phi3 = load_dataset(\"jinzhuoran/RWKU\", 'train_pair_phi3')['train'] # For DPO\ntrain_refusal_phi3 = load_dataset(\"jinzhuoran/RWKU\", 'train_refusal_phi3')['train'] # For RT\n```\n\nAdditionally, you can construct your own forget corpus to explore new methods and models.\nWe have included our generation script for reference. 
Please feel free to explore better methods for generating the forget corpus.\n```bash\ncd generation\npython pair_generation.py # For GA, DPO and NPO\npython question_generation.py # For RT\n```\n\n\n### Evaluating Models \n\nTo evaluate the model's original performance before unlearning:\n```bash\ncd LLaMA-Factory/scripts\nbash run_original.sh\n```\n\n\n### Unlearning Models\n\nWe adapt [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) to train the model. \nWe provide several scripts to run various unlearning methods.\n\n#### Single-sample Unlearning Setting\n\n\nTo run the In-Context Unlearning (ICU) method on Llama-3-8B-Instruct:\n```bash\ncd LLaMA-Factory\nbash scripts/full/run_icu.sh\n```\nTo run the Gradient Ascent (GA) method on Llama-3-8B-Instruct:\n```bash\ncd LLaMA-Factory\nbash scripts/full/run_ga.sh\n```\nTo run the Direct Preference Optimization (DPO) method on Llama-3-8B-Instruct:\n```bash\ncd LLaMA-Factory\nbash scripts/full/run_dpo.sh\n```\nTo run the Negative Preference Optimization (NPO) method on Llama-3-8B-Instruct:\n```bash\ncd LLaMA-Factory\nbash scripts/full/run_npo.sh\n```\nTo run the Rejection Tuning (RT) method on Llama-3-8B-Instruct:\n```bash\ncd LLaMA-Factory\nbash scripts/full/run_rt.sh\n```\n\n#### Batch-sample Unlearning Setting\n\n\nTo run the In-Context Unlearning (ICU) method on Llama-3-8B-Instruct:\n```bash\ncd LLaMA-Factory\nbash scripts/batch/run_icu.sh\n```\nTo run the Gradient Ascent (GA) method on Llama-3-8B-Instruct:\n```bash\ncd LLaMA-Factory\nbash scripts/batch/run_ga.sh\n```\nTo run the Direct Preference Optimization (DPO) method on Llama-3-8B-Instruct:\n```bash\ncd LLaMA-Factory\nbash scripts/batch/run_dpo.sh\n```\nTo run the Negative Preference Optimization (NPO) method on Llama-3-8B-Instruct:\n```bash\ncd LLaMA-Factory\nbash scripts/batch/run_npo.sh\n```\nTo run the Rejection Tuning (RT) method on Llama-3-8B-Instruct:\n```bash\ncd LLaMA-Factory\nbash scripts/batch/run_rt.sh\n```\n\n\n#### LoRA Unlearning 
Setting\n\nPlease set `--finetuning_type lora` and `--lora_target q_proj,v_proj`.\n\n#### Partial-layer Unlearning Setting\n\nPlease set `--train_layers 0-4`.\n\n\n\n### Experimental Results\nResults of main experiment on LLaMA3-Instruct (8B).\n![Results of main experiment on LLaMA3-Instruct (8B).](file/result_llama.jpg)\n\nResults of main experiment on Phi-3 Mini-4K-Instruct (3.8B).\n\n![Results of main experiment on Phi-3 Mini-4K-Instruct (3.8B).](file/result_phi.jpg)\n\n\n\n### Citation\n\nIf you find our codebase and dataset beneficial, please cite our work:\n\n```bibtex\n@misc{jin2024rwku,\n    title={RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models},\n    author={Zhuoran Jin and Pengfei Cao and Chenhao Wang and Zhitao He and Hongbang Yuan and Jiachun Li and Yubo Chen and Kang Liu and Jun Zhao},\n    year={2024},\n    eprint={2406.10890},\n    archivePrefix={arXiv},\n    primaryClass={cs.CL}\n}\n```\n\n### Other Related Projects\n\n- [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)\n- [MIMIR](https://github.com/iamgroot42/mimir)\n- [TOFU](https://github.com/locuslab/tofu)\n- [repeng](https://github.com/vgel/repeng)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjinzhuoran%2Frwku","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjinzhuoran%2Frwku","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjinzhuoran%2Frwku/lists"}