{"id":28676525,"url":"https://github.com/zjunlp/knowundo","last_synced_at":"2026-03-06T02:41:00.219Z","repository":{"id":249455673,"uuid":"816652282","full_name":"zjunlp/KnowUnDo","owner":"zjunlp","description":"[EMNLP 2024] To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models","archived":false,"fork":false,"pushed_at":"2025-01-23T01:51:21.000Z","size":1727,"stargazers_count":42,"open_issues_count":1,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-06-13T23:04:59.928Z","etag":null,"topics":["artificial-intelligence","benchmark","dataset","knowledge-editing","knowledge-unlearning","knowundo","large-language-models","localization","memflex","model-editing","natural-language-processing","unlearning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zjunlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-18T06:46:45.000Z","updated_at":"2025-05-14T08:45:54.000Z","dependencies_parsed_at":"2025-01-23T02:30:19.022Z","dependency_job_id":"2555d220-a690-4b81-ba6c-dfe85c0e4706","html_url":"https://github.com/zjunlp/KnowUnDo","commit_stats":null,"previous_names":["zjunlp/knowundo"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zjunlp/KnowUnDo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FKnowUnDo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FKnowUnDo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/zjunlp%2FKnowUnDo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FKnowUnDo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zjunlp","download_url":"https://codeload.github.com/zjunlp/KnowUnDo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FKnowUnDo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30159822,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-05T22:39:40.138Z","status":"online","status_checked_at":"2026-03-06T02:00:08.268Z","response_time":250,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","benchmark","dataset","knowledge-editing","knowledge-unlearning","knowundo","large-language-models","localization","memflex","model-editing","natural-language-processing","unlearning"],"created_at":"2025-06-13T23:04:59.427Z","updated_at":"2026-03-06T02:41:00.205Z","avatar_url":"https://github.com/zjunlp.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\u003cimg src=\"figs/KnowUnDo.png\" width=\"30\" height=\"30\"\u003e KnowUnDo \u003c/h1\u003e\n\u003ch3 align=\"center\"\u003e To Forget or Not? 
Towards Practical Knowledge Unlearning for LLMs \u003c/h3\u003e\n\n\u003cp align=\"center\"\u003e\n  📃 \u003ca href=\"https://arxiv.org/abs/2407.01920\" target=\"_blank\"\u003earXiv\u003c/a\u003e • 🤗 \u003ca href=\"https://huggingface.co/datasets/zjunlp/KnowUnDo\" target=\"_blank\"\u003eDataset\u003c/a\u003e \u003cbr\u003e\n\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n[![Awesome](https://awesome.re/badge.svg)](https://github.com/zjunlp/KnowUnDo) \n[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)\n![](https://img.shields.io/github/last-commit/zjunlp/KnowUnDo?color=green) \n![](https://img.shields.io/badge/PRs-Welcome-red)\n\n---\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#-overview\"\u003e🔔 Overview\u003c/a\u003e •\n  \u003ca href=\"#-load-datasets\"\u003e📊 Load Datasets\u003c/a\u003e •\n  \u003ca href=\"#-how-to-run\"\u003e🚀 How to Run\u003c/a\u003e •\n  \u003ca href=\"#-citation\"\u003e📖 Citation\u003c/a\u003e •\n\u003c/p\u003e\n\n\u003c/div\u003e\n\n## 🔔 Overview\n\n\u003cdiv align=center\u003e\u003cimg src=\"figs/main.png\" width=\"100%\" height=\"100%\" /\u003e\u003c/div\u003e\n\nWe provide **KnowUnDo** (EMNLP 2024 Findings), a benchmark covering the copyrighted content and user privacy domains that evaluates whether the unlearning process inadvertently erases essential knowledge. 
Access **KnowUnDo** directly on [Hugging Face](https://huggingface.co/datasets/zjunlp/KnowUnDo).\n\nTo address this, we propose a simple yet effective method, **MemFlex**, which utilizes gradient information to precisely target and unlearn sensitive parameters.\n\n\n## 📊 Load Datasets\nYou can load the datasets as follows.\n\n```python\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"zjunlp/KnowUnDo\", name='copyright', split='unlearn')\n```\n* Available configuration names and corresponding splits:\n  - `copyright`: `unlearn`, `retention`;\n  - `privacy`: `unlearn`, `retention`;\n\n## 🚀 How to Run\n### Environment Setup\n```bash\ngit clone https://github.com/zjunlp/KnowUnDo.git\ncd KnowUnDo\nconda create -n KnowUnDo python==3.10\n\nconda activate KnowUnDo\npip install -e .\npip install -r requirements.txt\n\ncd llm_unlearn/apex\npip install -v --no-cache-dir ./\n```\n### Download Large Language Models (LLMs)\n```bash\n# directory: KnowUnDo\nmkdir models\ncd models\ngit lfs install\ngit clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf\ngit clone https://huggingface.co/Qwen/Qwen1.5-7B-Chat\n```\n### Pretrain LLMs in Our Setting\n```bash\n# directory: pretrain\nbash run_finetune_lora.sh\n```\n### Knowledge Localization (Optional)\nWe have released the localized knowledge region. 
You can perform the localization yourself as follows.\n```bash\n# directory: pretrain\nbash run_localization.sh\n```\n### Prepare tokenized datasets\n```bash\n# directory: llm_unlearn\ncd utils\nbash tokenize_datasets.sh\n```\n+ `--val` for the `val` split of the dataset.\n+ `--prompt` for concatenating `direct_prompt` before the `question` in the datasets.\n\n### Unlearning experiments\n```bash\n# directory: llm_unlearn\nbash run_baselines_lora.sh\nbash run_ours_lora.sh\n```\n- Available methods with corresponding arguments: \n  - `--unlearn_method gradient_ascent`\n  - `--unlearn_method random_label --completely_random True` (named Fine-tuning with Random Labels in the paper)\n  - `--unlearn_method random_label  --top_k 1  --rm_groundtruth True` (named Unlearning with Adversarial Samples in the paper)\n  - `--unlearn_method ascent_plus_descent`\n  - `--unlearn_method ascent_plus_kl_divergence`\n  - `--unlearn_method ascent_plus_descent --general True`\n  - `--unlearn_method ascent_plus_kl_divergence --general True`\n  - `--unlearn_method memflex` (the strong baseline we propose)\n### Eval Unlearned Model\n\u003c!-- ```bash\n# directory: llm_unlearn\ntorchrun --nproc_per_node=1 --master_port=20001 run_eval_lora.py \\\n    --model_name_or_path /path/to/your/unlearned/model \\\n    --tokenizer_name ../models/Llama-2-7b-chat-hf \\\n    --per_device_eval_batch_size 1 \\\n    --do_eval \\\n    --output_dir ./output/copyright/Llama-2-7b-chat-hf-eval \\\n    --overwrite_output_dir \\\n    --overwrite_cache \\\n    --tf32 True \\\n    --domain copyright\n``` --\u003e\nYou can evaluate multiple unlearned models together by running our script **only once**.\n```bash\n# directory: llm_unlearn\nbash run_eval_baselines_lora.sh\n```\n+ `--direct_prompt=True` means concatenating `direct_prompt` before the `question` in the datasets.\n## 🎉 Acknowledgement\n\nWe would like to express our sincere gratitude to the excellent work [Unlearning 
LLM](https://github.com/yaojin17/Unlearning_LLM), [TOFU](https://github.com/locuslab/tofu), [LLaMA](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), and [Qwen](https://github.com/QwenLM/Qwen2?tab=readme-ov-file).\n\n## 📖 Citation\n\nIf you use or extend our work, please cite the paper as follows:\n\n```bibtex\n@article{tian2024forget,\n  title={To forget or not? towards practical knowledge unlearning for large language models},\n  author={Tian, Bozhong and Liang, Xiaozhuan and Cheng, Siyuan and Liu, Qingbin and Wang, Mengru and Sui, Dianbo and Chen, Xi and Chen, Huajun and Zhang, Ningyu},\n  journal={arXiv preprint arXiv:2407.01920},\n  year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Fknowundo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzjunlp%2Fknowundo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Fknowundo/lists"}