https://github.com/chrisliu298/awesome-llm-unlearning

A resource repository for machine unlearning in large language models
https://github.com/chrisliu298/awesome-llm-unlearning
List: awesome-llm-unlearning
alignment awesome large-language-model llm llm-unlearning machine-unlearning unlearning
Last synced: about 1 month ago
JSON representation
A resource repository for machine unlearning in large language models
Host: GitHub
URL: https://github.com/chrisliu298/awesome-llm-unlearning
Owner: chrisliu298
License: apache-2.0
Created: 2024-03-27T15:44:23.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-03-28T04:56:30.000Z (3 months ago)
Last Synced: 2025-05-03T23:01:43.322Z (about 2 months ago)
Topics: alignment, awesome, large-language-model, llm, llm-unlearning, machine-unlearning, unlearning
Homepage:
Size: 85 KB
Stars: 384
Watchers: 9
Forks: 19
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

ultimate-awesome - awesome-llm-unlearning - A resource repository for machine unlearning in large language models. (Other Lists / Julia Lists)
README

        # Awesome Large Language Model Unlearning



 

 

 

 

 



This repository tracks the latest research on machine unlearning in large language models (LLMs). The goal is to offer a comprehensive list of papers and resources relevant to the topic.

As of the last commit, there are **184** papers, **13** surveys and position papers, **1** framework, and **2** blog posts.

> [!NOTE]

> If you believe your paper on LLM unlearning is not included, or if you find a mistake, typo, or information that is not up to date, please open an issue or submit a pull request, and I will be happy to update the list.

## Table of Contents

- [Table of Contents](#table-of-contents)

- [Papers](#papers)

  - [2025](#2025)

  - [2024](#2024)

  - [2023](#2023)

  - [2022](#2022)

  - [2021](#2021)

- [Surveys and Position Papers](#surveys-and-position-papers)

  - [2025](#2025-1)

  - [2024](#2024-1)

  - [2023](#2023-1)

- [Frameworks](#frameworks)

- [Blog Posts](#blog-posts)

  

## Papers

### 2025

- [Effective Skill Unlearning through Intervention and Abstention](https://arxiv.org/abs/2503.21730)

  - Author(s): Yongce Li, Chung-En Sun, Tsui-Wei Weng

  - Date: 2025-03

  - Venue: -

  - Code: -

- [ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging](https://arxiv.org/abs/2503.21088)

  - Author(s): Haoming Xu, Shuxun Wang, Yanqiu Zhao, Yi Zhong, Ziyan Jiang, Ningyuan Zhao, Shumin Deng, Huajun Chen, Ningyu Zhang

  - Date: 2025-03

  - Venue: -

  - Code: -

- [Machine Unlearning in Hyperbolic vs. Euclidean Multimodal Contrastive Learning: Adapting Alignment Calibration to MERU](https://arxiv.org/abs/2503.15166)

  - Author(s): Àlex Pujol Vidal, Sergio Escalera, Kamal Nasrollahi, Thomas B. Moeslund

  - Date: 2025-03

  - Venue: -

  - Code: -

- [Deep Contrastive Unlearning for Language Models](https://arxiv.org/abs/2503.14900)

  - Author(s): Estrid He, Tabinda Sarwar, Ibrahim Khalil, Xun Yi, Ke Wang

  - Date: 2025-03

  - Venue: -

  - Code: -

- [SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders](https://arxiv.org/abs/2503.14530)

  - Author(s): Qing Li, Jiahui Geng, Derui Zhu, Fengyu Cai, Chenyang Lyu, Fakhri Karray

  - Date: 2025-03

  - Venue: -

  - Code: -

- [Atyaephyra at SemEval-2025 Task 4: Low-Rank NPO](https://arxiv.org/abs/2503.13690)

  - Author(s): Jan Bronec, Jindřich Helcl

  - Date: 2025-03

  - Venue: -

  - Code: -

- [PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models](https://arxiv.org/abs/2503.12545)

  - Author(s): Zhaopan Xu, Pengfei Zhou, Weidong Tang, Jiaxin Ai, Wangbo Zhao, Xiaojiang Peng, Kai Wang, Yang You, Wenqi Shao, Hongxun Yao, Kaipeng Zhang

  - Date: 2025-03

  - Venue: -

  - Code: -

- [Hyperbolic Safety-Aware Vision-Language Models](https://arxiv.org/abs/2503.12127)

  - Author(s): Tobia Poppi, Tejaswi Kasarla, Pascal Mettes, Lorenzo Baraldi, Rita Cucchiara

  - Date: 2025-03

  - Venue: -

  - Code: -

- [Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning](https://arxiv.org/abs/2503.11832)

  - Author(s): Yiwei Chen, Yuguang Yao, Yihua Zhang, Bingquan Shen, Gaowen Liu, Sijia Liu

  - Date: 2025-03

  - Venue: -

  - Code: -

- [Don't Forget It! Conditional Sparse Autoencoder Clamping Works for Unlearning](https://arxiv.org/abs/2503.11127)

  - Author(s): Matthew Khoriaty, Andrii Shportko, Gustavo Mercier, Zach Wood-Doughty

  - Date: 2025-03

  - Venue: -

  - Code: -

- [SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability](https://arxiv.org/abs/2503.09532)

  - Author(s): Adam Karvonen, Can Rager, Johnny Lin, Curt Tigges, Joseph Bloom, David Chanin, Yeu-Tong Lau, Eoin Farrell, Callum McDougall, Kola Ayonrinde, Matthew Wearden, Arthur Conmy, Samuel Marks, Neel Nanda

  - Date: 2025-03

  - Venue: -

  - Code: -

- [GRU: Mitigating the Trade-off between Unlearning and Retention for Large Language Models](https://arxiv.org/abs/2503.09117)

  - Author(s): Yue Wang, Qizhou Wang, Feng Liu, Wei Huang, Yali Du, Xiaojiang Du, Bo Han

  - Date: 2025-03

  - Venue: -

  - Code: -

- [Cyber for AI at SemEval-2025 Task 4: Forgotten but Not Lost: The Balancing Act of Selective Unlearning in Large Language Models](https://arxiv.org/abs/2503.04795)

  - Author(s): Dinesh Srivasthav P, Bala Mallikarjunarao Garlapati

  - Date: 2025-03

  - Venue: -

  - Code: -

- [UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets](https://arxiv.org/abs/2503.04693)

  - Author(s): Wenyu Wang, Mengqi Zhang, Xiaotian Ye, Zhaochun Ren, Zhumin Chen, Pengjie Ren

  - Date: 2025-03

  - Venue: -

  - Code: -

- [Improving LLM Safety Alignment with Dual-Objective Optimization](https://arxiv.org/abs/2503.03710)

  - Author(s): Xuandong Zhao, Will Cai, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song

  - Date: 2025-03

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/wicai24/DOOR-Alignment)

- [AILS-NTUA at SemEval-2025 Task 4: Parameter-Efficient Unlearning for Large Language Models using Data Chunking](https://arxiv.org/abs/2503.02443)

  - Author(s): Iraklis Premptis, Maria Lymperaiou, Giorgos Filandrianos, Orfeas Menis Mastromichalakis, Athanasios Voulodimos, Giorgos Stamou

  - Date: 2025-03

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/iraklis07/llm-unlearning)

- [CE-U: Cross Entropy Unlearning](https://arxiv.org/abs/2503.01224)

  - Author(s): Bo Yang

  - Date: 2025-03

  - Venue: -

  - Code: -

- [Erasing Without Remembering: Safeguarding Knowledge Forgetting in Large Language Models](https://arxiv.org/abs/2502.19982)

  - Author(s): Huazheng Wang, Yongcheng Jing, Haifeng Sun, Yingjie Wang, Jingyu Wang, Jianxin Liao, Dacheng Tao

  - Date: 2025-02

  - Venue: -

  - Code: -

- [Tokens for Learning, Tokens for Unlearning: Mitigating Membership Inference Attacks in Large Language Models via Dual-Purpose Training](https://arxiv.org/abs/2502.19726)

  - Author(s): Toan Tran, Ruixuan Liu, Li Xiong

  - Date: 2025-02

  - Venue: -

  - Code: -

- [Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond](https://arxiv.org/abs/2502.19301)

  - Author(s): Qizhou Wang, Jin Peng Zhou, Zhanke Zhou, Saebyeol Shin, Bo Han, Kilian Q. Weinberger

  - Date: 2025-02

  - Venue: -

  - Code: -

- [FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge](https://arxiv.org/abs/2502.19207)

  - Author(s): Nakyeong Yang, Minsung Kim, Seunghyun Yoon, Joongbo Shin, Kyomin Jung

  - Date: 2025-02

  - Venue: -

  - Code: -

- [Holistic Audit Dataset Generation for LLM Unlearning via Knowledge Graph Traversal and Redundancy Removal](https://arxiv.org/abs/2502.18810)

  - Author(s): Weipeng Jiang, Juan Zhai, Shiqing Ma, Ziyan Lei, Xiaofei Xie, Yige Wang, Chao Shen

  - Date: 2025-02

  - Venue: -

  - Code: -

- [A General Framework to Enhance Fine-tuning-based LLM Unlearning](https://arxiv.org/abs/2502.17823)

  - Author(s): Jie Ren, Zhenwei Dai, Xianfeng Tang, Hui Liu, Jingying Zeng, Zhen Li, Rahul Goutam, Suhang Wang, Yue Xing, Qi He, Hui Liu

  - Date: 2025-02

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/renjie3/GRUN)

- [Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models](https://arxiv.org/abs/2502.15910)

  - Author(s): Zheyuan Liu, Guangyao Dou, Xiangchi Yuan, Chunhui Zhang, Zhaoxuan Tan, Meng Jiang

  - Date: 2025-02

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/franciscoliu/MANU)

- [Soft Token Attacks Cannot Reliably Audit Unlearning in Large Language Models](https://arxiv.org/abs/2502.15836)

  - Author(s): Haokun Chen, Sebastian Szyller, Weilin Xu, Nageen Himayat

  - Date: 2025-02

  - Venue: -

  - Code: -

- [CoME: An Unlearning-based Approach to Conflict-free Model Editing](https://arxiv.org/abs/2502.15826)

  - Author(s): Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, Heuiseok Lim

  - Date: 2025-02

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/ekgus9/COME)

- [LUME: LLM Unlearning with Multitask Evaluations](https://arxiv.org/abs/2502.15097)

  - Author(s): Anil Ramakrishna, Yixin Wan, Xiaomeng Jin, Kai-Wei Chang, Zhiqi Bu, Bhanukiran Vinzamuri, Volkan Cevher, Mingyi Hong, Rahul Gupta

  - Date: 2025-02

  - Venue: -

  - Code: -

- [UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning](https://arxiv.org/abs/2502.15082)

  - Author(s): Vaidehi Patil, Elias Stengel-Eskin, Mohit Bansal

  - Date: 2025-02

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/Vaidehi99/UPCORE)

- [Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models](https://arxiv.org/abs/2502.15010)

  - Author(s): Mark Russinovich, Ahmed Salem

  - Date: 2025-02

  - Venue: -

  - Code: -

- [Beyond Single-Value Metrics: Evaluating and Enhancing LLM Unlearning with Cognitive Diagnosis](https://arxiv.org/abs/2502.13996)

  - Author(s): Yicheng Lang, Kehan Guo, Yue Huang, Yujun Zhou, Haomin Zhuang, Tianyu Yang, Yao Su, Xiangliang Zhang

  - Date: 2025-02

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/lyicheng619/UNCD)

- [SAFEERASER: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning](https://arxiv.org/abs/2502.12520)

  - Author(s): Junkai Chen, Zhijie Deng, Kening Zheng, Yibo Yan, Shuliang Liu, PeiJun Wu, Peijie Jiang, Jia Liu, Xuming Hu

  - Date: 2025-02

  - Venue: -

  - Code: -

- [Which Retain Set Matters for LLM Unlearning? A Case Study on Entity Unlearning](https://arxiv.org/abs/2502.11441)

  - Author(s): Hwan Chang, Hwanhee Lee

  - Date: 2025-02

  - Venue: -

  - Code: -

- [ReLearn: Unlearning via Learning for Large Language Models](https://arxiv.org/abs/2502.11190)

  - Author(s): Haoming Xu, Ningyuan Zhao, Liming Yang, Sendong Zhao, Shumin Deng, Mengru Wang, Bryan Hooi, Nay Oo, Huajun Chen, Ningyu Zhang

  - Date: 2025-02

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/zjunlp/unlearn)

- [MMUNLEARNER: Reformulating Multimodal Machine Unlearning in the Era of Multimodal Large Language Models](https://arxiv.org/abs/2502.11051)

  - Author(s): Jiahao Huo, Yibo Yan, Xu Zheng, Yuanhuiyi Lyu, Xin Zou, Zhihua Wei, Xuming Hu

  - Date: 2025-02

  - Venue: -

  - Code: -

- [LUNAR: LLM Unlearning via Neural Activation Redirection](https://arxiv.org/abs/2502.07218)

  - Author(s): William F. Shen, Xinchi Qiu, Meghdad Kurmanji, Alex Iacob, Lorenzo Sani, Yihong Chen, Nicola Cancedda, Nicholas D. Lane

  - Date: 2025-02

  - Venue: -

  - Code: -

- [Mitigating Sensitive Information Leakage in LLMs4Code through Machine Unlearning](https://arxiv.org/abs/2502.05739)

  - Author(s): Ruotong Geng, Mingyang Geng, Shangwen Wang, Haotian Wang, Zhipeng Lin, Dezun Dong

  - Date: 2025-02

  - Venue: -

  - Code: -

- [Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond](https://arxiv.org/abs/2502.05374)

  - Author(s): Chongyu Fan, Jinghan Jia, Yihua Zhang, Anil Ramakrishna, Mingyi Hong, Sijia Liu

  - Date: 2025-02

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/OPTML-Group/Unlearn-Smooth)

- [Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities](https://arxiv.org/abs/2502.05209)

  - Author(s): Zora Che, Stephen Casper, Robert Kirk, Anirudh Satheesh, Stewart Slocum, Lev E McKinney, Rohit Gandikota, Aidan Ewart, Domenic Rosati, Zichu Wu, Zikui Cai, Bilal Chughtai, Yarin Gal, Furong Huang, Dylan Hadfield-Menell

  - Date: 2025-02

  - Venue: -

  - Code: -

- [FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model](https://arxiv.org/abs/2502.01472)

  - Author(s): Jinwei Hu, Zhenglin Huang, Xiangyu Yin, Wenjie Ruan, Guangliang Cheng, Yi Dong, Xiaowei Huang

  - Date: 2025-02

  - Venue: -

  - Code: -

- [Tool Unlearning for Tool-Augmented LLMs](https://arxiv.org/abs/2502.01083)

  - Author(s): Jiali Cheng, Hadi Amiri

  - Date: 2025-02

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/CLU-UML/MU-Bench)

- [ALU: Agentic LLM Unlearning](https://arxiv.org/abs/2502.00406)

  - Author(s): Debdeep Sanyal, Murari Mandal

  - Date: 2025-02

  - Venue: -

  - Code: -

- [Resolving Editing-Unlearning Conflicts: A Knowledge Codebook Framework for Large Language Model Updating](https://arxiv.org/abs/2502.00158)

  - Author(s): Binchi Zhang, Zhengzhang Chen, Zaiyi Zheng, Jundong Li, Haifeng Chen

  - Date: 2025-02

  - Venue: -

  - Code: -

- [Improving the Robustness of Representation Misdirection for Large Language Model Unlearning](https://arxiv.org/abs/2501.19202)

  - Author(s): Dang Huu-Tien, Hoang Thanh-Tung, Le-Minh Nguyen, Naoya Inoue

  - Date: 2025-01

  - Venue: -

  - Code: -

- [Backdoor Token Unlearning: Exposing and Defending Backdoors in Pretrained Language Models](https://arxiv.org/abs/2501.03272)

  - Author(s): Peihai Jiang, Xixiang Lyu, Yige Li, Jing Ma

  - Date: 2025-01

  - Venue: -

  - Code: -

### 2024

- [Multi-Objective Large Language Model Unlearning](https://arxiv.org/abs/2412.20412)

  - Author(s): Zibin Pan, Shuwen Zhang, Yuesheng Zheng, Chi Li, Yuheng Cheng, Junhua Zhao

  - Date: 2024-12

  - Venue: -

  - Code: -

- [Investigating the Feasibility of Mitigating Potential Copyright Infringement via Large Language Model Unlearning](https://arxiv.org/abs/2412.18621)

  - Author(s): Guangyao Dou

  - Date: 2024-12

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/guangyaodou/SSU_Unlearn)

- [Large Language Model Federated Learning with Blockchain and Unlearning for Cross-Organizational Collaboration](https://arxiv.org/abs/2412.13551)

  - Author(s): Xuhan Zuo, Minghao Wang, Tianqing Zhu, Shui Yu, Wanlei Zhou

  - Date: 2024-12

  - Venue: -

  - Code: -

- [Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning](https://arxiv.org/abs/2412.08559)

  - Author(s): Rongzhe Wei, Mufei Li, Mohsen Ghassemi, Eleonora Kreačić, Yifan Li, Xiang Yue, Bo Li, Vamsi K. Potluru, Pan Li, Eli Chien

  - Date: 2024-12

  - Venue: -

  - Code: -

- [Classifier-free guidance in LLMs Safety](https://arxiv.org/abs/2412.06846)

  - Author(s): Roman Smirnov

  - Date: 2024-12

  - Venue: -

  - Code: -

- [Unified Parameter-Efficient Unlearning for LLMs](https://arxiv.org/abs/2412.00383)

  - Author(s): Chenlu Ding, Jiancan Wu, Yancheng Yuan, Jinda Lu, Kai Zhang, Alex Su, Xiang Wang, Xiangnan He

  - Date: 2024-12

  - Venue: -

  - Code: -

- [UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMS](https://arxiv.org/abs/2411.18797)

  - Author(s): Haomin Zhuang, Yihua Zhang, Kehan Guo, Jinghan Jia, Gaowen Liu, Sijia Liu, Xiangliang Zhang

  - Date: 2024-11

  - Venue: -

  - Code: -

- [Towards Robust Evaluation of Unlearning in LLMs via Data Transformations](https://arxiv.org/abs/2411.15477)

  - Author(s): Abhinav Joshi, Shaswati Saha, Divyaksh Shukla, Sriram Vema, Harsh Jhamtani, Manas Gaur, Ashutosh Modi

  - Date: 2024-11

  - Venue: EMNLP 2024 Findings

  - Code: -

- [Provable unlearning in topic modeling and downstream tasks](https://arxiv.org/abs/2411.12600)

  - Author(s): Stanley Wei, Sadhika Malladi, Sanjeev Arora, Amartya Sanyal

  - Date: 2024-11

  - Venue: -

  - Code: -

- [Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods](https://arxiv.org/abs/2411.12103)

  - Author(s): Jai Doshi, Asa Cooper Stickland

  - Date: 2024-11

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/JaiDoshi/Knowledge-Erasure)

- [Unlearning in- vs. out-of-distribution data in LLMs under gradient-based method](https://arxiv.org/abs/2411.04388)

  - Author(s): Teodora Baluta, Pascal Lamblin, Daniel Tarlow, Fabian Pedregosa, Gintare Karolina Dziugaite

  - Date: 2024-11

  - Venue: NeurIPS 2024 Safe Generative AI Workshop

  - Code: -

- [Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset](https://arxiv.org/abs/2411.03554)

  - Author(s): Yingzi Ma, Jiongxiao Wang, Fei Wang, Siyuan Ma, Jiazhao Li, Xiujun Li, Furong Huang, Lichao Sun, Bo Li, Yejin Choi, Muhao Chen, Chaowei Xiao

  - Date: 2024-11

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/SaFoLab-WISC/FIUBench)

- [Extracting Unlearned Information from LLMs with Activation Steering](https://arxiv.org/abs/2411.02631)

  - Author(s): Atakan Seyitoğlu, Aleksei Kuvshinov, Leo Schwinn, Stephan Günnemann

  - Date: 2024-11

  - Venue: -

  - Code: -

- [RESTOR: Knowledge Recovery through Machine Unlearning](https://arxiv.org/abs/2411.00204)

  - Author(s): Keivan Rezaei, Khyathi Chandu, Soheil Feizi, Yejin Choi, Faeze Brahman, Abhilasha Ravichander

  - Date: 2024-11

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/k1rezaei/restor)

- [Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench](https://arxiv.org/abs/2410.22108)

  - Author(s): Zheyuan Liu, Guangyao Dou, Mengzhao Jia, Zhaoxuan Tan, Qingkai Zeng, Yongle Yuan, Meng Jiang

  - Date: 2024-10

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/franciscoliu/MLLMU-Bench)

- [Cross-model Control: Improving Multiple Large Language Models in One-time Training](https://arxiv.org/abs/2410.17599)

  - Author(s): Jiayi Wu, Hao Sun, Hengyi Cai, Lixin Su, Shuaiqiang Wang, Dawei Yin, Xiang Li, Ming Gao

  - Date: 2024-10

  - Venue: -

  - Code: -

- [Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate](https://arxiv.org/abs/2410.22086)

  - Author(s): Zhiqi Bu, Xiaomeng Jin, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, Mingyi Hong

  - Date: 2024-10

  - Venue: -

  - Code: -

- [Learning and Unlearning of Fabricated Knowledge in Language Models](https://arxiv.org/abs/2410.21750)

  - Author(s): Chen Sun, Nolan Andrew Miller, Andrey Zhmoginov, Max Vladymyrov, Mark Sandler

  - Date: 2024-10

  - Venue: -

  - Code: -

- [Applying sparse autoencoders to unlearn knowledge in language models](https://arxiv.org/abs/2410.19278)

  - Author(s): Eoin Farrell, Yeu-Tong Lau, Arthur Conmy

  - Date: 2024-10

  - Venue: -

  - Code: -

- [CLEAR: Character Unlearning in Textual and Visual Modalities](https://arxiv.org/abs/2410.18057)

  - Author(s): Alexey Dontsov, Dmitrii Korzh, Alexey Zhavoronkin, Boris Mikheev, Denis Bobkov, Aibek Alanov, Oleg Y. Rogov, Ivan Oseledets, Elena Tutubalina

  - Date: 2024-10

  - Venue: -

  - Code: -

- [WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models](https://arxiv.org/abs/2410.17509)

  - Author(s): Jinghan Jia, Jiancheng Liu, Yihua Zhang, Parikshit Ram, Nathalie Baracaldo, Sijia Liu

  - Date: 2024-10

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/OPTML-Group/WAGLE)

- [UnStar: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs](https://arxiv.org/abs/2410.17050)

  - Author(s): Yash Sinha, Murari Mandal, Mohan Kankanhalli

  - Date: 2024-10

  - Venue: -

  - Code: -

- [Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge](https://arxiv.org/abs/2410.16454)

  - Author(s): Zhiwei Zhang, Fali Wang, Xiaomin Li, Zongyu Wu, Xianfeng Tang, Hui Liu, Qi He, Wenpeng Yin, Suhang Wang

  - Date: 2024-10

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/zzwjames/FailureLLMUnlearning)

- [When Machine Unlearning Meets Retrieval-Augmented Generation (RAG): Keep Secret or Forget Knowledge?](https://arxiv.org/abs/2410.15267)

  - Author(s): Shang Wang, Tianqing Zhu, Dayong Ye, Wanlei Zhou

  - Date: 2024-10

  - Venue: -

  - Code: -

- [Evaluating Deep Unlearning in Large Language Models](https://arxiv.org/abs/2410.15153)

  - Author(s): Ruihan Wu, Chhavi Yadav, Russ Salakhutdinov, Kamalika Chaudhuri

  - Date: 2024-10

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/wrh14/deep_unlearning)

- [Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation](https://arxiv.org/abs/2410.14425)

  - Author(s): Shuai Zhao, Xiaobao Wu, Cong-Duy Nguyen, Meihuizi Jia, Yichao Feng, Luu Anh Tuan

  - Date: 2024-10

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/shuaizhao95/Unlearning)

- [Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning](https://arxiv.org/abs/2410.13274)

  - Author(s): Minseok Choi, ChaeHun Park, Dohyun Lee, Jaegul Choo

  - Date: 2024-10

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/brightjade/Munch)

- [Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization](https://arxiv.org/abs/2410.12949)

  - Author(s): Phillip Guo, Aaquib Syed, Abhay Sheshadri, Aidan Ewart, Gintare Karolina Dziugaite

  - Date: 2024-10

  - Venue: -

  - Code: -

- [LLM Unlearning via Loss Adjustment with Only Forget Data](https://arxiv.org/abs/2410.11143)

  - Author(s): Yaxuan Wang, Jiaheng Wei, Chris Yuhao Liu, Jinlong Pang, Quan Liu, Ankit Parag Shah, Yujia Bao, Yang Liu, Wei Wei

  - Date: 2024-10

  - Venue: -

  - Code: -

- [CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept](https://arxiv.org/abs/2410.10866)

  - Author(s): YuXuan Wu, Bonaventure F. P. Dossou, Dianbo Liu

  - Date: 2024-10

  - Venue: -

  - Code: -

- [Do Unlearning Methods Remove Information from Language Model Weights?](https://arxiv.org/abs/2410.08827)

  - Author(s): Aghyad Deeb, Fabien Roger

  - Date: 2024-10

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/aghyad-deeb/unlearning_evaluation)

- [A Closer Look at Machine Unlearning for Large Language Models](https://arxiv.org/abs/2410.08109)

  - Author(s): Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, Min Lin

  - Date: 2024-10

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/sail-sg/closer-look-LLM-unlearning)

- [Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning](https://arxiv.org/abs/2410.07163)

  - Author(s): Chongyu Fan, Jiancheng Liu, Licong Lin, Jinghan Jia, Ruiqi Zhang, Song Mei, Sijia Liu

  - Date: 2024-10

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/OPTML-Group/Unlearn-Simple)

- [Dissecting Fine-Tuning Unlearning in Large Language Models](https://arxiv.org/abs/2410.06606)

  - Author(s): Yihuai Hong, Yuelin Zou, Lijie Hu, Ziqian Zeng, Di Wang, Haiqin Yang

  - Date: 2024-10

  - Venue: EMNLP 2024

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/yihuaihong/Dissecting-FT-Unlearning)

- [NegMerge: Consensual Weight Negation for Strong Machine Unlearning](https://arxiv.org/abs/2410.05583)

  - Author(s): Hyoseo Kim, Dongyoon Han, Junsuk Choe

  - Date: 2024-10

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/naver-ai/negmerge)

- [A Probabilistic Perspective on Unlearning and Alignment for Large Language Models](https://arxiv.org/abs/2410.03523)

  - Author(s): Yan Scholten, Stephan Günnemann, Leo Schwinn

  - Date: 2024-10

  - Venue: -

  - Code: -

- [Mitigating Memorization In Language Models](https://arxiv.org/abs/2410.02159)

  - Author(s): Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Nathaniel Hudson, Caleb Geniesse, Kyle Chard, Yaoqing Yang, Ian Foster, Michael W. Mahoney

  - Date: 2024-10

  - Venue: -

  - Code: -

- [Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning](https://arxiv.org/abs/2410.00382)

  - Author(s): Shota Takashiro, Takeshi Kojima, Andrew Gambardella, Qi Cao, Yusuke Iwasawa, Yutaka Matsuo

  - Date: 2024-10

  - Venue: -

  - Code: -

- [An Adversarial Perspective on Machine Unlearning for AI Safety](https://arxiv.org/abs/2409.18025)

  - Author(s): Jakub Łucki, Boyi Wei, Yangsibo Huang, Peter Henderson, Florian Tramèr, Javier Rando

  - Date: 2024-09

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/ethz-spylab/unlearning-vs-safety)

- [Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models](https://aclanthology.org/2025.coling-main.252/)

  - Author(s): Anmol Mekala, Vineeth Dorna, Shreya Dubey, Abhishek Lalwani, David Koleczek, Mukund Rungta, Sadid Hasan, Elita Lobo

  - Date: 2024-09

  - Venue: COLING 2025

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/molereddy/Alternate-Preference-Optimization/tree/main)

- [LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models](https://arxiv.org/abs/2409.13054)

  - Author(s): Akshaj Kumar Veldanda, Shi-Xiong Zhang, Anirban Das, Supriyo Chakraborty, Stephen Rawls, Sambit Sahu, Milind Naphade

  - Date: 2024-09

  - Venue: -

  - Code: -

- [MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts](https://arxiv.org/abs/2409.11844)

  - Author(s): Tianle Gu, Kexin Huang, Ruilin Luo, Yuanqi Yao, Yujiu Yang, Yan Teng, Yingchun Wang

  - Date: 2024-09

  - Venue: -

  - Code: - [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/Carol-gutianle/MEOW)

- [Unforgettable Generalization in Language Models](https://arxiv.org/abs/2409.02228)

  - Author(s): Eric Zhang, Leshem Chosen, Jacob Andreas

  - Date: 2024-09

  - Venue: COLM 2024

  - Code: -

- [Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage](https://arxiv.org/abs/2408.17354)

  - Author(s): Md Rafi Ur Rashid, Jing Liu, Toshiaki Koike-Akino, Shagufta Mehnaz, Ye Wang

  - Date: 2024-08

  - Venue: -

  - Code: -

- [LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet](https://arxiv.org/abs/2408.15221)

  - Author(s): Nathaniel Li, Ziwen Han, Ian Steneker, Willow Primack, Riley Goodside, Hugh Zhang, Zifan Wang, Cristina Menghini, Summer Yue

  - Date: 2024-08

  - Venue: -

  - Code: -

- [Unlearning Trojans in Large Language Models: A Comparison Between Natural Language and Source Code](https://arxiv.org/abs/2408.12416)

  - Author(s): Mahdi Kazemi, Aftab Hussain, Md Rafiqul Islam Rabin, Mohammad Amin Alipour, Sen Lin

  - Date: 2024-08

  - Venue: -

  - Code: -

- [Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models](https://arxiv.org/abs/2408.10682)

  - Author(s): Hongbang Yuan, Zhuoran Jin, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

  - Date: 2024-08

  - Venue: -

  - Code: -

- [A Population-to-individual Tuning Framework for Adapting Pretrained LM to On-device User Intent Prediction](https://arxiv.org/abs/2408.09815)

  - Author(s): Jiahui Gong, Jingtao Ding, Fanjin Meng, Guilong Chen, Hong Chen, Shen Zhao, Haisheng Lu, Yong Li

  - Date: 2024-08

  - Venue: -

  - Code: -

- [WPN: An Unlearning Method Based on N-pair Contrastive Learning in Language Models](https://arxiv.org/abs/2408.09459)

  - Author(s): Guitao Chen, Yunshen Wang, Hongye Sun, Guang Chen

  - Date: 2024-08

  - Venue: -

  - Code: -

- [Towards Robust and Cost-Efficient Knowledge Unlearning for Large Language Models](https://arxiv.org/abs/2408.06621)

  - Author(s): Sungmin Cha, Sungjun Cho, Dasol Hwang, Moontae Lee

  - Date: 2024-08

  - Venue: -

  - Code: -

- [On Effects of Steering Latent Representation for Large Language Model Unlearning](https://arxiv.org/abs/2408.06223)

  - Author(s): Dang Huu-Tien, Trung-Tin Pham, Hoang Thanh-Tung, Naoya Inoue

  - Date: 2024-08

  - Venue: -

  - Code: -

- [Hotfixing Large Language Models for Code](https://arxiv.org/abs/2408.05727)

  - Author(s): Zhou Yang, David Lo

  - Date: 2024-08

  - Venue: -

  - Code: -

- [UNLEARN Efficient Removal of Knowledge in Large Language Models](https://arxiv.org/abs/2408.04140)

  - Author(s): Tyler Lizzo, Larry Heck

  - Date: 2024-08

  - Venue: -

  - Code: -

- [Tamper-Resistant Safeguards for Open-Weight LLMs](https://arxiv.org/abs/2408.00761)

  - Author(s): Rishub Tamirisa, Bhrugu Bharathi, Long Phan, Andy Zhou, Alice Gatti, Tarun Suresh, Maxwell Lin, Justin Wang, Rowan Wang, Ron Arel, Andy Zou, Dawn Song, Bo Li, Dan Hendrycks, Mantas Mazeika

  - Date: 2024-08

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/rishub-tamirisa/tamper-resistance/)

- [On the Limitations and Prospects of Machine Unlearning for Generative AI](https://arxiv.org/abs/2408.00376)

  - Author(s): Shiji Zhou, Lianzhe Wang, Jiangnan Ye, Yongliang Wu, Heng Chang

  - Date: 2024-08

  - Venue: -

  - Code: -

- [Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models](https://arxiv.org/abs/2407.20271)

  - Author(s): Haoyu Tang, Ye Liu, Xukai Liu, Kai Zhang, Yanghai Zhang, Qi Liu, Enhong Chen

  - Date: 2024-07

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/himalalps/ICU)

- [Demystifying Verbatim Memorization in Large Language Models](https://arxiv.org/abs/2407.17817)

  - Author(s): Jing Huang, Diyi Yang, Christopher Potts

  - Date: 2024-07

  - Venue: -

  - Code: -

- [Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective](https://aclanthology.org/2024.emnlp-main.495/)

  - Author(s): Yujian Liu, Yang Zhang, Tommi Jaakkola, Shiyu Chang

  - Date: 2024-07

  - Venue: - EMNLP 2024

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/UCSB-NLP-Chang/causal_unlearn.git)

- [Towards Transfer Unlearning: Empirical Evidence of Cross-Domain Bias Mitigation](https://arxiv.org/abs/2407.16951)

  - Author(s): Huimin Lu, Masaru Isonuma, Junichiro Mori, Ichiro Sakata

  - Date: 2024-07

  - Venue: -

  - Cdoe: -

- [Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs](https://arxiv.org/abs/2407.15549)

  - Author(s): Abhay Sheshadri, Aidan Ewart, Phillip Guo, Aengus Lynch, Cindy Wu, Vivek Hebbar, Henry Sleight, Asa Cooper Stickland, Ethan Perez, Dylan Hadfield-Menell, Stephen Casper

  - Date: 2024-07

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/aengusl/latent-adversarial-training)

- [What Makes and Breaks Safety Fine-tuning? A Mechanistic Study](https://arxiv.org/abs/2407.10264)

  - Author(s): Samyak Jain, Ekdeep Singh Lubana, Kemal Oksuz, Tom Joy, Philip H.S. Torr, Amartya Sanyal, Puneet K. Dokania

  - Date: 2024-07

  - Venue: -

  - Code: -

- [Practical Unlearning for Large Language Models](https://arxiv.org/abs/2407.10223)

  - Author(s): Chongyang Gao, Lixu Wang, Chenkai Weng, Xiao Wang, Qi Zhu

  - Date: 2024-07

  - Venue: -

  - Code: -

- [Learning to Refuse: Towards Mitigating Privacy Risks in LLMs](https://arxiv.org/abs/2407.10058)

  - Author(s): Zhenhua Liu, Tong Zhu, Chuanyuan Tan, Wenliang Chen

  - Date: 2024-07

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/zhliu0106/learning-to-refuse)

- [Composable Interventions for Language Models](https://arxiv.org/abs/2407.06483)

  - Author(s): Arinbjorn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen, Thomas Hartvigsen

  - Date: 2024-07

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/hartvigsen-group/composable-interventions)

- [MUSE: Machine Unlearning Six-Way Evaluation for Language Models](https://openreview.net/forum?id=TArmA033BU)

  - Author(s): Weijia Shi, Jaechan Lee, Yangsibo Huang, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A. Smith, Chiyuan Zhang

  - Date: 2024-07

  - Venue: - ICML 2024

  - Code: -[![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://muse-bench.github.io/)

- [If You Don't Understand It, Don't Use It: Eliminating Trojans with Filters Between Layers](https://arxiv.org/abs/2407.06411)

  - Author(s): Adriano Hernandez

  - Date: 2024-07

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/4gatepylon/IfYouDontUnderstandItDontUseIt)

- [Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks](https://arxiv.org/abs/2407.02855)

  - Author(s): Zhexin Zhang, Junxiao Yang, Pei Ke, Shiyao Cui, Chujie Zheng, Hongning Wang, Minlie Huang

  - Date: 2024-07

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/thu-coai/SafeUnlearning)

- [To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models](https://arxiv.org/abs/2407.01920)

  - Author(s): Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang

  - Date: 2024-07

  - Venue: - EMNLP Findings 2024

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/zjunlp/KnowUnDo)

- [Can Small Language Models Learn, Unlearn, and Retain Noise Patterns?](https://arxiv.org/abs/2407.00996)

  - Author(s): Nicy Scaria, Silvester John Joseph Kennedy, Deepak Subramani

  - Date: 2024-07

  - Venue: -

  - Code: -

- [UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI](https://arxiv.org/abs/2407.00106)

  - Author(s): Ilia Shumailov, Jamie Hayes, Eleni Triantafillou, Guillermo Ortiz-Jimenez, Nicolas Papernot, Matthew Jagielski, Itay Yona, Heidi Howard, Eugene Bagdasaryan

  - Date: 2024-07

  - Venue: -

  - Code: -

- [PISTOL: Dataset Compilation Pipeline for Structural Unlearning of LLMs](https://arxiv.org/abs/2406.16810)

  - Author(s): Xinchi Qiu, William F. Shen, Yihong Chen, Nicola Cancedda, Pontus Stenetorp, Nicholas D. Lane

  - Date: 2024-06

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/bill-shen-BS/PISTOL)

- [Unveiling Entity-Level Unlearning for Large Language Models: A Comprehensive Analysis](https://arxiv.org/abs/2406.15796)

  - Author(s): Weitao Ma, Xiaocheng Feng, Weihong Zhong, Lei Huang, Yangfan Ye, Xiachong Feng, Bing Qin

  - Date: 2024-06

  - Venue: -

  - Code: -

- [Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models](https://arxiv.org/abs/2406.14091)

  - Author(s): Dohyun Lee, Daniel Rim, Minseok Choi, Jaegul Choo

  - Date: 2024-06

  - Venue: ACL 2024 Findings

  - Code: -

- [Every Language Counts: Learn and Unlearn in Multilingual LLMs](https://arxiv.org/abs/2406.13748)

  - Author(s): Taiming Lu, Philipp Koehn

  - Date: 2024-06

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/TaiMingLu/learn-unlearn)

- [Mitigating Social Biases in Language Models through Unlearning](https://arxiv.org/abs/2406.13551)

  - Author(s): Omkar Dige, Diljot Singh, Tsz Fung Yau, Qixuan Zhang, Borna Bolandraftar, Xiaodan Zhu, Faiza Khan Khattak

  - Date: 2024-06

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/VectorInstitute/Bias_in_LMsBias_mitigation)

- [Textual Unlearning Gives a False Sense of Unlearning](https://arxiv.org/abs/2406.13348)

  - Author(s): Jiacheng Du, Zhibo Wang, Kui Ren

  - Date: 2024-06

  - Venue: -

  - Code: -

- [Cross-Lingual Unlearning of Selective Knowledge in Multilingual Language Models](https://arxiv.org/abs/2406.12354)

  - Author(s): Minseok Choi, Kyunghyun Min, Jaegul Choo

  - Date: 2024-06

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/brightjade/multilingual-unlearning)

- [SNAP: Unlearning Selective Knowledge in Large Language Models with Negative Instructions](https://arxiv.org/abs/2406.12329)

  - Author(s): Minseok Choi, Daniel Rim, Dohyun Lee, Jaegul Choo

  - Date: 2024-06

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/brightjade/snap-unlearning)

- [Soft Prompting for Unlearning in Large Language Models](https://arxiv.org/abs/2406.12038)

  - Author(s): Karuna Bhaila, Minh-Hao Van, Xintao Wu

  - Date: 2024-06

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/karuna-bhaila/llm_unlearning)

- [Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs](https://arxiv.org/abs/2406.11780)

  - Author(s): Swanand Ravindra Kadhe, Farhan Ahmed, Dennis Wei, Nathalie Baracaldo, Inkit Padhi

  - Date: 2024-06

  - Venue: -

  - Code: -

- [Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces](https://arxiv.org/abs/2406.11614)

  - Author(s): Yihuai Hong, Lei Yu, Shauli Ravfogel, Haiqin Yang, Mor Geva

  - Date: 2024-06

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/yihuaihong/ConceptVectors)

- [Avoiding Copyright Infringement via Machine Unlearning](https://arxiv.org/abs/2406.10952)

  - Author(s): Guangyao Dou, Zheyuan Liu, Qing Lyu, Kaize Ding, Eric Wong

  - Date: 2024-06

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/guangyaodou/SSU/tree/main)

- [RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models](https://arxiv.org/abs/2406.10890)

  - Author(s): Zhuoran Jin, Pengfei Cao, Chenhao Wang, Zhitao He, Hongbang Yuan, Jiachun Li, Yubo Chen, Kang Liu, Jun Zhao

  - Date: 2024-06

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/jinzhuoran/RWKU)

- [REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space](https://arxiv.org/abs/2406.09325)

  - Author(s): Tomer Ashuach, Martin Tutek, Yonatan Belinkov

  - Date: 2024-06

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/Tomertech/REVS)

- [Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning](https://arxiv.org/abs/2406.09179)

  - Author(s): Qizhou Wang, Bo Han, Puning Yang, Jianing Zhu, Tongliang Liu, Masashi Sugiyama

  - Date: 2024-06

  - Venue: -

  - Code: -

- [Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference](https://arxiv.org/abs/2406.08607)

  - Author(s): Jiabao Ji, Yujian Liu, Yang Zhang, Gaowen Liu, Ramana Rao Kompella, Sijia Liu, Shiyu Chang

  - Date: 2024-06

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/UCSB-NLP-Chang/ULD)

- [Large Language Model Unlearning via Embedding-Corrupted Prompts](https://arxiv.org/abs/2406.07933)

  - Author(s): Chris Yuhao Liu, Yaxuan Wang, Jeffrey Flanigan, Yang Liu

  - Date: 2024-06

  - Venue: NeurIPS 2024

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/chrisliu298/llm-unlearn-eco)

- [Federated TrustChain: Blockchain-Enhanced LLM Training and Unlearning](https://arxiv.org/abs/2406.04076)

  - Author(s): Xuhan Zuo, Minghao Wang, Tianqing Zhu, Lefeng Zhang, Dayong Ye, Shui Yu, Wanlei Zhou

  - Date: 2024-06

  - Venue: -

  - Code: -

- [Can Textual Unlearning Solve Cross-Modality Safety Alignment?](https://aclanthology.org/2024.findings-emnlp.574/)

  - Author(s): Trishna Chakraborty, Erfan Shayegani, Zikui Cai, Nael Abu-Ghazaleh, M. Salman Asif, Yue Dong, Amit K. Roy-Chowdhury, Chengyu Song

  - Date: 2024-06

  - Venue: - EMNLP Findings 2024

  - Code: -

- [RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models](https://arxiv.org/abs/2406.01983)

  - Author(s): Bichen Wang, Yuzhe Zi, Yixin Sun, Yanyan Zhao, Bing Qin

  - Date: 2024-06

  - Venue: -

  - Code: -

- [Toward Robust Unlearning for LLMs](https://openreview.net/forum?id=4rPzaUF6Ej)

  - Author(s): Rishub Tamirisa, Bhrugu Bharathi, Andy Zhou, Bo Li, Mantas Mazeika

  - Date: 2024-05

  - Venue: ICLR 2024 SeT-LLM Workshop

  - Code: -

- [Unlearning Climate Misinformation in Large Language Models](https://arxiv.org/abs/2405.19563)

  - Author(s): Michael Fore, Simranjit Singh, Chaehong Lee, Amritanshu Pandey, Antonios Anastasopoulos, Dimitrios Stamoulis

  - Date: 2024-05

  - Venue: -

  - Code: -

- [Large Scale Knowledge Washing](https://arxiv.org/abs/2405.16720)

  - Author(s): Yu Wang, Ruihan Wu, Zexue He, Xiusi Chen, Julian McAuley

  - Date: 2024-05

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/wangyu-ustc/LargeScaleWashing)

- [Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models](https://arxiv.org/abs/2405.12523)

  - Author(s): Jiaqi Li, Qianshan Wei, Chuanyi Zhang, Guilin Qi, Miaozeng Du, Yongrui Chen, Sheng Bi

  - Date: 2024-05

  - Venue: -

  - Code: -

- [To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models](https://arxiv.org/abs/2405.03097)

  - Author(s): George-Octavian Barbulescu, Peter Triantafillou

  - Date: 2024-05

  - Venue: ICML 2024

  - Code: -

- [SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning](https://aclanthology.org/2024.emnlp-main.245/)

  - Author(s): Jinghan Jia, Yihua Zhang, Yimeng Zhang, Jiancheng Liu, Bharat Runwal, James Diffenderfer, Bhavya Kailkhura, Sijia Liu

  - Date: 2024-04

  - Venue: - EMNLP 2024

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/OPTML-Group/SOUL)

- [Machine Unlearning in Large Language Models](https://arxiv.org/abs/2404.16841)

  - Author(s): Kongyang Chen, Zixin Wang, Bing Mi, Waixi Liu, Shaowei Wang, Xiaojun Ren, Jiaxing Shen

  - Date: 2024-04

  - Venue: -

  - Code: -

- [Offset Unlearning for Large Language Models](https://arxiv.org/abs/2404.11045)

  - Author(s): James Y. Huang, Wenxuan Zhou, Fei Wang, Fred Morstatter, Sheng Zhang, Hoifung Poon, Muhao Chen

  - Date: 2024-04

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/luka-group/Delta-Unlearning)

- [Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge](https://arxiv.org/abs/2404.05880)

  - Author(s): Weikai Lu, Ziqian Zeng, Jianwei Wang, Zhengdong Lu, Zelin Chen, Huiping Zhuang, Cen Chen

  - Date: 2024-04

  - Venue: -

  - Code: -

- [Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning](https://arxiv.org/abs/2404.05868)

  - Author(s): Ruiqi Zhang, Licong Lin, Yu Bai, Song Mei

  - Date: 2024-04

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/licong-lin/negative-preference-optimization)

- [Localizing Paragraph Memorization in Language Models](https://arxiv.org/abs/2403.19851)

  - Author(s): Niklas Stoehr, Mitchell Gordon, Chiyuan Zhang, Owen Lewis

  - Date: 2024-03

  - Venue: -

  - Code: -

- [The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning](https://arxiv.org/abs/2403.03218)

  - Author(s): Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Samuel Marks, Oam Patel, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Lin, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Ruoyu Wang, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks

  - Date: 2024-03

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/centerforaisafety/wmdp)

- [Dissecting Language Models: Machine Unlearning via Selective Pruning](https://arxiv.org/abs/2403.01267)

  - Author(s): Nicholas Pochinkov, Nandi Schoots

  - Date: 2024-03

  - Venue: -

  - Code: -

- [Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models](https://arxiv.org/abs/2403.10557)

  - Author(s): Kang Gu, Md Rafi Ur Rashid, Najrin Sultana, Shagufta Mehnaz

  - Date: 2024-03

  - Venue: -

  - Code: -

- [Ethos: Rectifying Language Models in Orthogonal Parameter Space](https://arxiv.org/abs/2403.08994)

  - Author(s): Lei Gao, Yue Niu, Tingting Tang, Salman Avestimehr, Murali Annavaram

  - Date: 2024-03

  - Venue: -

  - Code: -

- [Towards Efficient and Effective Unlearning of Large Language Models for Recommendation](https://arxiv.org/abs/2403.03536)

  - Author(s): Hangyu Wang, Jianghao Lin, Bo Chen, Yang Yang, Ruiming Tang, Weinan Zhang, Yong Yu

  - Date: 2024-03

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/justarter/E2URec)

- [Guardrail Baselines for Unlearning in LLMs](https://arxiv.org/abs/2403.03329)

  - Author(s): Pratiksha Thaker, Yash Maurya, Virginia Smith

  - Date: 2024-03

  - Venue: ICLR 2024 SeT-LLM Workshop

  - Code: -

- [Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning](https://arxiv.org/abs/2402.11537)

  - Author(s): Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning

  - Date: 2024-02

  - Venue: -

  - Code: -

- [Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination](https://arxiv.org/abs/2402.10052)

  - Author(s): Yijiang River Dong, Hongzhou Lin, Mikhail Belkin, Ramon Huerta, Ivan Vulić

  - Date: 2024-02

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/dong-river/LLM_unlearning)

- [Towards Safer Large Language Models through Machine Unlearning](https://arxiv.org/abs/2402.10058)

  - Author(s): Zheyuan Liu, Guangyao Dou, Zhaoxuan Tan, Yijun Tian, Meng Jiang

  - Date: 2024-02

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/franciscoliu/SKU)

- [Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models](https://arxiv.org/abs/2402.05813)

  - Author(s): Lingzhi Wang, Xingshan Zeng, Jinsong Guo, Kam-Fai Wong, Georg Gottlob

  - Date: 2024-02

  - Venue: -

  - Code: -

- [Unlearnable Algorithms for In-context Learning](https://arxiv.org/abs/2402.00751)

  - Author(s): Andrei Muresanu, Anvith Thudi, Michael R. Zhang, Nicolas Papernot

  - Date: 2024-02

  - Venue: -

  - Code: -

- [Machine Unlearning of Pre-trained Large Language Models](https://arxiv.org/abs/2402.15159)

  - Author(s): Jin Yao, Eli Chien, Minxin Du, Xinyao Niu, Tianhao Wang, Zezhou Cheng, Xiang Yue

  - Date: 2024-02

  - Venue: ACL 2024

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/yaojin17/Unlearning_LLM)

- [Visual In-Context Learning for Large Vision-Language Models](https://arxiv.org/abs/2402.11574)

  - Author(s): Yucheng Zhou, Xiang Li, Qianning Wang, Jianbing Shen

  - Date: 2024-02

  - Venue: -

  - Code: -

- [EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models](https://aclanthology.org/2024.emnlp-main.67/)

  - Author(s): Shangyu Xing, Fei Zhao, Zhen Wu, Tuo An, Weihao Chen, Chunhui Li, Jianbing Zhang, Xinyu Dai

  - Date: 2024-02

  - Venue: - EMNLP 2024

  - Code: - [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/starreeze/efuf)

- [Unlearning Reveals the Influential Training Data of Language Models](https://arxiv.org/abs/2401.15241)

  - Author(s): Masaru Isonuma, Ivan Titov

  - Date: 2024-01

  - Venue: -

  - Code: -

- [TOFU: A Task of Fictitious Unlearning for LLMs](https://arxiv.org/abs/2401.06121)

  - Author(s): Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, J. Zico Kolter

  - Date: 2024-01

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/locuslab/tofu)

### 2023

- [FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs](https://arxiv.org/abs/2312.07420)

  - Author(s): Swanand Ravindra Kadhe, Anisa Halimi, Ambrish Rawat, Nathalie Baracaldo

  - Date: 2023-12

  - Venue: NeurIPS 2023 SoLaR Workshop

  - Code: -

- [Making Harmful Behaviors Unlearnable for Large Language Models](https://arxiv.org/abs/2311.02105)

  - Author(s): Xin Zhou, Yi Lu, Ruotian Ma, Tao Gui, Qi Zhang, Xuanjing Huang

  - Date: 2023-11

  - Venue: -

  - Code: -

- [Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models](https://arxiv.org/abs/2311.08011)

  - Author(s): Shiwen Ni, Dingwei Chen, Chengming Li, Xiping Hu, Ruifeng Xu, Min Yang

  - Date: 2023-11

  - Venue: -

  - Code: -

- [Who's Harry Potter? Approximate Unlearning in LLMs](https://arxiv.org/abs/2310.02238)

  - Author(s): Ronen Eldan, Mark Russinovich

  - Date: 2023-10

  - Venue: -

  - Code: -

- [DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models](https://arxiv.org/abs/2310.20138)

  - Author(s): Xinwei Wu, Junzhuo Li, Minghui Xu, Weilong Dong, Shuangzhi Wu, Chao Bian, Deyi Xiong

  - Date: 2023-10

  - Venue: EMNLP 2023

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/flamewei123/DEPN)

- [Unlearn What You Want to Forget: Efficient Unlearning for LLMs](https://aclanthology.org/2023.emnlp-main.738/)

  - Author(s): Jiaao Chen, Diyi Yang

  - Date: 2023-10

  - Venue: EMNLP 2023

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/SALT-NLP/Efficient_Unlearning/)

- [In-Context Unlearning: Language Models as Few Shot Unlearners](https://arxiv.org/abs/2310.07579)

  - Author(s): Martin Pawelczyk, Seth Neel, Himabindu Lakkaraju

  - Date: 2023-10

  - Venue: - ICML 2024

  - Code: - [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/MartinPawel/In-Context-Unlearning)

- [Large Language Model Unlearning](https://arxiv.org/abs/2310.10683)

  - Author(s): Yuanshun Yao, Xiaojun Xu, Yang Liu

  - Date: 2023-10

  - Venue: NeurIPS 2023 SoLaR Workshop

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/kevinyaobytedance/llm_unlearn)

- [Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble](https://arxiv.org/abs/2309.16082)

  - Author(s): Zhe Liu, Ozlem Kalinli

  - Date: 2023-09

  - Venue: -

  - Code: -

- [Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks](https://arxiv.org/abs/2309.17410)

  - Author(s): Vaidehi Patil, Peter Hase, Mohit Bansal

  - Date: 2023-09

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/Vaidehi99/InfoDeletionAttacks)

- [Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation](https://arxiv.org/abs/2308.08090)

  - Author(s): Xinshuo Hu, Dongfang Li, Baotian Hu, Zihao Zheng, Zhenyu Liu, Min Zhang

  - Date: 2023-08

  - Venue: AAAI 2024

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/HITsz-TMG/Ext-Sub)

- [Unlearning Bias in Language Models by Partitioning Gradients](https://aclanthology.org/2023.findings-acl.375/)

  - Author(s): Charles Yu, Sullam Jeoung, Anish Kasi, Pengfei Yu, Heng Ji

  - Date: 2023-07

  - Venue: ACL (Findings) 2023

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/CharlesYu2000/PCGU-UnlearningBias)

- [Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data](https://arxiv.org/abs/2307.00456)

  - Author(s): Xinzhe Li, Ming Liu, Shang Gao

  - Date: 2023-07

  - Venue: -

  - Code: -

- [What can we learn from Data Leakage and Unlearning for Law?](https://arxiv.org/abs/2307.10476)

  - Author(s): Jaydeep Borkar

  - Date: 2023-07

  - Venue: -

  - Code: -

- [LEACE: Perfect linear concept erasure in closed form](https://arxiv.org/abs/2306.03819)

  - Author(s): Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, Stella Biderman

  - Date: 2023-06

  - Venue: NeurIPS 2023

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/EleutherAI/concept-erasure)

- [Composing Parameter-Efficient Modules with Arithmetic Operations](https://arxiv.org/abs/2306.14870)

  - Author(s): Jinghan Zhang, Shiqi Chen, Junteng Liu, Junxian He

  - Date: 2023-06

  - Venue: NeurIPS 2023

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/hkust-nlp/PEM_composition)

- [KGA: A General Machine Unlearning Framework Based on Knowledge Gap Alignment](https://arxiv.org/abs/2305.06535)

  - Author(s): Lingzhi Wang, Tong Chen, Wei Yuan, Xingshan Zeng, Kam-Fai Wong, Hongzhi Yin

  - Date: 2023-05

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/Lingzhi-WANG/KGAUnlearn)

### 2022

- [Editing Models with Task Arithmetic](https://arxiv.org/abs/2212.04089)

  - Author(s): Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi

  - Date: 2022-12

  - Venue: ICLR 2023

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/mlfoundations/task_vectors)

- [Privacy Adhering Machine Un-learning in NLP](https://arxiv.org/abs/2212.09573)

  - Author(s): Vinayshekhar Bannihatti Kumar, Rashmi Gangadharaiah, Dan Roth

  - Date: 2022-12

  - Venue: -

  - Code: -

- [The CRINGE Loss: Learning what language not to model](https://arxiv.org/abs/2211.05826)

  - Author(s): Leonard Adolphs, Tianyu Gao, Jing Xu, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston

  - Date: 2022-11

  - Venue: -

  - Code: -

- [Knowledge Unlearning for Mitigating Privacy Risks in Language Models](https://arxiv.org/abs/2210.01504)

  - Author(s): Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, Minjoon Seo

  - Date: 2022-10

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/joeljang/knowledge-unlearning)

- [Quark: Controllable Text Generation with Reinforced Unlearning](https://arxiv.org/abs/2205.13636)

  - Author(s): Ximing Lu, Sean Welleck, Jack Hessel, Liwei Jiang, Lianhui Qin, Peter West, Prithviraj Ammanabrolu, Yejin Choi

  - Date: 2022-05

  - Venue: NeurIPS 2022

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/GXimingLu/Quark)

### 2021

- [DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts](https://arxiv.org/abs/2105.03023)

  - Author(s): Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith, Yejin Choi

  - Date: 2021-05

  - Venue: ACL 2021

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/alisawuffles/DExperts)

## Surveys and Position Papers

### 2025

- [A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models](https://arxiv.org/abs/2503.01854)

  - Author(s): Jiahui Geng, Qing Li, Herbert Woisetschlaeger, Zongxiong Chen, Yuxia Wang, Preslav Nakov, Hans-Arno Jacobsen, Fakhri Karray

  - Date: 2025-03

  - Venue: -

- [Open Problems in Machine Unlearning for AI Safety](https://arxiv.org/abs/2501.04952)

  - Author(s): Fazl Barez, Tingchen Fu, Ameya Prabhu, Stephen Casper, Amartya Sanyal, Adel Bibi, Aidan O'Gara, Robert Kirk, Ben Bucknall, Tim Fist, Luke Ong, Philip Torr, Kwok-Yan Lam, Robert Trager, David Krueger, Sören Mindermann, José Hernandez-Orallo, Mor Geva, Yarin Gal

  - Date: 2025-01

  - Venue: -

### 2024

- [Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice](https://arxiv.org/abs/2412.06966)

  - Author(s): A. Feder Cooper, Christopher A. Choquette-Choo, Miranda Bogen, Matthew Jagielski, Katja Filippova, Ken Ziyu Liu, Alexandra Chouldechova, Jamie Hayes, Yangsibo Huang, Niloofar Mireshghallah, Ilia Shumailov, Eleni Triantafillou, Peter Kairouz, Nicole Mitchell, Percy Liang, Daniel E. Ho, Yejin Choi, Sanmi Koyejo, Fernando Delgado, James Grimmelmann, Vitaly Shmatikov, Christopher De Sa, Solon Barocas, Amy Cyphert, Mark Lemley, danah boyd, Jennifer Wortman Vaughan, Miles Brundage, David Bau, Seth Neel, Abigail Z. Jacobs, Andreas Terzis, Hanna Wallach, Nicolas Papernot, Katherine Lee

  - Date: 2024-12

  - Venue: -

- [Position: LLM Unlearning Benchmarks are Weak Measures of Progress](https://arxiv.org/abs/2410.02879)

  - Author(s): Pratiksha Thaker, Shengyuan Hu, Neil Kale, Yash Maurya, Zhiwei Steven Wu, Virginia Smith

  - Date: 2024-10

  - Venue: -

- [Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions](https://arxiv.org/abs/2408.05212)

  - Author(s): Michele Miranda, Elena Sofia Ruzzetti, Andrea Santilli, Fabio Massimo Zanzotto, Sébastien Bratières, Emanuele Rodolà

  - Date: 2024-08

  - Venue: -

- [Machine Unlearning in Generative AI: A Survey](https://arxiv.org/abs/2407.20516)

  - Author(s): Zheyuan Liu, Guangyao Dou, Zhaoxuan Tan, Yijun Tian, Meng Jiang

  - Date: 2024-07

  - Venue: -

- [Digital Forgetting in Large Language Models: A Survey of Unlearning Methods](https://arxiv.org/abs/2404.02062)

  - Author(s): Alberto Blanco-Justicia, Najeeb Jebreel, Benet Manzanares, David Sánchez, Josep Domingo-Ferrer, Guillem Collell, Kuan Eeik Tan

  - Date: 2024-04

  - Venue: -

- [Machine Unlearning for Traditional Models and Large Language Models: A Short Survey](https://arxiv.org/abs/2404.01206)

  - Author(s): Yi Xu

  - Date: 2024-04

  - Venue: -

- [The Frontier of Data Erasure: Machine Unlearning for Large Language Models](https://arxiv.org/abs/2403.15779)

  - Author(s): Youyang Qu, Ming Ding, Nan Sun, Kanchana Thilakarathna, Tianqing Zhu, Dusit Niyato

  - Date: 2024-03

  - Venue: -

- [Rethinking Machine Unlearning for Large Language Models](https://arxiv.org/abs/2402.08787)

  - Author(s): Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, Peter Hase, Yuguang Yao, Chris Yuhao Liu, Xiaojun Xu, Hang Li, Kush R. Varshney, Mohit Bansal, Sanmi Koyejo, Yang Liu

  - Date: 2024-02

  - Venue: -

- [Eight Methods to Evaluate Robust Unlearning in LLMs](https://arxiv.org/abs/2402.16835)

  - Author(s): Aengus Lynch, Phillip Guo, Aidan Ewart, Stephen Casper, Dylan Hadfield-Menell

  - Date: 2024-02

  - Venue: -

### 2023

- [Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges](https://arxiv.org/abs/2311.15766)

  - Author(s): Nianwen Si, Hao Zhang, Heyu Chang, Wenlin Zhang, Dan Qu, Weiqiang Zhang

  - Date: 2023-11

  - Venue: -

- [Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions](https://arxiv.org/abs/2307.03941)

  - Author(s): Dawen Zhang, Pamela Finckenberg-Broman, Thong Hoang, Shidong Pan, Zhenchang Xing, Mark Staples, Xiwei Xu

  - Date: 2023-07

  - Venue: -

## Frameworks

- [Open Unlearning](https://github.com/locuslab/open-unlearning)

  - Author(s): Vineeth Dorna, Anmol Mekala, Wenlong Zhao, Andrew McCallum, J Zico Kolter, Pratyush Maini

  - Date: 2025-02

  - Venue: -

  - Code: [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/locuslab/open-unlearning)

## Blog Posts

- [Machine Unlearning in 2024](https://ai.stanford.edu/~kzliu/blog/unlearning)

  - Author(s): [Ken Liu](https://ai.stanford.edu/~kzliu/)

  - Date: 2024-05

- [Deep Forgetting & Unlearning for Safely-Scoped LLMs](https://www.alignmentforum.org/posts/mFAvspg4sXkrfZ7FA/deep-forgetting-and-unlearning-for-safely-scoped-llms)

  - Author(s): [Stephen Casper](https://stephencasper.com/)

  - Date: 2023-12
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/chrisliu298/awesome-llm-unlearning

Awesome Lists containing this project

README