Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-llm-unlearning
A resource repository for machine unlearning in large language models
https://github.com/chrisliu298/awesome-llm-unlearning
Papers
Methods
2024
- Towards Transfer Unlearning: Empirical Evidence of Cross-Domain Bias Mitigation
- Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
- LLM Unlearning via Loss Adjustment with Only Forget Data
- CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept
- Do Unlearning Methods Remove Information from Language Model Weights?
- Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models
- LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models
- Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
- Extracting Unlearned Information from LLMs with Activation Steering
- Towards Robust Evaluation of Unlearning in LLMs via Data Transformations
- Unified Parameter-Efficient Unlearning for LLMs
- UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMs
- Tamper-Resistant Safeguards for Open-Weight LLMs
- On the Limitations and Prospects of Machine Unlearning for Generative AI
- Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models
- Demystifying Verbatim Memorization in Large Language Models
- Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective
- What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
- Unforgettable Generalization in Language Models
- WPN: An Unlearning Method Based on N-pair Contrastive Learning in Language Models
- Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
- A Population-to-individual Tuning Framework for Adapting Pretrained LM to On-device User Intent Prediction
- UNLEARN Efficient Removal of Knowledge in Large Language Models
- Towards Robust and Cost-Efficient Knowledge Unlearning for Large Language Models
- On Effects of Steering Latent Representation for Large Language Model Unlearning
- Hotfixing Large Language Models for Code
- Unlearning Trojans in Large Language Models: A Comparison Between Natural Language and Source Code
- LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
- An Adversarial Perspective on Machine Unlearning for AI Safety
- Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning
- Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
- UnStar: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs
- Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge
- When Machine Unlearning Meets Retrieval-Augmented Generation (RAG): Keep Secret or Forget Knowledge?
- Evaluating Deep Unlearning in Large Language Models
- Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation
- Large Language Model Unlearning via Embedding-Corrupted Prompts
- Federated TrustChain: Blockchain-Enhanced LLM Training and Unlearning
- Cross-Modal Safety Alignment: Is textual unlearning all you need?
- RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models
- Toward Robust Unlearning for LLMs
- Unlearning Climate Misinformation in Large Language Models
- Large Scale Knowledge Washing
- Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models
- Towards Safer Large Language Models through Machine Unlearning
- Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models
- Unlearnable Algorithms for In-context Learning
- Machine Unlearning of Pre-trained Large Language Models
- To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models
- SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning
- Machine Unlearning in Large Language Models
- Offset Unlearning for Large Language Models
- Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge
- Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
- Localizing Paragraph Memorization in Language Models
- The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
- Dissecting Language Models: Machine Unlearning via Selective Pruning
- Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models
- Ethos: Rectifying Language Models in Orthogonal Parameter Space
- Towards Efficient and Effective Unlearning of Large Language Models for Recommendation
- Guardrail Baselines for Unlearning in LLMs
- Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning
- Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination
- Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage
- Visual In-Context Learning for Large Vision-Language Models
- EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models
- Unlearning Reveals the Influential Training Data of Language Models
- TOFU: A Task of Fictitious Unlearning for LLMs
- Practical Unlearning for Large Language Models
- Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
- Composable Interventions for Language Models
- MUSE: Machine Unlearning Six-Way Evaluation for Language Models
- If You Don't Understand It, Don't Use It: Eliminating Trojans with Filters Between Layers
- Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
- To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
- Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning
- Can Small Language Models Learn, Unlearn, and Retain Noise Patterns?
- UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI
- Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models
- Every Language Counts: Learn and Unlearn in Multilingual LLMs
- Mitigating Social Biases in Language Models through Unlearning
- Textual Unlearning Gives a False Sense of Unlearning
- Cross-Lingual Unlearning of Selective Knowledge in Multilingual Language Models
- SNAP: Unlearning Selective Knowledge in Large Language Models with Negative Instructions
- Soft Prompting for Unlearning in Large Language Models
- Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs
- Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces
- Avoiding Copyright Infringement via Machine Unlearning
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models
- REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning
- MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts
- PISTOL: Dataset Compilation Pipeline for Structural Unlearning of LLMs
- Unveiling Entity-Level Unlearning for Large Language Models: A Comprehensive Analysis
- Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference
- Unlearning in- vs. out-of-distribution data in LLMs under gradient-based method
- WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
- Cross-model Control: Improving Multiple Large Language Models in One-time Training
- CLEAR: Character Unlearning in Textual and Visual Modalities
- Applying sparse autoencoders to unlearn knowledge in language models
- Learning and Unlearning of Fabricated Knowledge in Language Models
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate
- Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench
- Mitigating Memorization In Language Models
- A Closer Look at Machine Unlearning for Large Language Models
- Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning
- Dissecting Fine-Tuning Unlearning in Large Language Models
- NegMerge: Consensual Weight Negation for Strong Machine Unlearning
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
- RESTOR: Knowledge Recovery through Machine Unlearning
- Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods
- Provable unlearning in topic modeling and downstream tasks
2023
- Composing Parameter-Efficient Modules with Arithmetic Operations
- KGA: A General Machine Unlearning Framework Based on Knowledge Gap Alignment
- FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs
- Making Harmful Behaviors Unlearnable for Large Language Models
- Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models
- Who's Harry Potter? Approximate Unlearning in LLMs
- DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models
- Unlearn What You Want to Forget: Efficient Unlearning for LLMs
- In-Context Unlearning: Language Models as Few Shot Unlearners
- Large Language Model Unlearning
- Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble
- Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks
- Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
- Unlearning Bias in Language Models by Partitioning Gradients
- Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data
- What can we learn from Data Leakage and Unlearning for Law?
- LEACE: Perfect linear concept erasure in closed form
2022
2021
Surveys and Position Papers
2024
- Machine Unlearning in Generative AI: A Survey
- Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions
- Digital Forgetting in Large Language Models: A Survey of Unlearning Methods
- Machine Unlearning for Traditional Models and Large Language Models: A Short Survey
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models
- Rethinking Machine Unlearning for Large Language Models
- Eight Methods to Evaluate Robust Unlearning in LLMs
- Position: LLM Unlearning Benchmarks are Weak Measures of Progress
2023
Blog Posts
Datasets
Programming Languages
Keywords
large-language-models (5)
unlearning (5)
natural-language-processing (3)
benchmark (2)
interpretability (1)
adversarial-attacks (1)
evaluation-framework (1)
forgetting (1)
membership-inference-attack (1)
privacy-protection (1)
right-to-be-forgotten (1)
nlp (1)
pytorch (1)
transformers (1)
alignment (1)
llm-unlearning (1)
machine-unlearning (1)
llm (1)
meta-learning (1)
open-weight (1)
safeguards (1)
tamper-resistance (1)
artificial-intelligence (1)
dataset (1)
knowledge-editing (1)
knowledge-unlearning (1)
knowundo (1)
localization (1)
memflex (1)
model-editing (1)