awesome_LLM-harmful-fine-tuning-papers

A survey on harmful fine-tuning attacks for large language models
https://github.com/git-disl/awesome_LLM-harmful-fine-tuning-papers

Content
- Attacks
  - 2025/07/15
  - 2024/08/06 - poisoning)]
  - 2024/10/01 - maybe-features-D17C/)]
  - 2024/10/23
  - 2025/05/1
  - 2025/05/11 - Samples-Matter/)]
  - 2025/05/11
  - 2025/05/11
  - 2025/07/15
  - 2025/08/19
  - 2023/10/22
  - [paper
  - 2025/03/05
  - 2025/05/22 - sri/finetuning-activated-backdoors)]
  - 2024/07/29 - editing/editing-attack)]
  - 2024/10/21
  - 2024/5/28
  - 2025/01/29 - disl/Virus)]
  - 2025/02/03
  - [paper
  - 2025/9/30
  - 2025/10/01
  - 2025/10/08
  - 2025/10/08
  - 2023/10/4
  - 2023/10/5 - Tuning-Safety/LLMs-Finetuning-Safety)]
  - 2023/10/5
  - 2023/10/31
  - 2023/10/31
  - 2023/11/9
  - 2023/12/21
  - 2024/4/1 - nlp/benign-data-breaks-safety)]
  - 2024/6/28
  - 2025/10/08
  - 2025/10/08
- Defenses
  - 2023/8/25
  - 2023/9/14 - tuned-llamas)]
  - 2025/10/08
  - 2025/8/8 - ignorance)]
  - 2024/10/13
  - 2025/05/18
  - 2025/05/22
  - 2025/05/22 - immunization-cond-num)]
  - 2025/06/02 - Group/Unlearn-ILU)]
  - 2025/06/04
  - 2024/10/05
  - 2024/10/05
  - 2025/04/14
  - 2025/05/22
  - 2025/05/23
  - 2025/05/29
  - 2025/07/25
  - 2025/08/04
  - 2025/08/17
  - 2025/08/18 - yi5AF1)
  - 2024/10/13
  - 2025/08/28 - Buncher)]
  - 2025/08/23 - finetuning-api)
  - 2025/06/21
  - 2025/05/07
  - 2025/06/05
  - 2025/09/06
  - 2025/09/08
  - 2025/02/07
  - 2024/8/1 - tamirisa/tamper-resistance)]
  - 2024/10/05
  - 2024/10/05
  - 2024/10/05
  - 2024/9/26
  - 2024/8/1 - tamirisa/tamper-resistance)]
  - 2025/04/12 - Detection)
  - 2025/01/19 - IPA-40B7)]
  - 2024/10/05 - 0065)
  - 2025/02/01 - yibo/Panacea)]
  - 2025/03/06
  - 2024/10/05
  - 2025/06/10 - YuanGroup/AsFT)
  - [paper
  - 2025/03/24
  - 2025/02/24
  - 2025/06/09
  - 2025/06/09
  - 2025/07/01
  - 2025/10/17
  - 2025/10/11
  - 2025/09/08
  - 2025/09/26
  - 2025/11/25
  - 2023/11/02
  - 2025/07/22
  - 2025/10/08
  - 2025/10/08
  - 2025/10/08
  - 2025/10/08
  - 2025/10/08
  - 2025/10/08
  - 2025/10/08
  - 2025/04/12 - Detection)
  - 2025/11/25
  - 2022/11/27
  - 2024/2/2 - disl/Vaccine)]
  - 2024/5/23 - noising)]
  - 2024/5/24 - Safety-41C2)] [[Openreview](https://openreview.net/forum?id=NrfP7zZNiG)]
  - 2024/9/3 - disl/Booster)] [[Openreview](https://openreview.net/forum?id=tTPHgb0EtV)]
  - 2024/10/05
  - 2024/10/13 - Vaccine)]
  - 2025/06/18 - Group/LoX)]
  - 2025/10/08
  - 2024/2/3 - zong/VLGuard)]
  - 2024/2/7 - attribution-code)]
  - 2024/2/22 - Enhanced-Alignment)]
  - 2024/2/28
  - 2024/5/28 - disl/Lisa)]
  - 2024/6/10 - vs-deep-alignment)] [[Openreview](https://openreview.net/forum?id=6Mxhg9PtDE)]
  - 2024/6/12
  - 2024/8/27
  - 2024/8/30
  - 2024/12/19
  - 2025/02/28
  - 2025/05/22
  - 2025/10/08
  - 2025/10/08
  - 2025/10/08
  - 2025/10/08
  - 2025/10/08
  - 2025/10/31 - Hu/Bayesian-Data-Scheduler)
  - 2024/2/19 - lab/resta)]
  - 2024/3/8
  - 2024/5/15
  - 2024/5/23
  - 2024/5/27
  - 2024/8/18
  - 2024/10/05
  - 2024/10/05
  - 2024/12/15
  - 2024/12/17
  - 2024/12/30
  - 2025/04/13
  - 2025/05/17
  - 2025/07/01
  - 2025/08/08
  - 2025/09/08
  - 2025/10/08
  - 2025/10/08
  - 2025/10/09 - jiang/MetaDefense)]
- Benchmark
  - 2025/10/08
  - 2025/10/31
  - 2025/10/08
  - 2024/9/19 - noising-xpo)]
  - 2025/5/31 - uw/SafeTuneBed)]
- Attacks and Defenses for Federated Fine-tuning
  - 2024/6/15
  - 2024/11/28
- Other awesome resources on LLM safety
- Interpretability Study
  - 2024/5/25
  - 2025/2/3
  - 2024/11/13
  - 2025/3/24
  - 2025/5/20 - Lab/safety-subspaces)] [[Openreview](https://openreview.net/forum?id=Fj6LakRHcT)]
  - 2024/5/27
  - 2024/10/05
  - 2024/10/05 - Finetuning-Attacks)]
  - 2025/2/3
  - 2025/6/30
  - 2025/8/08 - misalignment)]
  - 2025/10/08
- Mechanical Study
  - 2024/10/05
