awesome_LLM-harmful-fine-tuning-papers
A survey on harmful fine-tuning attacks for large language models
https://github.com/git-disl/awesome_LLM-harmful-fine-tuning-papers
-
Content
-
Attacks
- 2023/10/22
- [paper
- 2025/03/05
- 2024/7/29 - editing/editing-attack)]
- 2023/10/5
- 2024/10/21
- 2024/10/23
- 2023/10/4
- 2023/10/5 - Tuning-Safety/LLMs-Finetuning-Safety)]
- 2023/10/31
- 2023/11/9
- 2024/4/1 - nlp/benign-data-breaks-safety)]
- 2024/5/28
- 2024/6/28
- 2025/01/29 - disl/Virus)
- 2025/02/03
- [paper
-
Defenses
- 2024/10/13
- 2024/3/8
- 2024/12/19
- 2024/12/17
- 2024/10/13
- 2025/02/7
- 2025/02/28
- 2024/2/2 - disl/Vaccine)]
- 2024/5/23 - noising)]
- 2024/5/24 - Safety-41C2)] [[Openreview](https://openreview.net/forum?id=NrfP7zZNiG)]
- 2024/8/1 - tamirisa/tamper-resistance)]
- 2024/9/3 - disl/Booster)] [[Openreview](https://openreview.net/forum?id=tTPHgb0EtV)]
- 2024/10/13 - Vaccine)]
- 2023/8/25
- 2023/9/14 - tuned-llamas)]
- 2024/2/3 - zong/VLGuard)]
- 2024/2/7 - attribution-code)]
- 2024/2/22 - Enhanced-Alignment)]
- 2024/2/28
- 2024/5/28 - disl/Lisa)]
- 2024/6/10 - vs-deep-alignment)] [[Openreview](https://openreview.net/forum?id=6Mxhg9PtDE)]
- 2024/6/12
- 2024/8/27
- 2024/10/05
- 2024/10/05
- 2024/10/05
- 2024/10/05
- 2024/10/05
- 2024/5/15
- 2024/5/23
- 2024/5/27
- 2024/8/18
- 2024/10/05
- 2024/10/05
- 2024/9/26
- 2024/8/1 - tamirisa/tamper-resistance)]
- 2024/8/30
- 2024/12/15
- 2025/01/19 - IPA-40B7)]
- 2024/10/05 - 0065)
- 2025/02/01 - yibo/Panacea)]
- 2025/03/06
- 2024/2/19 - lab/resta)]
- 2024/10/05
- 2024/12/30
- 2025/04/13
- [paper
- 2025/03/24
- 2025/02/24
-
Attacks and Defenses for Federated Fine-tuning
-
Other awesome resources on LLM safety
-
Mechanical Study
- 2025/2/3
- 2024/5/25
- 2024/5/27
- 2024/10/05
- 2024/10/05
- 2024/11/13
- 2024/10/05 - Finetuning-Attacks)]
- 2025/3/24
-
Benchmark
- 2024/9/19 - noising-xpo)]
-
-
Star History
-
- ![Star History Chart - history.com/#git-disl/awesome_LLM-harmful-fine-tuning-papers&Date)